Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (1): 24-31.DOI: 10.11772/j.issn.1001-9081.2023010047
• Cross-media representation learning and cognitive reasoning •
Qiujie LIU, Yuan WAN, Jie WU
Received: 2023-01-17
Revised: 2023-05-11
Accepted: 2023-05-12
Online: 2023-06-06
Published: 2024-01-10
Contact: Yuan WAN
About author: LIU Qiujie, born in 1999, from Zhumadian, Henan, M. S. candidate. His research interests include machine learning and pattern recognition.
Qiujie LIU, Yuan WAN, Jie WU. Deep bi-modal source domain symmetrical transfer learning for cross-modal retrieval[J]. Journal of Computer Applications, 2024, 44(1): 24-31.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023010047
Dataset | Task | CCA | CFA | KCCA | CMDN | Deep-SM | DSCMR | CHTN | DCKT | MHTN | DMTL | Proposed method |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Pascal | Image→Text | | | | | | | | | | | |
Pascal | Text→Image | | | | | | | | | | | |
Pascal | Average | | | | | | | | | | | |
NUS-WIDE-10k | Image→Text | | | | | | | | | | | |
NUS-WIDE-10k | Text→Image | | | | | | | | | | | |
NUS-WIDE-10k | Average | | | | | | | | | | | |
Wikipedia | Image→Text | 0.176 | 0.330 | 0.230 | 0.409 | 0.458 | 0.521 | 0.508 | 0.537 | 0.514 | 0.531 | 0.570 |
Wikipedia | Text→Image | 0.178 | 0.306 | 0.224 | 0.364 | 0.345 | 0.478 | 0.432 | 0.485 | 0.444 | 0.574 | 0.505 |
Wikipedia | Average | 0.177 | 0.318 | 0.227 | 0.387 | 0.402 | 0.499 | 0.470 | 0.511 | 0.479 | 0.552 | 0.564 |
Tab. 1 Comparison of mAP results of different methods on three datasets
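The scores in Tab. 1 are mAP (mean average precision) values for the two retrieval directions. This page does not spell out the evaluation protocol, so the following is only a minimal sketch of one common way to compute mAP for label-based cross-modal retrieval. It assumes that a gallery item counts as relevant when it shares the query's class label, that ranking uses cosine similarity, and that the full ranked list is scored; the function names are illustrative and are not taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_precision(ranked_relevance):
    """AP of one ranked list of booleans (True = relevant), averaged over the relevant hits."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries, query_labels, gallery, gallery_labels):
    """Mean AP over all queries; relevance is assumed to mean 'same class label'."""
    ap_values = []
    for query, label in zip(queries, query_labels):
        # Rank gallery items by decreasing similarity to the query.
        order = sorted(range(len(gallery)),
                       key=lambda i: cosine(query, gallery[i]),
                       reverse=True)
        ap_values.append(average_precision([gallery_labels[i] == label for i in order]))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0
```

For an image→text score, the queries would be image embeddings and the gallery would be text embeddings in the common space; swapping the two gives the text→image score, and the "Average" rows are the mean of the two directions.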
Method | Pascal Image→Text | Pascal Text→Image | Pascal Average | NUS-WIDE-10k Image→Text | NUS-WIDE-10k Text→Image | NUS-WIDE-10k Average | Wikipedia Image→Text | Wikipedia Text→Image | Wikipedia Average |
---|---|---|---|---|---|---|---|---|---|
DBSTL1 | | | | | | | 0.443 | 0.450 | 0.448 |
DBSTL2 | | | | | | | 0.420 | 0.410 | 0.413 |
DBSTL3 | | | | | | | 0.501 | 0.490 | 0.494 |
DBSTL4 | | | | | | | 0.492 | 0.509 | 0.503 |
DBSTL | | | | | | | 0.570 | 0.505 | 0.564 |
Tab. 2 Performance comparison between DBSTL and four variants