| No. | Reference |
|---|---|
| 1 | HOTELLING H. Relations between two sets of variates [M]// KOTZ S, JOHNSON N L. Breakthroughs in Statistics: Methodology and Distribution, Springer Series in Statistics. New York: Springer, 1992: 162-190. DOI: 10.1007/978-1-4612-4380-9_14 |
| 2 | FENG F, WANG X, LI R. Cross-modal retrieval with correspondence autoencoder [C]// Proceedings of the 22nd ACM International Conference on Multimedia. New York: ACM, 2014: 7-16. DOI: 10.1145/2647868.2654902 |
| 3 | PENG Y, CHI J. Unsupervised cross-media retrieval using domain adaptation with scene graph [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(11): 4368-4379. DOI: 10.1109/tcsvt.2019.2953692 |
| 4 | HU P, ZHEN L, PENG D, et al. Scalable deep multimodal learning for cross-modal retrieval [C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2019: 635-644. DOI: 10.1145/3331184.3331213 |
| 5 | WANG J, HE Y, KANG C, et al. Image-text cross-modal retrieval via modality-specific feature learning [C]// Proceedings of the 5th ACM International Conference on Multimedia Retrieval. New York: ACM, 2015: 347-354. DOI: 10.1145/2671188.2749341 |
| 6 | PENG Y, HUANG X, ZHAO Y. An overview of cross-media retrieval: concepts, methodologies, benchmarks and challenges [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(9): 2372-2385. DOI: 10.1109/tcsvt.2017.2705068 |
| 7 | TSAI Y H H, YEH Y R, WANG Y C F. Learning cross-domain landmarks for heterogeneous domain adaptation [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 5081-5090. DOI: 10.1109/cvpr.2016.549 |
| 8 | HUANG X, PENG Y, YUAN M. Cross-modal common representation learning by hybrid transfer network [C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2017: 1893-1900. DOI: 10.24963/ijcai.2017/263 |
| 9 | HUANG X, PENG Y. Deep cross-media knowledge transfer [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8837-8846. DOI: 10.1109/cvpr.2018.00921 |
| 10 | WEN X, HAN Z, YIN X, et al. Adversarial cross-modal retrieval via learning and transferring single-modal similarities [C]// Proceedings of the 2019 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2019: 478-483. DOI: 10.1109/icme.2019.00089 |
| 11 | COSTA PEREIRA J, COVIELLO E, DOYLE G, et al. On the role of correlation and abstraction in cross-modal multimedia retrieval [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(3): 521-535. DOI: 10.1109/tpami.2013.142 |
| 12 | LI D, DIMITROVA N, LI M, et al. Multimedia content processing through cross-modal association [C]// Proceedings of the 11th ACM International Conference on Multimedia. New York: ACM, 2003: 604-611. DOI: 10.1145/957013.957143 |
| 13 | ANDREW G, ARORA R, BILMES J, et al. Deep canonical correlation analysis [C]// Proceedings of the 30th International Conference on Machine Learning. New York: JMLR.org, 2013: 1247-1255. |
| 14 | WANG B, YANG Y, XU X, et al. Adversarial cross-modal retrieval [C]// Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 154-162. DOI: 10.1145/3123266.3123326 |
| 15 | ZHEN L, HU P, WANG X, et al. Deep supervised cross-modal retrieval [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 10386-10395. DOI: 10.1109/cvpr.2019.01064 |
| 16 | PENG Y, HUANG X, QI J. Cross-media shared representation by hierarchical learning with multiple deep networks [C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2016: 3846-3853. |
| 17 | WEI Y, ZHAO Y, LU C, et al. Cross-modal retrieval with CNN visual features: a new baseline [J]. IEEE Transactions on Cybernetics, 2017, 47(2): 449-460. |
| 18 | PAN S J, YANG Q. A survey on transfer learning [J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359. DOI: 10.1109/tkde.2009.191 |
| 19 | LONG M, CAO Y, WANG J, et al. Learning transferable features with deep adaptation networks [C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 97-105. |
| 20 | HUANG X, PENG Y, YUAN M. MHTN: modal-adversarial hybrid transfer network for cross-modal retrieval [J]. IEEE Transactions on Cybernetics, 2020, 50(3): 1047-1059. DOI: 10.1109/tcyb.2018.2879846 |
| 21 | ZHEN L, HU P, PENG X, et al. Deep multimodal transfer learning for cross-modal retrieval [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(2): 798-810. DOI: 10.1109/tnnls.2020.3029181 |
| 22 | GRETTON A, BORGWARDT K M, RASCH M J, et al. A kernel two-sample test [J]. Journal of Machine Learning Research, 2012, 13: 723-773. |
| 23 | KINGMA D P, BA J L. Adam: a method for stochastic optimization [EB/OL]. (2017-01-30) [2021-08-03]. . |
| 24 | RASHTCHIAN C, YOUNG P, HODOSH M, et al. Collecting image annotations using Amazon's Mechanical Turk [C]// Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Stroudsburg, PA: ACL, 2010: 139-147. |
| 25 | CHUA T S, TANG J, HONG R, et al. NUS-WIDE: a real-world web image database from National University of Singapore [C]// Proceedings of the 2009 ACM International Conference on Image and Video Retrieval. New York: ACM, 2009: No. 48. DOI: 10.1145/1646396.1646452 |
| 26 | HARDOON D R, SZEDMAK S, SHAWE-TAYLOR J. Canonical correlation analysis: an overview with application to learning methods [J]. Neural Computation, 2004, 16(12): 2639-2664. DOI: 10.1162/0899766042321814 |
| 27 | HENDERSON P, FERRARI V. End-to-end training of object class detectors for mean average precision [C]// Proceedings of the 2016 Asian Conference on Computer Vision, LNCS 10115. Cham: Springer, 2017: 198-213. |
| 28 | GOUTTE C, GAUSSIER E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation [C]// Proceedings of the 2005 European Conference on Information Retrieval, LNCS 3408. Berlin: Springer, 2005: 345-359. |
| 29 | VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE [J]. Journal of Machine Learning Research, 2008, 9: 2579-2605. |