[1] 吴飞, 庄越挺. 互联网跨媒体分析与检索: 理论与算法[J]. 计算机辅助设计与图形学学报, 2010, 22(1): 1-9. (WU F, ZHUANG Y T. Cross media analysis and retrieval on the Web: theory and algorithm[J]. Journal of Computer-Aided Design and Computer Graphics, 2010, 22(1): 1-9.)
[2] CHEN X, LIU H, CARBONELL J G. Structured sparse canonical correlation analysis[EB/OL]. [2016-03-10]. https://www.cs.cmu.edu/~jgc/StructuredSparseCanonicalCorrelationAnalysisAISTATS2012.pdf.
[3] 张鸿, 吴飞, 庄越挺, 等. 一种基于内容相关性的跨媒体检索方法[J]. 计算机学报, 2008, 31(5): 820-826. (ZHANG H, WU F, ZHUANG Y T, et al. Cross-media retrieval method based on content correlation[J]. Chinese Journal of Computers, 2008, 31(5): 820-826.)
[4] PUTTHIVIDHY D, ATTIAS H T, NAGARAJAN S S. Topic regression multi-modal latent Dirichlet allocation for image annotation[C]//Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2010: 3408-3415.
[5] WU F, JIANG X, LI X, et al. Cross-modal learning to rank via latent joint representation[J]. IEEE Transactions on Image Processing, 2015, 24(5): 1497-1509.
[6] GONG Y, KE Q, ISARD M, et al. A multi-view embedding space for modeling Internet images, tags, and their semantics[J]. International Journal of Computer Vision, 2014, 106(2): 210-233.
[7] ZHEN Y, YEUNG D Y. A probabilistic model for multimodal hash function learning[C]//KDD 2012: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2012: 940-948.
[8] SHANG X, ZHANG H, CHUA T-S. Deep learning generic features for cross-media retrieval[C]//MMM 2016: Proceedings of the 22nd International Conference on MultiMedia Modeling, LNCS 9516. Berlin: Springer, 2016: 264-275.
[9] FROME A, CORRADO G, SHLENS J, et al. DeViSE: a deep visual-semantic embedding model[EB/OL]. [2016-03-10]. https://papers.nips.cc/paper/5204-devise-a-deep-visual-semantic-embedding-model.pdf.
[10] MA L, LU Z, SHANG L, et al. Multimodal convolutional neural networks for matching image and sentence[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 2623-2631.
[11] SRIVASTAVA N, SALAKHUTDINOV R. Multimodal learning with deep Boltzmann machines[EB/OL]. [2016-03-10]. http://jmlr.org/papers/volume15/srivastava14b/srivastava14b.pdf.
[12] WU F, YU Z, YI Y, et al. Sparse multi-modal hashing[J]. IEEE Transactions on Multimedia, 2014, 16(2): 427-439.
[13] ZHUANG Y, YU Z, WANG W, et al. Cross-media hashing with neural networks[C]//MM 2014: Proceedings of the 22nd ACM International Conference on Multimedia. New York: ACM, 2014: 901-904.
[14] RAFAILIDIS D, CRESTANI F. Cluster-based joint matrix factorization hashing for cross-modal retrieval[C]//SIGIR 2016: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2016: 781-784.
[15] ZHAO F, HUANG Y, WANG L, et al. Deep semantic ranking based hashing for multi-label image retrieval[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 1556-1564.