Deep asymmetric discrete cross-modal hashing method

doi:10.11772/j.issn.1001-9081.2021061017

Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2461-2470. DOI: 10.11772/j.issn.1001-9081.2021061017

• Data science and technology •

Xiaoyu WANG, Zhanqing WANG, Wei XIONG

School of Science, Wuhan University of Technology, Wuhan Hubei 430070, China

Received: 2021-06-15 Revised: 2021-09-15 Accepted: 2021-10-12 Online: 2021-12-27 Published: 2022-08-10
Contact: Zhanqing WANG
About author: WANG Xiaoyu, born in 1997, a native of Ruzhou, Henan, M. S. candidate. Her research interests include machine learning and cross-modal retrieval.
WANG Zhanqing, born in 1965, a native of Wuhan, Hubei, Ph. D., professor. His research interests include pattern recognition, digital image processing and information computing.
XIONG Wei, born in 1996, a native of Huangshi, Hubei, M. S. candidate. His research interests include image processing and cross-modal retrieval.
Supported by:
Fundamental Research Funds for the Central Universities (2019ZY232)

Abstract:

Most deep supervised cross-modal hashing methods learn hash codes with a symmetric strategy, which prevents them from making effective use of the supervision information in large-scale datasets. In addition, the discrete constraints on hash codes are usually handled with relaxation-based strategies, which introduce large quantization errors and yield sub-optimal hash codes. To address these problems, a Deep Asymmetric Discrete Cross-modal Hashing (DADCH) method was proposed. Firstly, an asymmetric learning framework combining deep neural networks and dictionary learning was constructed to learn the hash codes of query instances and database instances, thereby mining the supervision information of the data more effectively and reducing the training time of the model. Then, a discrete optimization algorithm was used to optimize the hash code matrix column by column, reducing the quantization error of hash code binarization. At the same time, to fully mine the semantic information of the data, a label layer was added to the neural network for label prediction, and semantic information embedding was used to embed the discriminative information of different categories into the hash codes through a linear mapping, making the hash codes more discriminative. Experimental results show that, on the IAPR-TC12, MIRFLICKR-25K and NUS-WIDE datasets with a hash code length of 64 bit, the mean Average Precision (mAP) of the proposed method on image-to-text retrieval is about 11.6, 5.2 and 14.7 percentage points higher, respectively, than that of Self-Supervised Adversarial Hashing (SSAH), an advanced deep cross-modal retrieval method proposed in recent years.

Key words: cross-modal retrieval, deep neural network, asymmetric hashing, semantic information embedding, discrete optimization
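The quantization-error argument in the abstract can be made concrete with a toy sketch. It is not the paper's optimization procedure: it only binarizes hypothetical real-valued network outputs with a sign function and measures the squared error that a relaxation-based method would incur at that step. The function names and the example vectors are illustrative assumptions.

```python
def binarize(u):
    """Map real-valued outputs to a {-1, +1} hash code (sign, with 0 -> +1)."""
    return [1 if x >= 0 else -1 for x in u]


def quantization_error(u):
    """Squared distance between the relaxed output u and its binary code."""
    b = binarize(u)
    return sum((x - y) ** 2 for x, y in zip(u, b))


# Hypothetical relaxed outputs: the first is close to binary, the second is not.
near_binary = [0.9, -0.8, 0.95, -0.9]
far_from_binary = [0.1, -0.2, 0.05, -0.1]

err_near = quantization_error(near_binary)
err_far = quantization_error(far_from_binary)
print(binarize(near_binary), err_near)
print(binarize(far_from_binary), err_far)
```

Both vectors map to the same code, yet the second loses far more information when binarized. Optimizing the binary codes directly, as the discrete strategy described above does, is meant to avoid relying on this rounding step.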

CLC Number: TP391.3

Xiaoyu WANG, Zhanqing WANG, Wei XIONG. Deep asymmetric discrete cross-modal hashing method[J]. Journal of Computer Applications, 2022, 42(8): 2461-2470.

Layer	Configuration
conv1	k. 64×11×11; s. 4×4, pad 0, LRN, ×2 pool
conv2	k. 256×5×5; s. 1×1, pad 2, LRN, ×2 pool
conv3	k. 256×3×3; s. 1×1, pad 1
conv4	k. 256×3×3; s. 1×1, pad 1
conv5	k. 256×3×3; s. 1×1, pad 1, max pooling
fc6	4096
fc7	512
hash/label layer	r + c
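As a worked check of the convolutional settings in the table above, the sketch below traces the spatial size of a feature map through conv1 to conv5. Several things here are assumptions rather than statements from the source: the 224×224 input size, the reading of "×2 pool" as halving the size with floor rounding, and a ×2 factor for the max pooling after conv5.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial size after a convolution (floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, factor=2):
    """Spatial size after pooling, modelled as downsampling by `factor`."""
    return size // factor


# (name, kernel, stride, pad, followed by pooling?) read from the table.
LAYERS = [
    ("conv1", 11, 4, 0, True),
    ("conv2", 5, 1, 2, True),
    ("conv3", 3, 1, 1, False),
    ("conv4", 3, 1, 1, False),
    ("conv5", 3, 1, 1, True),  # pooling factor of 2 is an assumption
]


def trace_sizes(input_size=224):  # 224 is an assumed input resolution
    size, sizes = input_size, {}
    for name, kernel, stride, pad, pooled in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        if pooled:
            size = pool_out(size)
        sizes[name] = size
    return sizes


sizes = trace_sizes()
flat_features = 256 * sizes["conv5"] ** 2  # 256 channels after conv5
print(sizes, flat_features)
```

Under these assumptions the feature map shrinks to 6×6 after conv5, so fc6 would receive a 9216-dimensional vector.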

Layer	Parameter configuration
fc1	4096
fc2	512
hash/label layer	r + c
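The fully connected configuration above can be turned into a quick parameter count. This is a rough sketch under stated assumptions: `r` is read as the hash code length and `c` as the number of label categories (consistent with the label layer mentioned in the abstract, but not defined on this page), every layer is assumed to have a bias term, and `input_dim` is a hypothetical size for the input text feature vector.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def text_net_params(input_dim, r, c):
    """Total parameters of fc1 (4096), fc2 (512) and the hash/label layer (r + c)."""
    widths = [input_dim, 4096, 512, r + c]
    return sum(fc_params(a, b) for a, b in zip(widths, widths[1:]))


def split_output(output, r):
    """Split the last layer's output into a hash part (first r) and a label part."""
    return output[:r], output[r:]


# Hypothetical setting: 1000-dimensional text input, 64-bit codes, 24 categories.
total = text_net_params(input_dim=1000, r=64, c=24)
hash_part, label_part = split_output(list(range(64 + 24)), r=64)
print(total, len(hash_part), len(label_part))
```

With these example values almost all parameters sit in fc1, so the input dimensionality dominates the size of the network.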

Task	Method	IAPR-TC12			MIRFLICKR-25K			NUS-WIDE
		16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit
I→T	CVH	0.342	0.336	0.330	0.557	0.554	0.554	0.374	0.366	0.361
	STMH	0.377	0.400	0.413	0.613	0.621	0.627	0.471	0.486	0.494
	SCM	0.369	0.366	0.380	0.671	0.682	0.685	0.540	0.548	0.555
	SePH	0.444	0.456	0.463	0.712	0.719	0.723	0.603	0.613	0.621
	DCMH	0.453	0.473	0.484	0.741	0.746	0.749	0.590	0.603	0.609
	ADAH	0.529	0.528	0.544	0.756	0.711	0.772	0.640	0.629	0.652
	SSAH	0.500	0.533	0.553	0.782	0.790	0.800	0.642	0.636	0.639
	DADCH	0.602	0.643	0.669	0.832	0.843	0.852	0.765	0.778	0.786
T→I	CVH	0.349	0.343	0.337	0.574	0.571	0.571	0.361	0.349	0.339
	STMH	0.368	0.389	0.404	0.607	0.615	0.621	0.447	0.467	0.478
	SCM	0.345	0.341	0.347	0.693	0.701	0.706	0.534	0.541	0.548
	SePH	0.442	0.456	0.464	0.721	0.726	0.731	0.598	0.602	0.610
	DCMH	0.518	0.537	0.546	0.782	0.790	0.793	0.638	0.651	0.657
	ADAH	0.535	0.556	0.564	0.792	0.806	0.807	0.678	0.697	0.703
	SSAH	0.516	0.551	0.570	0.791	0.795	0.803	0.669	0.662	0.666
	DADCH	0.611	0.649	0.671	0.835	0.847	0.857	0.767	0.786	0.795
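The values in the table above appear to be mAP scores, since the 64-bit I→T entries match the gains quoted in the abstract. As a reminder of how such a score is commonly computed for hashing methods, here is a minimal sketch that ranks database items by Hamming distance to each query and averages the precision at every relevant hit. The exact evaluation protocol used in the paper (for example, whether ranking is truncated to a top-k list) is not given on this page, so this is a generic illustration with made-up codes.

```python
def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code, db_codes, relevant):
    """AP of one query; items are ranked by Hamming distance, ties by index."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(query_codes, db_codes, relevance):
    """Mean of the per-query AP values."""
    aps = [average_precision(q, db_codes, rel)
           for q, rel in zip(query_codes, relevance)]
    return sum(aps) / len(aps)


# Toy example: one image query code against four text codes.
query = [1, 1, -1, -1]
database = [[1, 1, -1, -1], [1, -1, -1, -1], [-1, -1, 1, 1], [1, -1, -1, 1]]
relevant = [True, False, True, True]

ap = average_precision(query, database, relevant)
m_ap = mean_average_precision([query], database, [relevant])
print(ap, m_ap)
```

In the toy example the relevant items land at ranks 1, 3 and 4, so the AP is the mean of 1, 2/3 and 3/4.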

Task	Dataset	DADCH-Ⅰ	DADCH-Ⅱ	DADCH-Ⅲ	DADCH
I→T	MIRFLICKR-25K	0.789	0.827	0.831	0.843
I→T	NUS-WIDE	0.717	0.742	0.745	0.778
T→I	MIRFLICKR-25K	0.796	0.830	0.828	0.847
T→I	NUS-WIDE	0.723	0.746	0.753	0.786
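To read the comparison above more easily, the following sketch computes how many percentage points each variant falls short of the full DADCH model. The numbers are copied from the table. What DADCH-Ⅰ, DADCH-Ⅱ and DADCH-Ⅲ remove is not described on this page, so the code treats them only as labels. The DADCH column coincides with the 32-bit results in the main comparison table.

```python
# Values copied from the table: (task, dataset) -> score per variant.
ABLATION = {
    ("I→T", "MIRFLICKR-25K"): {"DADCH-Ⅰ": 0.789, "DADCH-Ⅱ": 0.827, "DADCH-Ⅲ": 0.831, "DADCH": 0.843},
    ("I→T", "NUS-WIDE"): {"DADCH-Ⅰ": 0.717, "DADCH-Ⅱ": 0.742, "DADCH-Ⅲ": 0.745, "DADCH": 0.778},
    ("T→I", "MIRFLICKR-25K"): {"DADCH-Ⅰ": 0.796, "DADCH-Ⅱ": 0.830, "DADCH-Ⅲ": 0.828, "DADCH": 0.847},
    ("T→I", "NUS-WIDE"): {"DADCH-Ⅰ": 0.723, "DADCH-Ⅱ": 0.746, "DADCH-Ⅲ": 0.753, "DADCH": 0.786},
}


def drops_in_points(table):
    """Gap between the full model and each variant, in percentage points."""
    result = {}
    for setting, scores in table.items():
        full = scores["DADCH"]
        result[setting] = {
            name: round((full - value) * 100, 1)
            for name, value in scores.items()
            if name != "DADCH"
        }
    return result


drops = drops_in_points(ABLATION)
for setting, gaps in drops.items():
    print(setting, gaps)
```

In every row DADCH-Ⅰ shows the largest gap to the full model, between 5.1 and 6.3 percentage points.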
