Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (8): 2461-2470. DOI: 10.11772/j.issn.1001-9081.2021061017
• Data Science and Technology •
Xiaoyu WANG, Zhanqing WANG, Wei XIONG
Received: 2021-06-15
Revised: 2021-09-15
Accepted: 2021-10-12
Online: 2021-12-27
Published: 2022-08-10
Contact: Zhanqing WANG
About author: WANG Xiaoyu, born in 1997 in Ruzhou, Henan, M. S. candidate. Her research interests include machine learning and cross-modal retrieval.
Abstract:
Most deep supervised cross-modal hashing methods learn hash codes in a symmetric way, which keeps them from exploiting the supervised information in large-scale datasets effectively; moreover, the relaxation-based strategies commonly used to handle the discrete constraint on hash codes introduce large quantization error and yield suboptimal codes. To address these problems, a Deep Asymmetric Discrete Cross-modal Hashing (DADCH) method was proposed. First, an asymmetric learning framework combining deep neural networks with dictionary learning was constructed to learn the hash codes of query instances and database instances, mining the supervised information in the data more effectively and shortening training time. Then, a discrete optimization algorithm was adopted to optimize the hash code matrix column by column, reducing the quantization error of binarization. Meanwhile, to fully exploit the semantic information in the data, a label layer was added to the neural network for label prediction, and semantic information embedding was used to embed the discriminative information of different categories into the hash codes through a linear mapping, enhancing the discriminability of the codes. Experimental results show that on the IAPR-TC12, MIRFLICKR-25K and NUS-WIDE datasets, with 64-bit hash codes, the mean Average Precision (mAP) of the proposed method on image-to-text retrieval is about 11.6, 5.2 and 14.7 percentage points higher, respectively, than that of Self-Supervised Adversarial Hashing (SSAH), a state-of-the-art deep cross-modal retrieval method proposed in recent years.
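The column-by-column discrete optimization the abstract describes can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification (names, shapes, and the exact objective are ours, patterned on standard asymmetric hashing formulations, not the paper's full algorithm): holding real-valued query embeddings `U` fixed, each column of the binary database code matrix `B` has a closed-form sign update that never increases the asymmetric loss.

```python
import numpy as np

def discrete_column_update(B, U, S, r, n_sweeps=3):
    """Column-wise discrete update of database codes B in {-1,+1}^(n x r).

    Minimizes ||U @ B.T - r * S||_F^2 with the query embeddings U
    (m x r, real-valued) held fixed; S (m x n) is the +1/-1 similarity
    matrix. Each column update is an exact closed-form sign step, so
    the loss is non-increasing. Hypothetical simplification of the
    paper's discrete optimization step.
    """
    n, rbits = B.shape
    Q = r * S.T @ U                                  # n x r target term
    for _ in range(n_sweeps):
        for k in range(rbits):
            rest = [j for j in range(rbits) if j != k]
            # linear coefficient of column k in the loss
            g = Q[:, k] - B[:, rest] @ (U[:, rest].T @ U[:, k])
            B[:, k] = np.where(g >= 0, 1, -1)        # sign, ties -> +1
    return B

def asym_loss(B, U, S, r):
    return float(np.linalg.norm(U @ B.T - r * S) ** 2)
```

Note the asymmetry: the update needs only `U` and `S`, so database codes are learned directly, without pushing every database item through the network, which is what lets such methods use the supervision in large datasets efficiently.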
Xiaoyu WANG, Zhanqing WANG, Wei XIONG. Deep asymmetric discrete cross-modal hashing method[J]. Journal of Computer Applications, 2022, 42(8): 2461-2470.
Layer | Configuration
---|---
conv1 | k. 64×11×11; s. 4×4, pad 0, LRN, ×2 pool
conv2 | k. 256×5×5; s. 1×1, pad 2, LRN, ×2 pool
conv3 | k. 256×3×3; s. 1×1, pad 1
conv4 | k. 256×3×3; s. 1×1, pad 1
conv5 | k. 256×3×3; s. 1×1, pad 1, max pooling
fc6 | 4 096
fc7 | 512
hash/label layer | r + c
Tab. 1 Parameter configuration of image network
Layer | Configuration
---|---
fc1 | 4 096
fc2 | 512
hash/label layer | r + c
Tab. 2 Parameter configuration of text network
Task | Method | IAPR-TC12 16 bit | 32 bit | 64 bit | MIRFLICKR-25K 16 bit | 32 bit | 64 bit | NUS-WIDE 16 bit | 32 bit | 64 bit
---|---|---|---|---|---|---|---|---|---|---
I→T | CVH | 0.342 | 0.336 | 0.330 | 0.557 | 0.554 | 0.554 | 0.374 | 0.366 | 0.361
| STMH | 0.377 | 0.400 | 0.413 | 0.613 | 0.621 | 0.627 | 0.471 | 0.486 | 0.494
| SCM | 0.369 | 0.366 | 0.380 | 0.671 | 0.682 | 0.685 | 0.540 | 0.548 | 0.555
| SePH | 0.444 | 0.456 | 0.463 | 0.712 | 0.719 | 0.723 | 0.603 | 0.613 | 0.621
| DCMH | 0.453 | 0.473 | 0.484 | 0.741 | 0.746 | 0.749 | 0.590 | 0.603 | 0.609
| ADAH | 0.529 | 0.528 | 0.544 | 0.756 | 0.711 | 0.772 | 0.640 | 0.629 | 0.652
| SSAH | 0.500 | 0.533 | 0.553 | 0.782 | 0.790 | 0.800 | 0.642 | 0.636 | 0.639
| DADCH | 0.602 | 0.643 | 0.669 | 0.832 | 0.843 | 0.852 | 0.765 | 0.778 | 0.786
T→I | CVH | 0.349 | 0.343 | 0.337 | 0.574 | 0.571 | 0.571 | 0.361 | 0.349 | 0.339
| STMH | 0.368 | 0.389 | 0.404 | 0.607 | 0.615 | 0.621 | 0.447 | 0.467 | 0.478
| SCM | 0.345 | 0.341 | 0.347 | 0.693 | 0.701 | 0.706 | 0.534 | 0.541 | 0.548
| SePH | 0.442 | 0.456 | 0.464 | 0.721 | 0.726 | 0.731 | 0.598 | 0.602 | 0.610
| DCMH | 0.518 | 0.537 | 0.546 | 0.782 | 0.790 | 0.793 | 0.638 | 0.651 | 0.657
| ADAH | 0.535 | 0.556 | 0.564 | 0.792 | 0.806 | 0.807 | 0.678 | 0.697 | 0.703
| SSAH | 0.516 | 0.551 | 0.570 | 0.791 | 0.795 | 0.803 | 0.669 | 0.662 | 0.666
| DADCH | 0.611 | 0.649 | 0.671 | 0.835 | 0.847 | 0.857 | 0.767 | 0.786 | 0.795
Tab. 3 mAP comparison of different methods
Task | Dataset | DADCH-Ⅰ | DADCH-Ⅱ | DADCH-Ⅲ | DADCH
---|---|---|---|---|---
I→T | MIRFLICKR-25K | 0.789 | 0.827 | 0.831 | 0.843
| NUS-WIDE | 0.717 | 0.742 | 0.745 | 0.778
T→I | MIRFLICKR-25K | 0.796 | 0.830 | 0.828 | 0.847
| NUS-WIDE | 0.723 | 0.746 | 0.753 | 0.786
Tab. 4 mAP comparison of DADCH variants