Adaptive hybrid attention hashing for deep cross-modal retrieval

Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3663-3670. DOI: 10.11772/j.issn.1001-9081.2021101806

Special Issue: Artificial intelligence

Xinghua LIU^1,2, Guitao CAO^3, Qiubin LIN^1,2, Wenming CAO^1,2

^1. College of Electronics and Information Engineering, Shenzhen University, Shenzhen Guangdong 518060, China
^2. Guangdong Multimedia Information Service Engineering Technology Research Center (Shenzhen University), Shenzhen Guangdong 518060, China
^3. Software Engineering Institute, East China Normal University, Shanghai 200062, China

Received: 2021-10-22 Revised: 2021-12-20 Accepted: 2021-12-23 Online: 2021-12-31 Published: 2022-12-10
Contact: Wenming CAO
About the authors: LIU Xinghua, born in 1995, a native of Xinyang, Henan, M.S. candidate. His research interests include multimedia information processing and cross-modal retrieval.
CAO Guitao, born in 1970, a native of Yantai, Shandong, Ph.D., professor, CCF member. Her research interests include multimedia information processing and artificial intelligence.
LIN Qiubin, born in 1994, a native of Chaozhou, Guangdong, Ph.D. candidate. His research interests include multimedia information processing and cross-modal retrieval.
Supported by:
National Natural Science Foundation of China (61771322)

Abstract

Existing hashing methods cannot distinguish the importance of the feature information of different regions during feature learning, and cannot make full use of label information to mine the correlation between modalities. To address these problems, an Adaptive Hybrid Attention Hashing for deep cross-modal retrieval (AHAH) model was proposed. Firstly, channel attention and spatial attention were combined using weights obtained by autonomous learning, which strengthens the attention paid to relevant target regions in the feature map and weakens the attention paid to irrelevant ones. Secondly, the similarity between modalities was expressed at a finer granularity: the modality labels were analyzed statistically, and the proposed similarity measurement method quantified the degree of similarity as a number between 0 and 1. On four commonly used datasets, MIRFLICKR-25K, NUS-WIDE, MSCOCO, and IAPR TC-12, with a hash code length of 16 bit, the proposed method increased the retrieval mean Average Precision (mAP) by 2.25%, 1.75%, 6.8%, and 2.15%, respectively, compared with the state-of-the-art method Multi-Label Semantics Preserving Hashing (MLSPH). In addition, ablation experiments and efficiency analysis also demonstrated the effectiveness of the proposed method.
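The abstract says that similarity between modalities is quantified as a number between 0 and 1 from label statistics, but the formula itself is not given on this page. The following is a minimal sketch of the general idea only, assuming cosine similarity between multi-hot label vectors as a stand-in measure; the function name `label_similarity` and the choice of cosine similarity are illustrative assumptions, not the AHAH definition.

```python
import math

def label_similarity(labels_a, labels_b):
    """Cosine similarity between two multi-hot (0/1) label vectors.

    Assumption: a stand-in for the paper's measure, which is not stated
    on this page. For 0/1 labels the result lies in [0, 1].
    """
    shared = sum(a * b for a, b in zip(labels_a, labels_b))
    count_a = sum(a * a for a in labels_a)
    count_b = sum(b * b for b in labels_b)
    if count_a == 0 or count_b == 0:
        return 0.0  # an unlabeled sample is treated as dissimilar to everything
    # min() guards against floating-point results marginally above 1.
    return min(1.0, shared / math.sqrt(count_a * count_b))

# Hypothetical image tagged with classes 0, 1, 3 and text tagged with 0, 3.
image_labels = [1, 1, 0, 1]
text_labels = [1, 0, 0, 1]
print(label_similarity(image_labels, text_labels))
```

A binary rule (1 if any label is shared, 0 otherwise) would score this pair the same as a pair with identical labels; a graded value keeps the difference between partial and full label overlap.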

Key words: cross-modal retrieval, hashing method, deep neural network, adaptive, hybrid attention

CLC Number: TP391.3

Xinghua LIU, Guitao CAO, Qiubin LIN, Wenming CAO. Adaptive hybrid attention hashing for deep cross-modal retrieval[J]. Journal of Computer Applications, 2022, 42(12): 3663-3670.

References

1	ARYA S, MOUNT D M, NETANYAHU N S, et al. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions[J]. Journal of the ACM, 1998, 45(6): 891-923. DOI: 10.1145/293347.293348
2	SONG J K, YANG Y, YANG Y, et al. Inter-media hashing for large-scale retrieval from heterogeneous data sources[C]// Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2013: 785-796. DOI: 10.1145/2463676.2465274
3	DING G G, GUO Y C, ZHOU J L. Collective matrix factorization hashing for multimodal data[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 2083-2090. DOI: 10.1109/cvpr.2014.267
4	ZHOU J, DING G, GUO Y. Latent semantic sparse hashing for cross-modal similarity search[C]// Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2014: 415-424. DOI: 10.1145/2600428.2609610
5	MANDAL D, CHAUDHURY K N, BISWAS S. Generalized semantic preserving hashing for n-label cross-modal retrieval[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2633-2641. DOI: 10.1109/cvpr.2017.282
6	ZHANG D Q, LI W J. Large-scale supervised multimodal hashing with semantic correlation maximization[C]// Proceedings of the 28th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2014: 2177-2183. DOI: 10.1609/aaai.v28i1.8995
7	MANDAL D, CHAUDHURY K N, BISWAS S. Generalized semantic preserving hashing for cross-modal retrieval[J]. IEEE Transactions on Image Processing, 2019, 28(1): 102-112. DOI: 10.1109/tip.2018.2863040
8	WANG H T, MENG M, CHEN H, et al. Supervised consistent and specific hashing[C]// Proceedings of the 2019 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2019: 1822-1827. DOI: 10.1109/icme.2019.00313
9	JIANG Q Y, LI W J. Deep cross-modal hashing[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3270-3278. DOI: 10.1109/cvpr.2017.348
10	YANG E K, DENG C, LIU W, et al. Pairwise relationship guided deep hashing for cross-modal retrieval[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2017: 1618-1625. DOI: 10.1609/aaai.v31i1.10719
11	LIN Q M, CAO W M, HE Z H, et al. Semantic deep cross-modal hashing[J]. Neurocomputing, 2020, 396: 113-122. DOI: 10.1016/j.neucom.2020.02.043
12	LIU H, FENG Y, ZHOU M L, et al. Semantic ranking structure preserving for cross-modal retrieval[J]. Applied Intelligence, 2021, 51(3): 1802-1812. DOI: 10.1007/s10489-020-01930-x
13	LI C, DENG C, LI N, et al. Self-supervised adversarial hashing networks for cross-modal retrieval[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4242-4251. DOI: 10.1109/cvpr.2018.00446
14	MA X H, ZHANG T Z, XU C S. Multi-level correlation adversarial hashing for cross-modal retrieval[J]. IEEE Transactions on Multimedia, 2020, 22(12): 3101-3114. DOI: 10.1109/tmm.2020.2969792
15	ZOU X T, WANG X Z, BAKKER E M, et al. Multi-label semantics preserving based deep cross-modal hashing[J]. Signal Processing: Image Communication, 2021, 93: Article No. 116131. DOI: 10.1016/j.image.2020.116131
16	LIU F M, ZHANG H. Cross-modal retrieval algorithm based on multi-level semantic discriminative guided hashing[J]. Journal of Computer Applications, 2021, 41(8): 2187-2192. DOI: 10.11772/j.issn.1001-9081.2020101607
17	ZHANG C, WAN Y, QIANG H P. Deep unsupervised discrete cross-modal hashing based on knowledge distillation[J]. Journal of Computer Applications, 2021, 41(9): 2523-2531. DOI: 10.11772/j.issn.1001-9081.2020111785
18	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
19	LI X, WANG W H, HU X L, et al. Selective kernel networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 510-519. DOI: 10.1109/cvpr.2019.00060
20	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
21	HUANG Z L, WANG X G, HUANG L C, et al. CCNet: criss-cross attention for semantic segmentation[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 603-612. DOI: 10.1109/iccv.2019.00069
22	FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3141-3149. DOI: 10.1109/cvpr.2019.00326

Dataset	Training set	Retrieval set	Query set	Text dimension
MIRFLICKR-25K	10 000	18 015	2 000	1 386
NUS-WIDE	10 500	188 321	2 100	1 000
MSCOCO	10 000	80 000	5 000	2 000
IAPR TC-12	10 000	18 000	2 000	2 912

Retrieval task	Method	MIRFLICKR-25K			NUS-WIDE			MS COCO			IAPR TC-12
		16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit
Image-to-text	SCM	0.685	0.693	0.697	0.497	0.502	0.499	0.498	0.556	0.565	0.369	0.367	0.380
	SePH	0.709	0.711	0.716	0.479	0.501	0.492	0.489	0.502	0.499	0.444	0.456	0.464
	GSPH	0.607	0.619	0.623	0.402	0.415	0.421	0.443	0.473	0.484	0.372	0.392	0.402
	DCMH	0.677	0.703	0.725	0.590	0.603	0.609	0.497	0.506	0.511	0.453	0.473	0.484
	SSAH	0.797	0.809	0.810	0.636	0.636	0.637	0.550	0.577	0.576	0.544	0.537	0.549
	MLCAH	0.808	0.816	0.828	0.645	0.640	0.653	0.582	0.597	0.595	0.550	0.565	0.563
	MLSPH	0.808	0.824	0.834	0.641	0.660	0.673	0.574	0.592	0.601	0.426	0.426	0.475
	Proposed method	0.821	0.832	0.836	0.662	0.682	0.692	0.613	0.655	0.675	0.557	0.587	0.602
Text-to-image	SCM	0.707	0.714	0.719	0.567	0.583	0.597	0.492	0.556	0.568	0.345	0.341	0.347
	SePH	0.722	0.723	0.727	0.487	0.493	0.488	0.485	0.495	0.485	0.442	0.456	0.465
	GSPH	0.628	0.646	0.650	0.500	0.523	0.535	0.544	0.604	0.646	0.418	0.445	0.464
	DCMH	0.705	0.717	0.724	0.620	0.634	0.643	0.507	0.520	0.527	0.519	0.538	0.549
	SSAH	0.782	0.797	0.799	0.653	0.676	0.683	0.552	0.578	0.578	0.531	0.534	0.566
	MLCAH	0.793	0.811	0.807	0.679	0.698	0.704	0.569	0.593	0.583	0.554	0.563	0.566
	MLSPH	0.785	0.804	0.815	0.643	0.663	0.672	0.556	0.586	0.613	0.435	0.452	0.473
	Proposed method	0.816	0.825	0.831	0.685	0.706	0.713	0.617	0.659	0.672	0.571	0.603	0.620
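
The values in the comparison table above appear to be mAP scores for hash codes of 16, 32, and 64 bit. As a rough illustration of how mAP is commonly computed in hashing-based retrieval, the sketch below ranks a database by Hamming distance to each query and averages the precision at every relevant position. The function names, the toy data, and the relevance rule (two samples are relevant if they share at least one label) are assumptions made for illustration; this page does not state the evaluation protocol used.

```python
def hamming_distance(code_a, code_b):
    """Number of positions where two equal-length hash codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def average_precision(query_code, query_labels, db_codes, db_labels):
    """AP of one query over a database ranked by Hamming distance."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming_distance(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        # Assumed relevance rule: the pair shares at least one label.
        if any(q and d for q, d in zip(query_labels, db_labels[i])):
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(q_codes, q_labels, db_codes, db_labels):
    """Mean of the per-query average precisions."""
    aps = [average_precision(code, labels, db_codes, db_labels)
           for code, labels in zip(q_codes, q_labels)]
    return sum(aps) / len(aps)

# Toy 4-bit example: queries from one modality, database from the other.
db_codes = [[1, 1, -1, -1], [1, -1, -1, -1], [-1, -1, 1, 1]]
db_labels = [[1, 0], [0, 1], [1, 0]]
q_codes = [[1, 1, -1, -1], [-1, -1, 1, 1]]
q_labels = [[1, 0], [0, 1]]
print(mean_average_precision(q_codes, q_labels, db_codes, db_labels))
```

For image-to-text retrieval the query codes would come from the image network and the database codes from the text network; text-to-image swaps the two roles.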

Model	Image-to-text			Text-to-image
	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit
Channel	0.808	0.828	0.829	0.811	0.818	0.825
Spatial	0.809	0.826	0.830	0.809	0.819	0.824
Hybrid	0.821	0.832	0.836	0.817	0.825	0.831
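
The table above compares channel-only, spatial-only, and hybrid attention. The sketch below illustrates how the three variants can relate to one another, under strong simplifying assumptions: the attention branches are parameter-free (a sigmoid over pooled activations, with none of the learned layers a real module would have), and the two branches are blended with a softmax over two scalars `alpha` and `beta` that stand in for learned weights. All names here are hypothetical and the code is not the AHAH module, whose details are not given on this page.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    """One weight per channel, from the channel's global average."""
    return [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in fmap]

def spatial_attention(fmap):
    """One weight per location, from the mean across channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]

def hybrid_attention(fmap, alpha, beta):
    """Reweight a C x H x W feature map with a blend of both attentions.

    alpha and beta stand for scalars that would be learned in training;
    softmax turns them into blend weights that sum to 1.
    """
    m = max(alpha, beta)  # subtract the max for numerical stability
    ea, eb = math.exp(alpha - m), math.exp(beta - m)
    w_channel, w_spatial = ea / (ea + eb), eb / (ea + eb)
    ca, sa = channel_attention(fmap), spatial_attention(fmap)
    return [[[x * (w_channel * ca[k] + w_spatial * sa[i][j])
              for j, x in enumerate(row)]
             for i, row in enumerate(ch)]
            for k, ch in enumerate(fmap)]

# Toy feature map with 2 channels of size 2 x 2.
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
print(hybrid_attention(fmap, 0.0, 0.0)[0])
```

In this sketch a very large `alpha` reduces the blend to channel attention alone and a very large `beta` to spatial attention alone, so the single-branch rows of the table correspond to the two extremes of the blend.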

Task	Method	MIRFLICKR-25K			NUS-WIDE			MS COCO			IAPR TC-12
		16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit
Image-to-text	AHAH	0.821	0.832	0.836	0.662	0.682	0.692	0.613	0.655	0.685	0.557	0.587	0.602
	AHAH-1	0.815	0.823	0.829	0.651	0.674	0.685	0.607	0.44	0.676	0.548	0.579	0.595
	AHAH-2	0.817	0.824	0.830	0.655	0.677	0.684	0.609	0.650	0.680	0.550	0.582	0.596
Text-to-image	AHAH	0.816	0.825	0.831	0.685	0.706	0.713	0.617	0.659	0.682	0.571	0.603	0.620
	AHAH-1	0.811	0.820	0.823	0.678	0.695	0.702	0.610	0.650	0.673	0.562	0.595	0.609
	AHAH-2	0.810	0.813	0.825	0.681	0.699	0.706	0.607	0.654	0.677	0.566	0.593	0.614
