Adaptive hybrid attention hashing for deep cross-modal retrieval

doi:10.11772/j.issn.1001-9081.2021101806

Journal of Computer Applications, 2022, Vol. 42, Issue 12: 3663-3670.

Xinghua LIU¹,², Guitao CAO³, Qiubin LIN¹,², Wenming CAO¹,²

1. College of Electronics and Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
2. Guangdong Multimedia Information Service Engineering Technology Research Center (Shenzhen University), Shenzhen, Guangdong 518060, China
3. Software Engineering Institute, East China Normal University, Shanghai 200062, China

Received: 2021-10-22; Revised: 2021-12-20; Accepted: 2021-12-23; Online: 2021-12-31; Published: 2022-12-10
Contact: Wenming CAO
About the authors: LIU Xinghua, born in 1995, a native of Xinyang, Henan, M. S. candidate. His research interests include multimedia information processing and cross-modal retrieval.
CAO Guitao, born in 1970, a native of Yantai, Shandong, Ph. D., professor, CCF member. Her research interests include multimedia information processing and artificial intelligence.
LIN Qiubin, born in 1994, a native of Chaozhou, Guangdong, Ph. D. candidate. His research interests include multimedia information processing and cross-modal retrieval.
Supported by:
National Natural Science Foundation of China (61771322)

Abstract

Existing hashing methods cannot distinguish the importance of the feature information in different regions during feature learning, and do not make full use of label information to mine the correlation between modalities. To address these problems, an Adaptive Hybrid Attention Hashing for deep cross-modal retrieval (AHAH) model was proposed. Firstly, channel attention and spatial attention were combined through weights obtained by autonomous learning, which strengthened the attention paid to relevant target regions of the feature map and weakened the attention paid to irrelevant ones. Secondly, the modality labels were analyzed statistically, and the proposed similarity measurement method was used to quantify similarity as a number between 0 and 1, so that the similarity between modalities was expressed at a finer granularity. On four commonly used datasets, MIRFLICKR-25K, NUS-WIDE, MSCOCO, and IAPR TC-12, with a hash code length of 16 bit, the proposed method improved the retrieval mean Average Precision (mAP) by 2.25%, 1.75%, 6.8%, and 2.15%, respectively, compared with the state-of-the-art method Multi-Label Semantics Preserving Hashing (MLSPH). In addition, ablation experiments and efficiency analysis also verified the effectiveness of the proposed method.

Key words: cross-modal retrieval, hashing method, deep neural network, adaptive, hybrid attention
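
The idea of quantifying label-based similarity as a number between 0 and 1, rather than a binary similar/dissimilar flag, can be illustrated with a small sketch. The abstract does not state the paper's actual formula, so the Jaccard overlap used below is only an assumed stand-in, and the names `soft_label_similarity` and `similarity_matrix` are hypothetical.

```python
from typing import List, Sequence


def soft_label_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard overlap of two multi-hot label vectors, a value in [0, 1].

    Assumption: Jaccard is used purely for illustration; it is not
    claimed to be the similarity measure proposed in the paper.
    """
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two samples that carry no label at all are treated as dissimilar.
    return shared / union if union else 0.0


def similarity_matrix(image_labels: List[Sequence[int]],
                      text_labels: List[Sequence[int]]) -> List[List[float]]:
    """Pairwise soft similarities between image and text samples."""
    return [[soft_label_similarity(i, t) for t in text_labels]
            for i in image_labels]


# Toy example: 2 images and 3 texts annotated with 4 possible labels.
image_labels = [[1, 1, 0, 0], [0, 0, 1, 0]]
text_labels = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
S = similarity_matrix(image_labels, text_labels)
```

With a binary definition, the first image would be "similar" to both the first and the second text because each pair shares at least one label; the graded value separates a full label match from a partial one.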

CLC Number: TP391.3

Xinghua LIU, Guitao CAO, Qiubin LIN, Wenming CAO. Adaptive hybrid attention hashing for deep cross-modal retrieval[J]. Journal of Computer Applications, 2022, 42(12): 3663-3670.

Figures/Tables 9

Dataset	Training set	Retrieval set	Query (index) set	Text (dimension)
MIRFLICKR-25K	10 000	18 015	2 000	1 386
NUS-WIDE	10 500	188 321	2 100	1 000
MSCOCO	10 000	80 000	5 000	2 000
IAPR TC-12	10 000	18 000	2 000	2 912

Retrieval task	Method	MIRFLICKR-25K			NUS-WIDE			MSCOCO			IAPR TC-12
		16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit
Image-to-text	SCM	0.685	0.693	0.697	0.497	0.502	0.499	0.498	0.556	0.565	0.369	0.367	0.380
	SePH	0.709	0.711	0.716	0.479	0.501	0.492	0.489	0.502	0.499	0.444	0.456	0.464
	GSPH	0.607	0.619	0.623	0.402	0.415	0.421	0.443	0.473	0.484	0.372	0.392	0.402
	DCMH	0.677	0.703	0.725	0.590	0.603	0.609	0.497	0.506	0.511	0.453	0.473	0.484
	SSAH	0.797	0.809	0.810	0.636	0.636	0.637	0.550	0.577	0.576	0.544	0.537	0.549
	MLCAH	0.808	0.816	0.828	0.645	0.640	0.653	0.582	0.597	0.595	0.550	0.565	0.563
	MLSPH	0.808	0.824	0.834	0.641	0.660	0.673	0.574	0.592	0.601	0.426	0.426	0.475
	Proposed (AHAH)	0.821	0.832	0.836	0.662	0.682	0.692	0.613	0.655	0.675	0.557	0.587	0.602
Text-to-image	SCM	0.707	0.714	0.719	0.567	0.583	0.597	0.492	0.556	0.568	0.345	0.341	0.347
	SePH	0.722	0.723	0.727	0.487	0.493	0.488	0.485	0.495	0.485	0.442	0.456	0.465
	GSPH	0.628	0.646	0.650	0.500	0.523	0.535	0.544	0.604	0.646	0.418	0.445	0.464
	DCMH	0.705	0.717	0.724	0.620	0.634	0.643	0.507	0.520	0.527	0.519	0.538	0.549
	SSAH	0.782	0.797	0.799	0.653	0.676	0.683	0.552	0.578	0.578	0.531	0.534	0.566
	MLCAH	0.793	0.811	0.807	0.679	0.698	0.704	0.569	0.593	0.583	0.554	0.563	0.566
	MLSPH	0.785	0.804	0.815	0.643	0.663	0.672	0.556	0.586	0.613	0.435	0.452	0.473
	Proposed (AHAH)	0.816	0.825	0.831	0.685	0.706	0.713	0.617	0.659	0.672	0.571	0.603	0.620
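
The abstract reports retrieval quality as mean Average Precision (mAP), which is presumably what the table above lists per hash code length. The sketch below shows one common way such a score is computed for binary hash codes. It assumes ranking by Hamming distance and that a database item is relevant when it shares at least one label with the query; neither assumption is confirmed by this page, and the function names are hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code: Sequence[int], query_labels: Sequence[int],
                      db_codes: List[Sequence[int]],
                      db_labels: List[Sequence[int]]) -> float:
    """AP of one query over the database ranked by Hamming distance."""
    # sorted() is stable, so ties keep their database order.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        # Assumed relevance rule: at least one shared label.
        if any(q and d for q, d in zip(query_labels, db_labels[idx])):
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(q_codes, q_labels, db_codes, db_labels) -> float:
    """Mean of the per-query average precisions."""
    aps = [average_precision(c, l, db_codes, db_labels)
           for c, l in zip(q_codes, q_labels)]
    return sum(aps) / len(aps)


# Toy example with 4-bit codes and 2 possible labels.
db_codes = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
db_labels = [[1, 0], [0, 1], [1, 0]]
q_codes = [[1, 1, 0, 0]]
q_labels = [[1, 0]]
score = mean_average_precision(q_codes, q_labels, db_codes, db_labels)
```

In the toy example the relevant items land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 = 5/6.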

Model	Image-to-text			Text-to-image
	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit
Channel attention	0.808	0.828	0.829	0.811	0.818	0.825
Spatial attention	0.809	0.826	0.830	0.809	0.819	0.824
Hybrid attention	0.821	0.832	0.836	0.817	0.825	0.831
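
The three variants compared in the table above (channel-only, spatial-only, and hybrid attention) can be sketched in a few lines. This is a deliberately simplified illustration, not the AHAH architecture: the gates are plain sigmoids of average-pooled values with no learned layers, and `w_channel` and `w_spatial` are hypothetical stand-ins for the autonomously learned mixing weights mentioned in the abstract.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(x: FeatureMap) -> FeatureMap:
    """Scale each channel by a gate computed from its global average."""
    out = []
    for ch in x:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(x: FeatureMap) -> FeatureMap:
    """Scale each location by a gate computed from its cross-channel mean."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    gate = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    return [[[x[k][i][j] * gate[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def hybrid_attention(x: FeatureMap, w_channel: float,
                     w_spatial: float) -> FeatureMap:
    """Blend both branches with softmax-normalised mixing weights.

    Assumption: w_channel and w_spatial stand in for learnable scalars.
    """
    m = max(w_channel, w_spatial)  # subtract the max for numerical stability
    e_c, e_s = math.exp(w_channel - m), math.exp(w_spatial - m)
    a_c, a_s = e_c / (e_c + e_s), e_s / (e_c + e_s)
    ca, sa = channel_attention(x), spatial_attention(x)
    return [[[a_c * ca[k][i][j] + a_s * sa[k][i][j]
              for j in range(len(x[0][0]))]
             for i in range(len(x[0]))] for k in range(len(x))]


# Toy feature map: 2 channels of size 2 x 2.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
mixed = hybrid_attention(x, 0.0, 0.0)  # equal weights: plain average
```

With equal mixing weights the output is the average of the two branches; pushing one weight far above the other recovers the corresponding single-branch variant.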

Task	Method	MIRFLICKR-25K			NUS-WIDE			MSCOCO			IAPR TC-12
		16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit	16 bit	32 bit	64 bit
Image-to-text	AHAH	0.821	0.832	0.836	0.662	0.682	0.692	0.613	0.655	0.685	0.557	0.587	0.602
	AHAH-1	0.815	0.823	0.829	0.651	0.674	0.685	0.607	0.44	0.676	0.548	0.579	0.595
	AHAH-2	0.817	0.824	0.830	0.655	0.677	0.684	0.609	0.650	0.680	0.550	0.582	0.596
Text-to-image	AHAH	0.816	0.825	0.831	0.685	0.706	0.713	0.617	0.659	0.682	0.571	0.603	0.620
	AHAH-1	0.811	0.820	0.823	0.678	0.695	0.702	0.610	0.650	0.673	0.562	0.595	0.609
	AHAH-2	0.810	0.813	0.825	0.681	0.699	0.706	0.607	0.654	0.677	0.566	0.593	0.614
