Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3663-3670. DOI: 10.11772/j.issn.1001-9081.2021101806
Special Issue: Artificial intelligence
Xinghua LIU1,2, Guitao CAO3, Qiubin LIN1,2, Wenming CAO1,2
Received: 2021-10-22
Revised: 2021-12-20
Accepted: 2021-12-23
Online: 2021-12-31
Published: 2022-12-10
Contact: Wenming CAO
About author: LIU Xinghua, born in 1995 in Xinyang, Henan, M. S. candidate. His research interests include multimedia information processing and cross-modal retrieval.
Xinghua LIU, Guitao CAO, Qiubin LIN, Wenming CAO. Adaptive hybrid attention hashing for deep cross-modal retrieval[J]. Journal of Computer Applications, 2022, 42(12): 3663-3670.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021101806
Dataset | Training set | Retrieval set | Index (query) set | Text (dimension) |
---|---|---|---|---|
MIRFLICKR-25K | 10 000 | 18 015 | 2 000 | 1 386 |
NUS-WIDE | 10 500 | 188 321 | 2 100 | 1 000 |
MS COCO | 10 000 | 80 000 | 5 000 | 2 000 |
IAPR TC-12 | 10 000 | 18 000 | 2 000 | 2 912 |
Tab. 1 Detailed configuration of experimental datasets
Task | Method | MIRFLICKR-25K 16 bit | MIRFLICKR-25K 32 bit | MIRFLICKR-25K 64 bit | NUS-WIDE 16 bit | NUS-WIDE 32 bit | NUS-WIDE 64 bit | MS COCO 16 bit | MS COCO 32 bit | MS COCO 64 bit | IAPR TC-12 16 bit | IAPR TC-12 32 bit | IAPR TC-12 64 bit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Image-to-text | SCM | 0.685 | 0.693 | 0.697 | 0.497 | 0.502 | 0.499 | 0.498 | 0.556 | 0.565 | 0.369 | 0.367 | 0.380 |
Image-to-text | SePH | 0.709 | 0.711 | 0.716 | 0.479 | 0.501 | 0.492 | 0.489 | 0.502 | 0.499 | 0.444 | 0.456 | 0.464 |
Image-to-text | GSPH | 0.607 | 0.619 | 0.623 | 0.402 | 0.415 | 0.421 | 0.443 | 0.473 | 0.484 | 0.372 | 0.392 | 0.402 |
Image-to-text | DCMH | 0.677 | 0.703 | 0.725 | 0.590 | 0.603 | 0.609 | 0.497 | 0.506 | 0.511 | 0.453 | 0.473 | 0.484 |
Image-to-text | SSAH | 0.797 | 0.809 | 0.810 | 0.636 | 0.636 | 0.637 | 0.550 | 0.577 | 0.576 | 0.544 | 0.537 | 0.549 |
Image-to-text | MLCAH | 0.808 | 0.816 | 0.828 | 0.645 | 0.640 | 0.653 | 0.582 | 0.597 | 0.595 | 0.550 | 0.565 | 0.563 |
Image-to-text | MLSPH | 0.808 | 0.824 | 0.834 | 0.641 | 0.660 | 0.673 | 0.574 | 0.592 | 0.601 | 0.426 | 0.426 | 0.475 |
Image-to-text | AHAH (proposed) | 0.821 | 0.832 | 0.836 | 0.662 | 0.682 | 0.692 | 0.613 | 0.655 | 0.675 | 0.557 | 0.587 | 0.602 |
Text-to-image | SCM | 0.707 | 0.714 | 0.719 | 0.567 | 0.583 | 0.597 | 0.492 | 0.556 | 0.568 | 0.345 | 0.341 | 0.347 |
Text-to-image | SePH | 0.722 | 0.723 | 0.727 | 0.487 | 0.493 | 0.488 | 0.485 | 0.495 | 0.485 | 0.442 | 0.456 | 0.465 |
Text-to-image | GSPH | 0.628 | 0.646 | 0.650 | 0.500 | 0.523 | 0.535 | 0.544 | 0.604 | 0.646 | 0.418 | 0.445 | 0.464 |
Text-to-image | DCMH | 0.705 | 0.717 | 0.724 | 0.620 | 0.634 | 0.643 | 0.507 | 0.520 | 0.527 | 0.519 | 0.538 | 0.549 |
Text-to-image | SSAH | 0.782 | 0.797 | 0.799 | 0.653 | 0.676 | 0.683 | 0.552 | 0.578 | 0.578 | 0.531 | 0.534 | 0.566 |
Text-to-image | MLCAH | 0.793 | 0.811 | 0.807 | 0.679 | 0.698 | 0.704 | 0.569 | 0.593 | 0.583 | 0.554 | 0.563 | 0.566 |
Text-to-image | MLSPH | 0.785 | 0.804 | 0.815 | 0.643 | 0.663 | 0.672 | 0.556 | 0.586 | 0.613 | 0.435 | 0.452 | 0.473 |
Text-to-image | AHAH (proposed) | 0.816 | 0.825 | 0.831 | 0.685 | 0.706 | 0.713 | 0.617 | 0.659 | 0.672 | 0.571 | 0.603 | 0.620 |
Tab. 2 mAP comparison of each algorithm on four public datasets
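The mAP values in these tables are ranking scores. This page does not spell out the evaluation protocol, so the following is only a minimal sketch of how mAP is commonly computed for hashing-based cross-modal retrieval. It assumes binary codes in {-1, +1}, ranking by Hamming distance over the whole retrieval set, and relevance defined as sharing at least one label. The function names `hamming_distance` and `mean_average_precision` are hypothetical and are not taken from the paper.

```python
import numpy as np


def hamming_distance(query_code, db_codes):
    """Hamming distance between one {-1, +1} code and each row of db_codes."""
    bits = db_codes.shape[1]
    # For {-1, +1} codes: distance = (number of bits - inner product) / 2.
    return 0.5 * (bits - db_codes @ query_code)


def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP over all queries (assumed protocol: rank the full retrieval set)."""
    average_precisions = []
    for q_code, q_label in zip(query_codes, query_labels):
        # Assumption: an item is relevant if it shares at least one label.
        relevant = (db_labels @ q_label) > 0
        n_relevant = relevant.sum()
        if n_relevant == 0:
            continue  # a query with no relevant items is skipped
        order = np.argsort(hamming_distance(q_code, db_codes), kind="stable")
        hits = relevant[order].astype(np.float64)
        precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
        average_precisions.append((precision_at_k * hits).sum() / n_relevant)
    return float(np.mean(average_precisions)) if average_precisions else 0.0


# Toy example: one 4-bit query against three database items with two labels.
query_codes = np.array([[1, 1, -1, -1]])
db_codes = np.array([[1, 1, -1, -1], [-1, -1, 1, 1], [1, 1, -1, 1]])
query_labels = np.array([[1, 0]])
db_labels = np.array([[1, 0], [1, 0], [0, 1]])
print(round(mean_average_precision(query_codes, db_codes, query_labels, db_labels), 4))
# prints 0.8333
```

In the toy example the relevant items land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 = 5/6. In an image-to-text evaluation the query codes would come from the image network and the database codes from the text network, and the reverse for text-to-image.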
Model | Image-to-text 16 bit | Image-to-text 32 bit | Image-to-text 64 bit | Text-to-image 16 bit | Text-to-image 32 bit | Text-to-image 64 bit |
---|---|---|---|---|---|---|
Channel | 0.808 | 0.828 | 0.829 | 0.811 | 0.818 | 0.825 |
Spatial | 0.809 | 0.826 | 0.830 | 0.809 | 0.819 | 0.824 |
Hybrid | 0.821 | 0.832 | 0.836 | 0.817 | 0.825 | 0.831 |
Tab. 3 Comparison of mAP experimental results of attention networks
Task | Method | MIRFLICKR-25K 16 bit | MIRFLICKR-25K 32 bit | MIRFLICKR-25K 64 bit | NUS-WIDE 16 bit | NUS-WIDE 32 bit | NUS-WIDE 64 bit | MS COCO 16 bit | MS COCO 32 bit | MS COCO 64 bit | IAPR TC-12 16 bit | IAPR TC-12 32 bit | IAPR TC-12 64 bit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Image-to-text | AHAH | 0.821 | 0.832 | 0.836 | 0.662 | 0.682 | 0.692 | 0.613 | 0.655 | 0.685 | 0.557 | 0.587 | 0.602 |
Image-to-text | AHAH-1 | 0.815 | 0.823 | 0.829 | 0.651 | 0.674 | 0.685 | 0.607 | 0.44 | 0.676 | 0.548 | 0.579 | 0.595 |
Image-to-text | AHAH-2 | 0.817 | 0.824 | 0.830 | 0.655 | 0.677 | 0.684 | 0.609 | 0.650 | 0.680 | 0.550 | 0.582 | 0.596 |
Text-to-image | AHAH | 0.816 | 0.825 | 0.831 | 0.685 | 0.706 | 0.713 | 0.617 | 0.659 | 0.682 | 0.571 | 0.603 | 0.620 |
Text-to-image | AHAH-1 | 0.811 | 0.820 | 0.823 | 0.678 | 0.695 | 0.702 | 0.610 | 0.650 | 0.673 | 0.562 | 0.595 | 0.609 |
Text-to-image | AHAH-2 | 0.810 | 0.813 | 0.825 | 0.681 | 0.699 | 0.706 | 0.607 | 0.654 | 0.677 | 0.566 | 0.593 | 0.614 |
Tab. 4 mAP results of ablation experiments of AHAH