Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3663-3670. DOI: 10.11772/j.issn.1001-9081.2021101806
Artificial intelligence
Xinghua LIU (1,2), Guitao CAO (3), Qiubin LIN (1,2), Wenming CAO (1,2)
Received: 2021-10-22; Revised: 2021-12-20; Accepted: 2021-12-23; Online: 2021-12-31; Published: 2022-12-10
Corresponding author: Wenming CAO
About author: LIU Xinghua, born in 1995, male, from Xinyang, Henan, M.S. candidate. His research interests include multimedia information processing and cross-modal retrieval.
Xinghua LIU, Guitao CAO, Qiubin LIN, Wenming CAO. Adaptive hybrid attention hashing for deep cross-modal retrieval[J]. Journal of Computer Applications, 2022, 42(12): 3663-3670.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021101806
Dataset | Training set | Retrieval set | Query set | Text dim. |
---|---|---|---|---|
MIRFLICKR-25k | 10 000 | 18 015 | 2 000 | 1 386 |
NUS-WIDE | 10 500 | 188 321 | 2 100 | 1 000 |
MSCOCO | 10 000 | 80 000 | 5 000 | 2 000 |
IAPR TC-12 | 10 000 | 18 000 | 2 000 | 2 912 |
Tab. 1 Detailed configuration of experimental datasets
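The sizes in Tab. 1 are consistent with a split protocol commonly used in cross-modal hashing work: the query set and the retrieval set are disjoint, and the training set is drawn from the retrieval set (for MIRFLICKR-25K, 2 000 + 18 015 = 20 015 image-text pairs). The sketch below reproduces that bookkeeping with Python's standard `random` module. It is an illustration under that assumed protocol only; the excerpt does not state how the authors sampled their sets, and the helper `split_indices` and its fixed seed are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (query, retrieval, training) sizes taken from Tab. 1.
SPLITS: Dict[str, Tuple[int, int, int]] = {
    "MIRFLICKR-25K": (2000, 18015, 10000),
    "NUS-WIDE": (2100, 188321, 10500),
    "MSCOCO": (5000, 80000, 10000),
    "IAPR TC-12": (2000, 18000, 10000),
}


def split_indices(n_query: int, n_retrieval: int, n_train: int,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical split: disjoint query/retrieval sets, training set sampled from retrieval.

    Assumes the dataset holds exactly n_query + n_retrieval image-text pairs.
    """
    rng = random.Random(seed)
    indices = list(range(n_query + n_retrieval))
    rng.shuffle(indices)
    query, retrieval = indices[:n_query], indices[n_query:]
    train = rng.sample(retrieval, n_train)  # sampled without replacement
    return query, retrieval, train


if __name__ == "__main__":
    for name, (nq, nr, nt) in SPLITS.items():
        q, r, t = split_indices(nq, nr, nt)
        print(f"{name}: total={len(q) + len(r)}, query={len(q)}, "
              f"retrieval={len(r)}, train={len(t)}")
```

Sampling the training set from the retrieval set (rather than from a third disjoint pool) is what lets the retrieval and training counts in Tab. 1 overlap without exceeding the dataset size.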
Task | Method | MIRFLICKR-25K 16 bit | MIRFLICKR-25K 32 bit | MIRFLICKR-25K 64 bit | NUS-WIDE 16 bit | NUS-WIDE 32 bit | NUS-WIDE 64 bit | MS COCO 16 bit | MS COCO 32 bit | MS COCO 64 bit | IAPR TC-12 16 bit | IAPR TC-12 32 bit | IAPR TC-12 64 bit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Image-to-text | SCM | 0.685 | 0.693 | 0.697 | 0.497 | 0.502 | 0.499 | 0.498 | 0.556 | 0.565 | 0.369 | 0.367 | 0.380 |
Image-to-text | SePH | 0.709 | 0.711 | 0.716 | 0.479 | 0.501 | 0.492 | 0.489 | 0.502 | 0.499 | 0.444 | 0.456 | 0.464 |
Image-to-text | GSPH | 0.607 | 0.619 | 0.623 | 0.402 | 0.415 | 0.421 | 0.443 | 0.473 | 0.484 | 0.372 | 0.392 | 0.402 |
Image-to-text | DCMH | 0.677 | 0.703 | 0.725 | 0.590 | 0.603 | 0.609 | 0.497 | 0.506 | 0.511 | 0.453 | 0.473 | 0.484 |
Image-to-text | SSAH | 0.797 | 0.809 | 0.810 | 0.636 | 0.636 | 0.637 | 0.550 | 0.577 | 0.576 | 0.544 | 0.537 | 0.549 |
Image-to-text | MLCAH | 0.808 | 0.816 | 0.828 | 0.645 | 0.640 | 0.653 | 0.582 | 0.597 | 0.595 | 0.550 | 0.565 | 0.563 |
Image-to-text | MLSPH | 0.808 | 0.824 | 0.834 | 0.641 | 0.660 | 0.673 | 0.574 | 0.592 | 0.601 | 0.426 | 0.426 | 0.475 |
Image-to-text | AHAH (proposed) | 0.821 | 0.832 | 0.836 | 0.662 | 0.682 | 0.692 | 0.613 | 0.655 | 0.675 | 0.557 | 0.587 | 0.602 |
Text-to-image | SCM | 0.707 | 0.714 | 0.719 | 0.567 | 0.583 | 0.597 | 0.492 | 0.556 | 0.568 | 0.345 | 0.341 | 0.347 |
Text-to-image | SePH | 0.722 | 0.723 | 0.727 | 0.487 | 0.493 | 0.488 | 0.485 | 0.495 | 0.485 | 0.442 | 0.456 | 0.465 |
Text-to-image | GSPH | 0.628 | 0.646 | 0.650 | 0.500 | 0.523 | 0.535 | 0.544 | 0.604 | 0.646 | 0.418 | 0.445 | 0.464 |
Text-to-image | DCMH | 0.705 | 0.717 | 0.724 | 0.620 | 0.634 | 0.643 | 0.507 | 0.520 | 0.527 | 0.519 | 0.538 | 0.549 |
Text-to-image | SSAH | 0.782 | 0.797 | 0.799 | 0.653 | 0.676 | 0.683 | 0.552 | 0.578 | 0.578 | 0.531 | 0.534 | 0.566 |
Text-to-image | MLCAH | 0.793 | 0.811 | 0.807 | 0.679 | 0.698 | 0.704 | 0.569 | 0.593 | 0.583 | 0.554 | 0.563 | 0.566 |
Text-to-image | MLSPH | 0.785 | 0.804 | 0.815 | 0.643 | 0.663 | 0.672 | 0.556 | 0.586 | 0.613 | 0.435 | 0.452 | 0.473 |
Text-to-image | AHAH (proposed) | 0.816 | 0.825 | 0.831 | 0.685 | 0.706 | 0.713 | 0.617 | 0.659 | 0.672 | 0.571 | 0.603 | 0.620 |
Tab. 2 mAP comparison of each algorithm on four public datasets
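Tab. 2 reports mean average precision (mAP) under Hamming ranking. The excerpt does not include the authors' evaluation script, so the following is only a generic sketch of how such a score is typically computed: rank database items by Hamming distance to each query code, treat an item as relevant when it shares at least one label with the query (a common convention for multi-label datasets, assumed here), average the precision at each relevant position, then average over queries. For image-to-text retrieval the query codes would come from the image branch and the database codes from the text branch; for text-to-image the roles swap. Codes are 0/1 lists, and all function names are illustrative.

```python
from typing import List, Optional, Sequence

Code = Sequence[int]    # binary hash code with 0/1 entries
Labels = Sequence[int]  # multi-hot label vector


def hamming(a: Code, b: Code) -> int:
    """Number of bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def relevant(la: Labels, lb: Labels) -> bool:
    """Assumed relevance rule: the two items share at least one label."""
    return any(x and y for x, y in zip(la, lb))


def average_precision(q_code: Code, q_label: Labels, db_codes: List[Code],
                      db_labels: List[Labels], top_k: Optional[int] = None) -> float:
    # Rank the database by Hamming distance; sorted() is stable, so ties keep database order.
    order = sorted(range(len(db_codes)), key=lambda i: hamming(q_code, db_codes[i]))
    if top_k is not None:
        order = order[:top_k]
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if relevant(q_label, db_labels[idx]):
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(q_codes, q_labels, db_codes, db_labels,
                           top_k: Optional[int] = None) -> float:
    """Average of per-query AP over all queries."""
    aps = [average_precision(code, lab, db_codes, db_labels, top_k)
           for code, lab in zip(q_codes, q_labels)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy example: one 4-bit query code against three database codes.
    q_codes, q_labels = [[1, 0, 1, 1]], [[1, 0]]
    db_codes = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 0]]
    db_labels = [[1, 0], [1, 0], [0, 1]]
    print(round(mean_average_precision(q_codes, q_labels, db_codes, db_labels), 4))  # prints 0.8333
```

In the toy example the ranking is item 0 (distance 0, relevant), item 2 (distance 1, not relevant), item 1 (distance 4, relevant), so AP = (1/1 + 2/3) / 2 = 5/6. Longer codes (16, 32, 64 bit in Tab. 2) give finer-grained distances and fewer ties.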
Attention type | Image-to-text 16 bit | Image-to-text 32 bit | Image-to-text 64 bit | Text-to-image 16 bit | Text-to-image 32 bit | Text-to-image 64 bit |
---|---|---|---|---|---|---|
Channel | 0.808 | 0.828 | 0.829 | 0.811 | 0.818 | 0.825 |
Spatial | 0.809 | 0.826 | 0.830 | 0.809 | 0.819 | 0.824 |
Hybrid | 0.821 | 0.832 | 0.836 | 0.817 | 0.825 | 0.831 |
Tab. 3 Comparison of mAP experimental results of attention networks
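Tab. 3 compares channel attention, spatial attention, and their hybrid. The excerpt does not describe how AHAH combines the two adaptively, so the following is only a generic sketch of the two building blocks the table names: squeeze-and-excitation style channel attention [18] followed by CBAM-style spatial attention [20], in CBAM's channel-then-spatial order. It is pure Python over nested lists (C × H × W) to stay dependency-free. The weight arguments `w1`, `w2`, `a`, `b`, `bias` are hypothetical stand-ins for learned parameters, and the spatial branch is simplified to a per-pixel weighting of the channel mean and max instead of CBAM's 7 × 7 convolution.

```python
import math
from typing import List

Matrix = List[List[float]]   # H x W
FeatureMap = List[Matrix]    # C x H x W


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(x: FeatureMap, w1: Matrix, w2: Matrix) -> List[float]:
    """SE-style weights: global average pool -> FC -> ReLU -> FC -> sigmoid.

    w1 is (C/r) x C and w2 is C x (C/r); both are hypothetical learned weights.
    """
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]             # squeeze
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]    # FC + ReLU
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]  # FC + sigmoid


def spatial_attention(x: FeatureMap, a: float, b: float, bias: float = 0.0) -> Matrix:
    """Per-pixel map from the channel mean and max (simplified: no 7x7 convolution)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out: Matrix = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [x[k][i][j] for k in range(c)]
            row.append(sigmoid(a * sum(vals) / c + b * max(vals) + bias))
        out.append(row)
    return out


def hybrid_attention(x: FeatureMap, w1: Matrix, w2: Matrix,
                     a: float, b: float, bias: float = 0.0) -> FeatureMap:
    """Channel attention first, then spatial attention on the channel-refined map."""
    s = channel_attention(x, w1, w2)
    xc = [[[v * s[k] for v in row] for row in ch] for k, ch in enumerate(x)]
    m = spatial_attention(xc, a, b, bias)
    return [[[v * m[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in xc]


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, -1.0], [2.0, 0.5]]]  # C=2, H=W=2
    w1 = [[0.5, -0.5]]       # C/r = 1 with reduction ratio r = 2
    w2 = [[1.0], [-1.0]]
    print(hybrid_attention(x, w1, w2, a=1.0, b=0.5))
```

The channel branch reweights whole feature maps (which features matter), while the spatial branch reweights locations (where they matter); a hybrid lets both act on the same features, which is the distinction the three rows of Tab. 3 isolate.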
Task | Method | MIRFLICKR-25K 16 bit | MIRFLICKR-25K 32 bit | MIRFLICKR-25K 64 bit | NUS-WIDE 16 bit | NUS-WIDE 32 bit | NUS-WIDE 64 bit | MS COCO 16 bit | MS COCO 32 bit | MS COCO 64 bit | IAPR TC-12 16 bit | IAPR TC-12 32 bit | IAPR TC-12 64 bit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Image-to-text | AHAH | 0.821 | 0.832 | 0.836 | 0.662 | 0.682 | 0.692 | 0.613 | 0.655 | 0.685 | 0.557 | 0.587 | 0.602 |
Image-to-text | AHAH-1 | 0.815 | 0.823 | 0.829 | 0.651 | 0.674 | 0.685 | 0.607 | 0.44 | 0.676 | 0.548 | 0.579 | 0.595 |
Image-to-text | AHAH-2 | 0.817 | 0.824 | 0.830 | 0.655 | 0.677 | 0.684 | 0.609 | 0.650 | 0.680 | 0.550 | 0.582 | 0.596 |
Text-to-image | AHAH | 0.816 | 0.825 | 0.831 | 0.685 | 0.706 | 0.713 | 0.617 | 0.659 | 0.682 | 0.571 | 0.603 | 0.620 |
Text-to-image | AHAH-1 | 0.811 | 0.820 | 0.823 | 0.678 | 0.695 | 0.702 | 0.610 | 0.650 | 0.673 | 0.562 | 0.595 | 0.609 |
Text-to-image | AHAH-2 | 0.810 | 0.813 | 0.825 | 0.681 | 0.699 | 0.706 | 0.607 | 0.654 | 0.677 | 0.566 | 0.593 | 0.614 |
Tab. 4 mAP results of ablation experiments of AHAH
[1] ARYA S, MOUNT D M, NETANYAHU N S, et al. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions[J]. Journal of the ACM, 1998, 45(6): 891-923. DOI: 10.1145/293347.293348
[2] SONG J K, YANG Y, YANG Y, et al. Inter-media hashing for large-scale retrieval from heterogeneous data sources[C]// Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2013: 785-796. DOI: 10.1145/2463676.2465274
[3] DING G G, GUO Y C, ZHOU J L. Collective matrix factorization hashing for multimodal data[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 2083-2090. DOI: 10.1109/cvpr.2014.267
[4] ZHOU J, DING G, GUO Y. Latent semantic sparse hashing for cross-modal similarity search[C]// Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2014: 415-424. DOI: 10.1145/2600428.2609610
[5] MANDAL D, CHAUDHURY K N, BISWAS S. Generalized semantic preserving hashing for n-label cross-modal retrieval[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2633-2641. DOI: 10.1109/cvpr.2017.282
[6] ZHANG D Q, LI W J. Large-scale supervised multimodal hashing with semantic correlation maximization[C]// Proceedings of the 28th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2014: 2177-2183. DOI: 10.1609/aaai.v28i1.8995
[7] MANDAL D, CHAUDHURY K N, BISWAS S. Generalized semantic preserving hashing for cross-modal retrieval[J]. IEEE Transactions on Image Processing, 2019, 28(1): 102-112. DOI: 10.1109/tip.2018.2863040
[8] WANG H T, MENG M, CHEN H, et al. Supervised consistent and specific hashing[C]// Proceedings of the 2019 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2019: 1822-1827. DOI: 10.1109/icme.2019.00313
[9] JIANG Q Y, LI W J. Deep cross-modal hashing[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3270-3278. DOI: 10.1109/cvpr.2017.348
[10] YANG E K, DENG C, LIU W, et al. Pairwise relationship guided deep hashing for cross-modal retrieval[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2017: 1618-1625. DOI: 10.1609/aaai.v31i1.10719
[11] LIN Q M, CAO W M, HE Z H, et al. Semantic deep cross-modal hashing[J]. Neurocomputing, 2020, 396: 113-122. DOI: 10.1016/j.neucom.2020.02.043
[12] LIU H, FENG Y, ZHOU M L, et al. Semantic ranking structure preserving for cross-modal retrieval[J]. Applied Intelligence, 2021, 51(3): 1802-1812. DOI: 10.1007/s10489-020-01930-x
[13] LI C, DENG C, LI N, et al. Self-supervised adversarial hashing networks for cross-modal retrieval[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4242-4251. DOI: 10.1109/cvpr.2018.00446
[14] MA X H, ZHANG T Z, XU C S. Multi-level correlation adversarial hashing for cross-modal retrieval[J]. IEEE Transactions on Multimedia, 2020, 22(12): 3101-3114. DOI: 10.1109/tmm.2020.2969792
[15] ZOU X T, WANG X Z, BAKKER E M, et al. Multi-label semantics preserving based deep cross-modal hashing[J]. Signal Processing: Image Communication, 2021, 93: No.116131. DOI: 10.1016/j.image.2020.116131
[16] LIU F M, ZHANG H. Cross-modal retrieval algorithm based on multi-level semantic discriminative guided hashing[J]. Journal of Computer Applications, 2021, 41(8): 2187-2192. (in Chinese) DOI: 10.11772/j.issn.1001-9081.2020101607
[17] ZHANG C, WAN Y, QIANG H P. Deep unsupervised discrete cross-modal hashing based on knowledge distillation[J]. Journal of Computer Applications, 2021, 41(9): 2523-2531. (in Chinese) DOI: 10.11772/j.issn.1001-9081.2020111785
[18] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
[19] LI X, WANG W H, HU X L, et al. Selective kernel networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 510-519. DOI: 10.1109/cvpr.2019.00060
[20] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
[21] HUANG Z L, WANG X G, HUANG L C, et al. CCNet: criss-cross attention for semantic segmentation[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 603-612. DOI: 10.1109/iccv.2019.00069
[22] FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3141-3149. DOI: 10.1109/cvpr.2019.00326