Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3663-3670. DOI: 10.11772/j.issn.1001-9081.2021101806

• Artificial Intelligence •

Adaptive hybrid attention hashing for deep cross-modal retrieval

Xinghua LIU1,2, Guitao CAO3, Qiubin LIN1,2, Wenming CAO1,2

  1. College of Electronics and Information Engineering, Shenzhen University, Shenzhen Guangdong 518060, China
  2. Guangdong Multimedia Information Service Engineering Technology Research Center (Shenzhen University), Shenzhen Guangdong 518060, China
  3. Software Engineering Institute, East China Normal University, Shanghai 200062, China
  • Received: 2021-10-22 Revised: 2021-12-20 Accepted: 2021-12-23 Online: 2021-12-31 Published: 2022-12-10
  • Contact: Wenming CAO
  • About author: LIU Xinghua, born in 1995, native of Xinyang, Henan, M. S. candidate. His research interests include multimedia information processing and cross-modal retrieval.
    CAO Guitao, born in 1970, native of Yantai, Shandong, Ph. D., professor, CCF member. Her research interests include multimedia information processing and artificial intelligence.
    LIN Qiubin, born in 1994, native of Chaozhou, Guangdong, Ph. D. candidate. His research interests include multimedia information processing and cross-modal retrieval.
  • Supported by:
    National Natural Science Foundation of China (61771322)

Abstract:

Existing hashing methods have two limitations: during feature learning they cannot distinguish the importance of the feature information from different regions, and they cannot make full use of label information to deeply mine the correlation between modalities. To address these problems, an Adaptive Hybrid Attention Hashing for deep cross-modal retrieval (AHAH) model was proposed. Firstly, channel attention and spatial attention were combined through autonomously learned weights to strengthen the attention paid to relevant target regions in the feature map and to weaken the attention paid to irrelevant ones. Secondly, the modality labels were statistically analyzed, and the proposed similarity calculation method was used to quantify similarity as a number between 0 and 1, so that the similarity between modalities is expressed more finely. On four commonly used datasets, MIRFLICKR-25K, NUS-WIDE, MSCOCO, and IAPR TC-12, with a hash code length of 16 bit, the proposed method improved the retrieval mean Average Precision (mAP) by 2.25%, 1.75%, 6.8%, and 2.15%, respectively, compared with the state-of-the-art method Multi-Label Semantics Preserving Hashing (MLSPH). In addition, ablation experiments and efficiency analysis also demonstrated the effectiveness of the proposed method.

Key words: cross-modal retrieval, hashing method, deep neural network, adaptive, hybrid attention
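
The two ideas summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the AHAH implementation: the abstract does not give the attention architecture or the similarity formula, so everything here is an assumption. The function names `hybrid_attention` and `soft_label_similarity` are hypothetical, the attention gates are simple pooled sigmoids, the scalar `mix_logit` stands in for a weight that would be learned during training, and cosine similarity between multi-hot label vectors is used only as one plausible way to obtain a similarity between 0 and 1.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def hybrid_attention(feature_map, mix_logit):
    """Blend channel and spatial attention with one mixing weight.

    feature_map: array of shape (C, H, W).
    mix_logit:   scalar; stands in for a parameter learned in training (assumption).
    """
    # Channel attention (assumed form): one gate per channel from global average pooling.
    channel_gate = sigmoid(feature_map.mean(axis=(1, 2)))[:, None, None]  # (C, 1, 1)
    # Spatial attention (assumed form): one gate per location from the channel-wise mean.
    spatial_gate = sigmoid(feature_map.mean(axis=0))[None, :, :]          # (1, H, W)
    # The mixing weight alpha lies in (0, 1) and balances the two attention types.
    alpha = sigmoid(mix_logit)
    gate = alpha * channel_gate + (1.0 - alpha) * spatial_gate            # (C, H, W)
    return feature_map * gate


def soft_label_similarity(labels_a, labels_b):
    """Cosine similarity between multi-hot label rows, giving values in [0, 1].

    labels_a: (N, L) array, labels_b: (M, L) array; returns an (N, M) matrix.
    This is a stand-in measure, not the paper's proposed formula.
    """
    a = labels_a.astype(float)
    b = labels_b.astype(float)
    norm_a = np.linalg.norm(a, axis=1, keepdims=True)  # (N, 1)
    norm_b = np.linalg.norm(b, axis=1, keepdims=True)  # (M, 1)
    denom = np.maximum(norm_a * norm_b.T, 1e-12)       # avoid division by zero
    return (a @ b.T) / denom


# Three items with three possible labels each.
labels = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
sim = soft_label_similarity(labels, labels)
# Items 0 and 1 share one label, so sim[0, 1] is about 0.707;
# items 0 and 2 share none, so sim[0, 2] is 0.
```

Compared with a binary similar/dissimilar indicator, a graded value like this lets pairs that share only some labels contribute a partial similarity, which is the motivation the abstract gives for quantifying similarity between 0 and 1.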

CLC Number: