Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (8): 2461-2470. DOI: 10.11772/j.issn.1001-9081.2021061017

• Data Science and Technology •

Deep asymmetric discrete cross-modal hashing method

Xiaoyu WANG, Zhanqing WANG, Wei XIONG

  1. School of Science, Wuhan University of Technology, Wuhan, Hubei 430070, China
  • Received: 2021-06-15 Revised: 2021-09-15 Accepted: 2021-10-12 Online: 2021-12-27 Published: 2022-08-10
  • Contact: Zhanqing WANG
  • About the authors: WANG Xiaoyu, born in 1997 in Ruzhou, Henan, M.S. candidate. Her research interests include machine learning and cross-modal retrieval.
    WANG Zhanqing, born in 1965 in Wuhan, Hubei, Ph.D., professor. His research interests include pattern recognition, digital image processing, and information computing.
    XIONG Wei, born in 1996 in Huangshi, Hubei, M.S. candidate. His research interests include image processing and cross-modal retrieval.
  • Supported by:
    Fundamental Research Funds for the Central Universities (2019ZY232)

Abstract:

Most deep supervised cross-modal hashing methods learn hash codes symmetrically, which prevents them from making effective use of the supervision information in large-scale datasets. In addition, the discrete constraint on hash codes is usually handled with a relaxation-based strategy, which introduces a large quantization error and yields sub-optimal hash codes. To address these problems, a Deep Asymmetric Discrete Cross-modal Hashing (DADCH) method was proposed. Firstly, an asymmetric learning framework combining deep neural networks with dictionary learning was constructed to learn the hash codes of query instances and database instances, thereby mining the supervision information of the data more effectively and reducing the training time of the model. Then, a discrete optimization algorithm was used to optimize the hash code matrix column by column, reducing the quantization error of hash code binarization. Meanwhile, to fully mine the semantic information of the data, a label layer was added to the neural network for label prediction, and semantic information embedding was used to embed the discriminative information of different categories into the hash codes through a linear mapping, making the hash codes more discriminative. Experimental results show that on the IAPR-TC12, MIRFLICKR-25K and NUS-WIDE datasets, with a hash code length of 64 bits, the mean Average Precision (mAP) of the proposed method on image-to-text retrieval is about 11.6, 5.2 and 14.7 percentage points higher, respectively, than that of Self-Supervised Adversarial Hashing (SSAH), an advanced deep cross-modal retrieval method proposed in recent years.

Key words: cross-modal retrieval, deep neural network, asymmetric hashing, semantic information embedding, discrete optimization
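
The abstract reports results as mAP on image-to-text retrieval. The paper's exact evaluation protocol is not given here, so the following is only a minimal sketch of how mAP is commonly computed over a Hamming ranking. It assumes hash codes with entries in {-1, +1}, multi-hot label matrices, and that a database item counts as relevant when it shares at least one label with the query. The function names `hamming_distance` and `mean_average_precision` are hypothetical and not taken from the paper.

```python
import numpy as np


def hamming_distance(query_codes, db_codes):
    """Pairwise Hamming distance between {-1, +1} codes.

    query_codes has shape (n, k) and db_codes has shape (m, k).
    For +/-1 codes, distance = (k - inner product) / 2.
    """
    k = query_codes.shape[1]
    return 0.5 * (k - query_codes @ db_codes.T)


def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP over the full Hamming ranking (assumed protocol, not the paper's).

    A database item is treated as relevant to a query when the two
    multi-hot label vectors share at least one label.
    """
    dist = hamming_distance(query_codes, db_codes)
    relevant = (query_labels @ db_labels.T) > 0
    average_precisions = []
    for i in range(dist.shape[0]):
        # Rank database items by increasing Hamming distance.
        order = np.argsort(dist[i], kind="stable")
        rel = relevant[i, order].astype(float)
        if rel.sum() == 0:
            continue  # skip queries that have no relevant database item
        precision_at_rank = np.cumsum(rel) / np.arange(1, rel.size + 1)
        average_precisions.append((precision_at_rank * rel).sum() / rel.sum())
    return float(np.mean(average_precisions)) if average_precisions else 0.0
```

In a cross-modal setting, `query_codes` would come from one modality (for example, images) and `db_codes` from the other (for example, text), which is the only difference from single-modal evaluation in this sketch.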

CLC Number: