Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (9): 2523-2531. DOI: 10.11772/j.issn.1001-9081.2020111785

Special topic: Artificial Intelligence

• Artificial Intelligence •


Deep unsupervised discrete cross-modal hashing based on knowledge distillation

ZHANG Cheng, WAN Yuan, QIANG Haopeng   

  1. School of Science, Wuhan University of Technology, Wuhan Hubei 430070, China
  • Received: 2020-11-14 Revised: 2020-12-21 Online: 2021-09-10 Published: 2021-05-08
  • Corresponding author: WAN Yuan
  • About the authors: ZHANG Cheng (1996-), male, born in Xiaogan, Hubei, M.S. candidate, whose research interests include machine learning, deep learning, and large-scale multimedia data retrieval; WAN Yuan (1976-), female, born in Wuhan, Hubei, Ph.D., professor, CCF member, whose research interests include machine learning, image processing, and pattern recognition; QIANG Haopeng (1995-), male, born in Shijiazhuang, Hebei, M.S. candidate, whose research interests include machine learning, deep learning, and large-scale multimedia data retrieval.
  • Supported by:
    This work is partially supported by the Fundamental Research Funds for the Central Universities (2019IB010).

Abstract: Cross-modal hashing has attracted much attention due to its low storage cost and high retrieval efficiency. Most existing cross-modal hashing methods require inter-instance association information provided by additional manual labels, yet the deep features learned by pre-trained deep unsupervised cross-modal hashing methods can also provide such similarity information. Moreover, the discrete constraints are relaxed during hash code learning, resulting in large quantization loss. To address these two issues, a Deep Unsupervised Discrete Cross-modal Hashing (DUDCH) method based on knowledge distillation was proposed. Firstly, following the idea of knowledge transfer in knowledge distillation, the latent association information of a pre-trained unsupervised teacher model was used to reconstruct a symmetric similarity matrix, which replaces manual labels in training the supervised student model. Secondly, Discrete Cyclic Coordinate descent (DCC) was adopted to update the discrete hash codes iteratively, thereby reducing the quantization loss between the real-valued hash codes learned by the neural network and the discrete hash codes. Finally, with an end-to-end neural network adopted as the teacher model and an asymmetric neural network constructed as the student model, the time complexity of the combined model was reduced. Experimental results on two commonly used benchmark datasets, MIRFLICKR-25K and NUS-WIDE, show that compared with Deep Joint-Semantics Reconstructing Hashing (DJSRH), the proposed method improves the mean Average Precision (mAP) of image-to-text/text-to-image retrieval by an average of 2.83/0.70 percentage points on MIRFLICKR-25K and 6.53/3.95 percentage points on NUS-WIDE, demonstrating its effectiveness in large-scale cross-modal retrieval.
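
To make the two core steps of the abstract concrete, below is a minimal NumPy sketch of (a) reconstructing a symmetric similarity matrix from a teacher model's deep features and (b) a bit-wise DCC update of the discrete codes. The cosine-similarity fusion, the weight alpha, the function names, and the objective ||c·S - B·V^T||_F^2 are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def reconstruct_similarity(f_img, f_txt, alpha=0.5):
    """Fuse per-modality cosine similarities of the teacher's deep
    features into one symmetric similarity matrix (alpha is an
    assumed fusion weight, not taken from the paper)."""
    def cosine(f):
        f = f / np.linalg.norm(f, axis=1, keepdims=True)
        return f @ f.T
    s = alpha * cosine(f_img) + (1.0 - alpha) * cosine(f_txt)
    return (s + s.T) / 2.0  # enforce symmetry

def dcc_update(B, V, S, sweeps=3):
    """Discrete cyclic coordinate descent: update the binary codes
    B in {-1, +1}^(n x c) one bit-column at a time to reduce
    ||c*S - B @ V.T||_F^2, keeping the other columns fixed."""
    n, c = B.shape
    for _ in range(sweeps):
        for k in range(c):
            v_k = V[:, k]
            rest = [j for j in range(c) if j != k]
            # closed-form optimum for the k-th bit column given the rest
            q = c * (S @ v_k) - B[:, rest] @ (V[:, rest].T @ v_k)
            B[:, k] = np.where(q >= 0, 1, -1)
    return B

# toy usage with random features standing in for real teacher outputs
rng = np.random.default_rng(0)
f_img, f_txt = rng.normal(size=(100, 64)), rng.normal(size=(100, 32))
S = reconstruct_similarity(f_img, f_txt)
V = rng.normal(size=(100, 16))    # real-valued network outputs
B = np.sign(V); B[B == 0] = 1     # initial discrete codes
B = dcc_update(B, V, S)
```

Because each bit column has a closed-form optimum given the others, the codes stay in {-1, +1} throughout optimization, which is how the relaxation-induced quantization loss mentioned above is avoided.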

Key words: cross-modal hashing, knowledge distillation, reconstruction of similarity matrix, Discrete Cyclic Coordinate descent (DCC), asymmetric

CLC number: