《计算机应用》唯一官方网站 ›› 2021, Vol. 41 ›› Issue (4): 1106-1112.DOI: 10.11772/j.issn.1001-9081.2020071047

所属专题: 数据科学与技术 综述

• 数据科学与技术 • 上一篇    下一篇

在线哈希算法研究综述

郭一村, 陈华辉   

  1. 宁波大学 信息科学与工程学院, 浙江 宁波 315000
  • 收稿日期:2020-07-21 修回日期:2020-10-19 发布日期:2020-12-23 出版日期:2021-04-10
  • 通讯作者: 郭一村
  • 作者简介:郭一村(1995—),男,河南平顶山人,硕士研究生,主要研究方向:数据挖掘、深度学习;陈华辉(1964—),男,浙江宁波人,教授,博士,CCF会员,主要研究方向:数据库、大数据处理。
  • 基金资助:
    国家自然科学基金资助项目(61572266)。

Survey on online hashing algorithm

GUO Yicun, CHEN Huahui   

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo Zhejiang 315000, China
  • Received:2020-07-21 Revised:2020-10-19 Online:2020-12-23 Published:2021-04-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61572266).

摘要: 在当前大规模数据检索任务中,学习型哈希方法能够学习紧凑的二进制编码,在节省存储空间的同时能快速地计算海明空间内的相似度,因此近似最近邻检索常使用哈希的方式来完善快速最近邻检索机制。对于目前大多数哈希方法都采用离线学习模型进行批处理训练,在大规模流数据的环境下无法适应可能出现的数据变化而使得检索效率降低的问题,提出在线哈希方法并学习适应性的哈希函数,从而在输入数据的过程中连续学习,并且能实时地应用于相似性检索。首先,阐释了学习型哈希的基本原理和实现在线哈希的内在要求;接着,从在线条件下流数据的读取模式、学习模式以及模型更新模式等角度介绍在线哈希不同的学习方式;而后,将在线学习算法分为六类:基于主-被动算法、基于矩阵分解技术、基于无监督聚类、基于相似性监督、基于互信息度量和基于码本监督,并且分析这些算法的优缺点及特点;最后,总结和讨论了在线哈希的发展方向。

关键词: 在线学习, 学习型哈希, 无监督学习, 监督学习, 最近邻检索

Abstract: In the current large-scale data retrieval tasks, learning to hash methods can learn compact binary codes, which saves storage space and can quickly calculate the similarity in Hamming space. Therefore, for approximate nearest neighbor search, hashing methods are often used to improve the mechanism of fast nearest neighbor search. In most current hashing methods, the offline learning models are used for batch training, which cannot adapt to possible data changes appeared in the environment of large-scale streaming data, resulting in reduction of retrieval efficiency. Therefore, the adaptive hash functions were proposed and learnt in online hashing methods, which realize the continuous learning in the process of inputting data and make the methods can be applied to similarity retrieval in real-time. Firstly, the basic principles of learning to hash and the inherent requirements to realize online hashing were explained. Secondly, the different learning methods of online hashing were introduced from the perspectives such as the reading method, learning mode, and model update method of streaming data under online conditions. Thirdly, the online learning algorithms were further divided into six categories, that is, categories based on passive-aggressive algorithms, matrix factorization technology, unsupervised clustering, similarity supervision, mutual information measurement, codebook supervision respectively. And the advantages, disadvantages and characteristics of these algorithms were analyzed. Finally, the development directions of online hashing were summarized and discussed.

Key words: online learning, learning to hash, unsupervised learning, supervised learning, nearest neighbor search

中图分类号: