Journal of Computer Applications, 2022, Vol. 42, Issue (3): 713-723. DOI: 10.11772/j.issn.1001-9081.2021040911

• 2021 CCF Conference on Artificial Intelligence (CCFAI 2021)

Clustering based on discrete hashing

Shuting XUAN, Jinglei LIU

  1. School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China
  • Received: 2021-05-31 Revised: 2021-06-27 Accepted: 2021-06-29 Online: 2022-04-09 Published: 2022-03-10
  • Contact: Jinglei LIU
  • About author: XUAN Shuting, born in 1997, a native of Jining, Shandong, is an M.S. candidate. Her research interests include clustering analysis based on discrete hashing.
  • Supported by:
    National Natural Science Foundation of China (62072391); Natural Science Foundation of Shandong Province (ZR2020MF148)

基于离散哈希的聚类

轩书婷, 刘惊雷


Abstract:

Traditional clustering methods operate directly in the original data space, and the data to be clustered is high-dimensional. To address these two problems, a new binary image clustering method, Clustering based on Discrete Hashing (CDH), was proposed. To reduce the dimensionality of the data, the L21-norm was used in this framework to perform adaptive feature selection. At the same time, the data was mapped into the binary Hamming space by a hashing method. Then, low-rank matrix factorization was applied to the sparse binary matrix in the Hamming space to complete fast image clustering. Finally, a fast-converging optimization scheme was used to solve the objective function. Experimental results on four image datasets (Caltech101, Yale, COIL20 and ORL) show that the method can effectively improve clustering efficiency. Compared with traditional clustering methods such as K-means and Spectral Clustering (SC), the time efficiency of CDH was improved by about 87 and 98 percentage points, respectively, on the Gabor view of the Caltech101 dataset when processing high-dimensional data.

Key words: hashing method, automatic feature selection, sparse binary matrix, L21-norm, convergent optimization, Hamming space
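
As an illustration of two building blocks named in the abstract, the following minimal sketch computes the L21-norm of a projection matrix (the sum of the Euclidean norms of its rows, which encourages whole rows, i.e. whole features, to become zero) and compares binary codes by Hamming distance. It is not the authors' CDH implementation: the function names, the sign-based hashing rule, and the toy matrix `W` are assumptions chosen purely for illustration.

```python
import math

def l21_norm(W):
    """L21-norm: sum of the Euclidean norms of the rows of W (a list of rows)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)

def sign_hash(x, W):
    """Map a feature vector x (length d) to a code in {-1, +1}^r.

    W has d rows and r columns. Taking the sign of a linear projection is an
    assumed, generic hashing rule, not necessarily the one used by CDH.
    """
    r = len(W[0])
    proj = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(r)]
    return [1 if p >= 0 else -1 for p in proj]

def hamming_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(1 for u, v in zip(a, b) if u != v)

# Hypothetical projection: the second row is all zeros, so the second
# feature has no influence on the codes (it is effectively deselected).
W = [[3.0, 4.0],
     [0.0, 0.0],
     [1.0, 0.0]]

x1 = [1.0, 5.0, -1.0]
x2 = [-1.0, 5.0, 1.0]
x3 = [1.0, -9.0, -1.0]   # differs from x1 only in the deselected feature

code1, code2, code3 = (sign_hash(x, W) for x in (x1, x2, x3))

print(l21_norm(W))                      # 6.0  (row norms 5 + 0 + 1)
print(hamming_distance(code1, code2))   # 2
print(hamming_distance(code1, code3))   # 0
```

In this toy setting, `x1` and `x3` receive identical codes because they differ only in a feature whose row in `W` is zero, which is the sense in which row sparsity acts as feature selection.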

