《计算机应用》唯一官方网站 ›› 2023, Vol. 43 ›› Issue (8): 2493-2498.DOI: 10.11772/j.issn.1001-9081.2022091469

• 先进计算 • 上一篇    下一篇

基于汉明距离的量子K-Means算法

钟静, 林晨, 盛志伟(), 张仕斌   

  1. 成都信息工程大学 网络空间安全学院,成都 610225
  • 收稿日期:2022-10-08 修回日期:2023-02-09 接受日期:2023-02-10 发布日期:2023-08-07 出版日期:2023-08-10
  • 通讯作者: 盛志伟
  • 作者简介:钟静(1997—),女,四川内江人,硕士研究生,主要研究方向:量子聚类算法、量子模糊算法
    林晨(1990—),男,甘肃张掖人,讲师,博士,主要研究方向:量子容错计算、量子机器学习
    张仕斌(1970—),男,重庆人,教授,博士,CCF高级会员,主要研究方向:量子计算与量子安全通信、网络与信息安全、人工智能与系统安全、区块链技术及应用。
  • 基金资助:
    国家自然科学基金资助项目(62076042)

Quantum K-Means algorithm based on Hamming distance

Jing ZHONG, Chen LIN, Zhiwei SHENG(), Shibin ZHANG   

  1. School of Cybersecurity,Chengdu University of Information Technology,Chengdu Sichuan 610225,China
  • Received:2022-10-08 Revised:2023-02-09 Accepted:2023-02-10 Online:2023-08-07 Published:2023-08-10
  • Contact: Zhiwei SHENG
  • About author:ZHONG Jing, born in 1997, M. S. candidate. Her research interests include quantum clustering algorithm, quantum fuzzy algorithm.
    LIN Chen, born in 1990, Ph. D., lecturer. His research interests include quantum fault tolerant computing, quantum machine learning.
    ZHANG Shibin, born in 1970, Ph. D., professor. His research interests include quantum computing and quantum secure communication, network and information security, artificial intelligence and system security, blockchain technology and applications.
  • Supported by:
    National Natural Science Foundation of China(62076042)

摘要:

K-Means算法在处理大规模异构数据时,通常使用欧氏距离来衡量数据点之间的相似度,然而这样存在效率低下以及计算复杂性过高的问题。受到汉明距离在处理数据相似性计算上存在显著优势的启发,提出一种基于汉明距离的量子K-Means(QKMH)算法来计算相似度。首先,将数据制备成量子态,并使用量子汉明距离计算待聚类点和K个聚类中心之间的相似度;然后,改进了Grover最小值搜索算法查找距离待聚类点最近的聚类中心;最后,循环以上步骤,直到达到规定迭代次数或者聚类中心不再改变。基于量子模拟计算框架QisKit,将提出的算法在MNIST手写数字数据集上进行了验证并与传统和改进的多种方法进行了对比,实验结果表明,QKMH算法的F1值相较于基于曼哈顿距离的量子K-Means算法提高了10个百分点,相较于最新优化的基于欧氏距离的量子K-Means算法提高了4.6个百分点;同时经计算,QKMH算法时间复杂度比上述对比算法更低。

关键词: 量子机器学习, 量子算法, 量子K-Means算法, 汉明距离, Grover搜索算法

Abstract:

The K-Means algorithms typically utilize Euclidean distance to calculate the similarity between data points when dealing with large-scale heterogeneous data. However, this method has problems of low efficiency and high computational complexity. Inspired by the significant advantage of Hamming distance in handling data similarity calculation, a Quantum K-Means Hamming (QKMH) algorithm was proposed to calculate similarity. First, the data was prepared and made into quantum state, and the quantum Hamming distance was used to calculate similarity between the points to be clustered and the K cluster centers. Then, the Grover’s minimum search algorithm was improved to find the cluster center closest to the points to be clustered. Finally, these steps were repeated until the designated number of iterations was reached or the clustering centers no longer changed. Based on the quantum simulation computing framework QisKit, the proposed algorithm was validated on the MNIST handwritten digit dataset and compared with various traditional and improved methods. Experimental results show that the F1 score of the QKMH algorithm is improved by 10 percentage points compared with that of the Manhattan distance-based quantum K-Means algorithm and by 4.6 percentage points compared with that of the latest optimized Euclidean distance-based quantum K-Means algorithm, and the time complexity of the QKMH algorithm is lower than those of the above comparison algorithms.

Key words: quantum machine learning, quantum algorithm, quantum K-Means algorithm, Hamming distance, Grover’s search algorithm

中图分类号: