
Optimization of a density-based K-means algorithm for trajectory data clustering

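Mei-Wei HAO (郝美薇), Hualin DAI (戴华林), Kun HAO (郝琨)

  1. School of Computer and Information Engineering, Tianjin Chengjian University
  • Received: 2017-04-14 Revised: 2017-06-14 Published: 2017-06-14
  • Corresponding author: Mei-Wei HAO (郝美薇)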

Optimization of density-based K-means algorithm in trajectory data clustering

  • Received: 2017-04-14 Revised: 2017-06-14 Online: 2017-06-14
  • Contact: Mei-Wei HAO

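Abstract: The traditional K-means algorithm cannot fix the number of clusters in advance, is sensitive to the selection of initial cluster centers, and is easily affected by isolated outliers, so its clustering results lack stability and accuracy. To address these problems, an improved density-based K-means algorithm is proposed. The algorithm first selects high-density trajectory data points as initial cluster centers, using the distribution density of the trajectory data together with an increased density weight for key trajectory points, and then runs K-means clustering. It next evaluates the clustering results with the Between-Within Proportion index, a cluster validity function. Finally, it determines the best number of clusters and the optimal partition from that evaluation. Theoretical study and experimental results show that the algorithm extracts key trajectory points better and preserves key path information. Compared with the traditional K-means algorithm, clustering accuracy improves by 28%; compared with the density-based spatial clustering of applications with noise algorithm, it improves by 17%. The proposed algorithm offers better stability and accuracy in trajectory data clustering.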

Keywords: K-means algorithm, density-based, vehicle activity characteristics, density weight, initial cluster center, Between-Within Proportion index

Abstract: The traditional K-means algorithm cannot determine the number of clusters in advance and is sensitive to the choice of initial cluster centers and to outliers, which makes its results unstable and inaccurate. To address these problems, an improved density-based K-means algorithm is proposed. First, high-density trajectory data points are selected as the initial cluster centers, based on the distribution density of the trajectory data and an increased density weight for key trajectory points, and K-means clustering is performed. Second, the clustering results are evaluated with the Between-Within Proportion (BWP) index, a cluster validity function. Finally, the optimal number of clusters and the optimal partition are determined from this evaluation. Theoretical analysis and experimental results show that the improved algorithm extracts key trajectory points better and preserves key path information. Its clustering accuracy is 28% higher than that of the traditional K-means algorithm and 17% higher than that of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The proposed algorithm thus offers better stability and higher accuracy in trajectory data clustering.
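
The three steps named in the abstract (density-based seeding, K-means, and BWP-based selection of the number of clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the density rule (a weighted neighbor count within a radius `eps`), the seeding score (density times distance to the nearest chosen seed), and the BWP formula used here (one commonly cited formulation, `(b - w) / (b + w)` averaged over all samples) are assumptions, as are all function and parameter names and the synthetic test data.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def density_seeds(points, k, eps, weights=None):
    """Pick k initial centers from high-density points (assumed rule)."""
    if weights is None:
        weights = np.ones(len(points))
    dist = pairwise_dist(points, points)
    # Weighted count of neighbors within eps; key points may carry weight > 1.
    density = ((dist <= eps) * weights[None, :]).sum(axis=1)
    chosen = [int(np.argmax(density))]
    while len(chosen) < k:
        # Favor dense points that lie far from the seeds chosen so far.
        score = density * dist[:, chosen].min(axis=1)
        chosen.append(int(np.argmax(score)))
    return points[chosen].copy()

def kmeans(points, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(n_iter):
        labels = pairwise_dist(points, centers).argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = pairwise_dist(points, centers).argmin(axis=1)
    return labels, centers

def mean_bwp(points, labels):
    """Average BWP score, assuming BWP = (b - w) / (b + w) per sample."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    dist = pairwise_dist(points, points)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own < 2:
            continue  # a singleton cluster contributes a score of 0
        # w: mean distance to the other members of the sample's own cluster.
        w = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - w) / (b + w)
    return float(scores.mean())

def best_clustering(points, k_max, eps, weights=None):
    """Try k = 2..k_max and keep the partition with the highest mean BWP."""
    best = None
    for k in range(2, k_max + 1):
        labels, centers = kmeans(points, density_seeds(points, k, eps, weights))
        score = mean_bwp(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, centers)
    return best

# Synthetic stand-in for trajectory points: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in blob_centers])

score, k, labels, centers = best_clustering(points, k_max=5, eps=1.5)
print("selected k:", k, "mean BWP:", round(score, 3))
```

Passing a `weights` array with values above 1 for selected points mimics the increased density weight for key trajectory points; how those key points are identified is not specified in the abstract and is left out of the sketch.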

Key words: K-means algorithm, density-based, vehicle activity characteristics, density weight, initial clustering center, Between-Within Proportion (BWP) standard

CLC number: