Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (2): 435-439. DOI: 10.11772/j.issn.1001-9081.2015.02.0435

• Artificial Intelligence •

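Soft partition based clustering models with reference to historical knowledge

SUN Shouwei, QIAN Pengjiang, CHEN Aiguo, JIANG Yizhang

  1. School of Digital Media, Jiangnan University, Wuxi Jiangsu 214122, China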
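  • Received: 2014-09-22 Revised: 2014-11-12 Online: 2015-02-10 Published: 2015-02-12
  • Corresponding author: SUN Shouwei
  • About the authors: SUN Shouwei (born 1989), male, from Lianshui, Jiangsu, master's student, CCF member; main research interests: pattern recognition and intelligent computing. QIAN Pengjiang (born 1979), male, from Taizhou, Jiangsu, associate professor, Ph.D., CCF member; main research interests: pattern recognition and image processing. CHEN Aiguo (born 1975), male, from Jingjiang, Jiangsu, lecturer, Ph.D. candidate; main research interests: artificial intelligence and pattern recognition. JIANG Yizhang (born 1989), male, from Wuxi, Jiangsu, Ph.D. candidate; main research interests: pattern recognition and intelligent computing.
  • Supported by:

    National Natural Science Foundation of China (61202311); Natural Science Foundation of Jiangsu Province (BK201221834); Jiangsu Province Prospective Industry-University-Research Project (BY2013015-02).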

Abstract:

Conventional soft partition based clustering algorithms often fail to produce satisfactory results when the data are sparse or distorted. To address this problem, two novel soft partition based clustering models that draw on historical knowledge, called SPBC-RHK-1 and SPBC-RHK-2 for short, are proposed on the basis of maximum entropy clustering. SPBC-RHK-1 is the basic model, which refers only to the historical cluster centroids, whereas SPBC-RHK-2 is the advanced model, which combines historical cluster centroids with historical memberships. By drawing on historical knowledge, both models improve clustering effectiveness; SPBC-RHK-2, which makes fuller use of that knowledge, performs better in both effectiveness and robustness. In addition, because the historical knowledge involved does not expose the historical raw data, both approaches also protect the privacy of the historical data. Experiments on both synthetic and real-world datasets verify these merits.

Key words: soft partition based clustering algorithm, impure data, historical knowledge, knowledge learning, privacy protection
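
The idea of borrowing historical cluster centroids can be illustrated with a minimal sketch. It is not the SPBC-RHK-1 or SPBC-RHK-2 formulation, which this abstract does not give. It uses the standard maximum entropy clustering updates and adds an assumed quadratic penalty `lam * ||v_i - v_hist_i||^2` that pulls each current centroid toward its historical counterpart. The function name `mec_with_history`, the parameters `gamma` and `lam`, the index-wise pairing of current and historical clusters, and the toy data are all hypothetical choices made for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mec_with_history(points, hist_centroids, gamma=1.0, lam=1.0, iters=50):
    """Maximum entropy clustering with an assumed pull toward historical centroids.

    gamma > 0 is the entropy weight (larger means softer memberships).
    lam >= 0 is a hypothetical weight on the historical centroids;
    lam = 0 reduces to plain maximum entropy clustering.
    Only the historical centroids are used, never the historical raw data.
    """
    # Assumption: cluster i today corresponds to historical cluster i,
    # so the historical centroids also serve as the starting point.
    centroids = [list(c) for c in hist_centroids]
    memberships = []
    for _ in range(iters):
        # Membership update: softmax of negative squared distances.
        memberships = []
        for x in points:
            d = [sq_dist(x, v) for v in centroids]
            d_min = min(d)  # shift for numerical stability
            w = [math.exp(-(di - d_min) / gamma) for di in d]
            total = sum(w)
            memberships.append([wi / total for wi in w])
        # Centroid update: weighted mean blended with the historical centroid.
        for i, v_hist in enumerate(hist_centroids):
            mass = sum(u[i] for u in memberships)
            centroids[i] = [
                (sum(u[i] * x[k] for u, x in zip(memberships, points))
                 + lam * v_hist[k]) / (mass + lam)
                for k in range(len(v_hist))
            ]
    return centroids, memberships


# Toy usage: four current points and two historical centroids.
points = [(0.1, 0.0), (0.0, 0.2), (4.9, 5.1), (5.2, 5.0)]
hist = [(0.0, 0.0), (5.0, 5.0)]
centroids, memberships = mec_with_history(points, hist, gamma=1.0, lam=0.5)
```

In this sketch a larger `lam` places more trust in the historical centroids, which is the kind of behaviour one would want when the current data are sparse or distorted; a model that also borrows historical memberships would need an additional term that is not attempted here.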

CLC Number: