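Soft partition based clustering models with reference to historical knowledge

Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (2): 435-439. DOI: 10.11772/j.issn.1001-9081.2015.02.0435

SUN Shouwei, QIAN Pengjiang, CHEN Aiguo, JIANG Yizhang

School of Digital Media, Jiangnan University, Wuxi Jiangsu 214122, China

Received: 2014-09-22 Revised: 2014-11-12 Online: 2015-02-12 Published: 2015-02-10
Corresponding author: SUN Shouwei
About the authors: SUN Shouwei (born 1989), male, from Lianshui, Jiangsu, master's student, CCF member; research interests: pattern recognition, intelligent computing. QIAN Pengjiang (born 1979), male, from Taizhou, Jiangsu, associate professor, Ph.D., CCF member; research interests: pattern recognition, image processing. CHEN Aiguo (born 1975), male, from Jingjiang, Jiangsu, lecturer, Ph.D. candidate; research interests: artificial intelligence and pattern recognition. JIANG Yizhang (born 1989), male, from Wuxi, Jiangsu, Ph.D. candidate; research interests: pattern recognition, intelligent computing.
Supported by:
National Natural Science Foundation of China (61202311); Natural Science Foundation of Jiangsu Province (BK201221834); Jiangsu Province Industry-University-Research Prospective Research Project (BY2013015-02).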

Abstract:

Conventional soft partition based clustering algorithms usually cannot achieve satisfactory clustering results when the data are sparse or distorted. To address this problem, two novel soft partition based clustering models that draw on historical knowledge, called SPBC-RHK-1 and SPBC-RHK-2 for short, were proposed on the basis of maximum entropy clustering. SPBC-RHK-1 is the basic model, which refers only to the historical cluster centroids, whereas SPBC-RHK-2 is the advanced model, which combines historical cluster centroids with historical memberships. By drawing on historical knowledge, both models improve clustering effectiveness, and SPBC-RHK-2 shows better effectiveness and robustness than SPBC-RHK-1 owing to its greater ability to utilize knowledge. In addition, because the historical knowledge involved does not expose the historical raw data, both approaches protect the privacy of historical data well. Finally, experiments on both artificial and real-world datasets verified these merits.

Key words: soft partition based clustering algorithm, impure data, historical knowledge, knowledge learning, privacy protection
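
The abstract does not give the objective functions of SPBC-RHK-1 or SPBC-RHK-2, so the following is only a rough sketch of the general idea behind the basic model: maximum entropy clustering whose centroids are also pulled toward historical centroids. The quadratic penalty on the distance to the historical centroids, the weight `lam`, the entropy weight `gamma`, and the function names are all assumptions made for illustration, not the authors' formulation. Only the historical centroids are passed in, never the historical raw data.

```python
import numpy as np

def memberships(X, V, gamma):
    """Maximum-entropy memberships: softmax of -||x_j - v_i||^2 / gamma over clusters."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (n_clusters, n_samples)
    logits = -d2 / gamma
    logits -= logits.max(axis=0, keepdims=True)  # stabilize the exponentials
    U = np.exp(logits)
    return U / U.sum(axis=0, keepdims=True)

def mec_with_history(X, hist_centers, gamma=1.0, lam=1.0, n_iter=100, tol=1e-6):
    """Illustrative sketch (assumed objective, not the paper's):
    sum_ij u_ij ||x_j - v_i||^2 + gamma * sum_ij u_ij ln u_ij
    + lam * sum_i ||v_i - v_hist_i||^2, with memberships summing to 1 per sample."""
    X = np.asarray(X, dtype=float)
    V_hist = np.asarray(hist_centers, dtype=float)
    V = V_hist.copy()  # start from the historical centroids
    for _ in range(n_iter):
        U = memberships(X, V, gamma)
        # Closed-form centroid update for the assumed penalized objective;
        # lam = 0 recovers the plain maximum entropy clustering update.
        V_new = (U @ X + lam * V_hist) / (U.sum(axis=1, keepdims=True) + lam)
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return memberships(X, V, gamma), V

# Small, sparse current dataset plus two hypothetical historical centroids.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (5, 2)), rng.normal(4.0, 0.5, (5, 2))])
hist = np.array([[0.0, 0.0], [4.0, 4.0]])
U, V = mec_with_history(X, hist, gamma=1.0, lam=2.0)
```

In this sketch a larger `lam` keeps the centroids closer to the historical ones, which is one plausible way to compensate for sparse or distorted current data. A model in the spirit of SPBC-RHK-2 would additionally need a term involving historical memberships, which is not attempted here.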

CLC number:

TP181

SUN Shouwei, QIAN Pengjiang, CHEN Aiguo, JIANG Yizhang. Soft partition based clustering models with reference to historical knowledge[J]. Journal of Computer Applications, 2015, 35(2): 435-439.

