Iterative intuitionistic fuzzy K-modes algorithm

doi:10.11772/j.issn.1001-9081.2021030383

Journal of Computer Applications, 2022, Vol. 42, Issue (2): 375-381. DOI: 10.11772/j.issn.1001-9081.2021030383

• Artificial Intelligence •

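Yudan CHEN, Cuifang GAO, Wanqiang SHEN, Ping YIN

School of Science, Jiangnan University, Wuxi Jiangsu 214122, China

Received: 2021-03-15 Revised: 2021-07-02 Accepted: 2021-07-02 Online: 2022-02-21 Published: 2022-02-10
Corresponding author: Cuifang GAO
About the authors: CHEN Yudan (born 1998), female, from Ganzhou, Jiangxi, M.S. candidate. Her research interests include computational intelligence and pattern recognition.
GAO Cuifang (born 1974), female, from Shijiazhuang, Hebei, Ph.D., associate professor. Her research interests include pattern recognition and bioinformatics.
SHEN Wanqiang (born 1981), female, from Changzhou, Jiangsu, Ph.D., associate professor. Her research interests include computer graphics and pattern recognition.
YIN Ping (born 1981), female, from Jiaxing, Zhejiang, Ph.D., associate professor. Her research interests include numerical computation and pattern recognition.
Supported by:
National Natural Science Foundation of China (61772013)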

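Abstract

The Intuitionistic Fuzzy K-Modes (IFKM) algorithm uses the simple 0-1 matching similarity measure during clustering. This measure can neither effectively characterize the similarity between data objects within a cluster nor reflect how much each attribute contributes to the clustering. In addition, in every iteration IFKM assigns each data object to a cluster directly from the intuitionistic fuzzy membership matrix, so the idea of intuitionistic fuzziness is not fully exploited. To solve these two problems, an Iterative IFKM (IIFKM) algorithm was proposed. Firstly, a weighted intuitionistic fuzzy membership similarity measure was defined based on Intuitionistic Fuzzy Entropy (IFE) and Intuitionistic Fuzzy Set (IFS). Secondly, the intuitionistic fuzzy membership matrix was carried through the whole clustering process as iterative information, so that the idea of intuitionistic fuzziness is fully reflected in the algorithm. Experimental results on 5 datasets from the UCI repository show that, compared with IFKM, IIFKM improves accuracy and recall by 7%-11% and also improves precision to some degree.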
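The 0-1 matching measure criticised above can be made concrete with the classic fuzzy k-modes scheme of Huang and Ng, on which IFKM-type methods build: memberships come from the matching dissimilarity, and each mode takes, per attribute, the category with the largest membership-weighted frequency. This is a minimal sketch of that baseline only, not the IFKM or IIFKM update; the fuzzifier name `alpha`, its default of 1.5, and the random choice of initial modes are assumptions for illustration. Note that every attribute counts equally in the distance, and the final labels are read directly off the membership matrix.

```python
import numpy as np


def matching_dissimilarity(x, z):
    """Simple 0-1 matching: the number of attributes on which x and z differ."""
    return int(np.sum(np.asarray(x) != np.asarray(z)))


def fuzzy_memberships(X, Z, alpha=1.5):
    """Membership matrix W (k x n) of the classic fuzzy k-modes update.

    W[l, i] = 1 / sum_h (d(z_l, x_i) / d(z_h, x_i)) ** (1 / (alpha - 1)),
    with full membership in a mode that x_i matches exactly.
    """
    k, n = len(Z), len(X)
    D = np.array([[matching_dissimilarity(x, z) for x in X] for z in Z], dtype=float)
    W = np.zeros((k, n))
    for i in range(n):
        exact = np.flatnonzero(D[:, i] == 0)
        if exact.size:  # x_i equals a mode: crisp membership in that cluster
            W[exact[0], i] = 1.0
            continue
        ratios = (D[:, i][:, None] / D[:, i][None, :]) ** (1.0 / (alpha - 1.0))
        W[:, i] = 1.0 / ratios.sum(axis=1)
    return W


def update_modes(X, W, alpha=1.5):
    """Each mode attribute takes the category with the largest sum of W[l, i] ** alpha."""
    k, m = W.shape[0], X.shape[1]
    Z = np.empty((k, m), dtype=X.dtype)
    for l in range(k):
        for j in range(m):
            cats = np.unique(X[:, j])
            scores = [np.sum(W[l, X[:, j] == c] ** alpha) for c in cats]
            Z[l, j] = cats[int(np.argmax(scores))]
    return Z


def fuzzy_k_modes(X, k, alpha=1.5, n_iter=20, seed=0):
    # Assumption: alpha=1.5 and randomly chosen initial modes are illustrative choices.
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=k, replace=False)].copy()
    W = fuzzy_memberships(X, Z, alpha)
    for _ in range(n_iter):
        Z_new = update_modes(X, W, alpha)
        if np.array_equal(Z_new, Z):
            break
        Z = Z_new
        W = fuzzy_memberships(X, Z, alpha)
    # Crisp labels read directly off the membership matrix.
    return W.argmax(axis=0), Z, W


X = np.array([["a", "x"], ["a", "x"], ["a", "y"], ["b", "z"], ["b", "z"], ["b", "y"]])
labels, Z, W = fuzzy_k_modes(X, k=2)
```

For example, with modes ("a", "x") and ("b", "z"), the object ("a", "y") lies at matching distances 1 and 2, so with `alpha = 1.5` its memberships are 0.8 and 0.2.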

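Key words: categorical data clustering, similarity measure, Intuitionistic Fuzzy K-Modes (IFKM) algorithm, Intuitionistic Fuzzy Set (IFS), Intuitionistic Fuzzy Entropy (IFE)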
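The abstract builds its weighted similarity measure from intuitionistic fuzzy sets (IFS) and intuitionistic fuzzy entropy (IFE) but does not give the formulas here. The sketch below shows one common way to build each ingredient, under loudly stated assumptions: non-memberships come from a Yager-type negation with parameter `lam` (a construction used in intuitionistic fuzzy c-means), entropy uses a Szmidt-Kacprzyk style ratio, and attribute weights are taken proportional to one minus the entropy. The helpers `yager_ifs`, `if_entropy`, `entropy_weights`, and `weighted_mismatch`, and the way they are combined, are hypothetical; this is not the IIFKM similarity measure.

```python
import numpy as np


def yager_ifs(mu, lam=0.85):
    """Turn fuzzy memberships into an intuitionistic fuzzy set (mu, nu, pi).

    Assumed construction: Yager-type negation nu = (1 - mu**lam) ** (1/lam).
    Restricting 0 < lam <= 1 keeps mu + nu <= 1, so the hesitation pi >= 0.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in (0, 1]")
    mu = np.asarray(mu, dtype=float)
    nu = (1.0 - mu ** lam) ** (1.0 / lam)
    pi = np.clip(1.0 - mu - nu, 0.0, None)  # clip tiny negative rounding error
    return mu, nu, pi


def if_entropy(mu, nu, pi):
    """Ratio-form intuitionistic fuzzy entropy averaged over elements; 0 means crisp."""
    return float(np.mean((np.minimum(mu, nu) + pi) / (np.maximum(mu, nu) + pi)))


def entropy_weights(entropies):
    """Hypothetical attribute weights: lower entropy (sharper attribute) -> larger weight."""
    score = 1.0 - np.asarray(entropies, dtype=float)
    if score.sum() == 0:  # every attribute maximally fuzzy: fall back to equal weights
        return np.full(score.shape, 1.0 / score.size)
    return score / score.sum()


def weighted_mismatch(x, z, w):
    """Attribute-weighted 0-1 mismatch; equals simple matching when every w_j = 1."""
    return float(np.sum(np.asarray(w) * (np.asarray(x) != np.asarray(z))))


# Hypothetical per-attribute membership values, e.g. how strongly objects agree with a mode.
mu_by_attr = [[0.9, 0.95, 0.1], [0.5, 0.6, 0.4]]
E = [if_entropy(*yager_ifs(m)) for m in mu_by_attr]
w = entropy_weights(E)
print(E, w)
```

Here the first attribute, whose memberships sit near 0 or 1, gets the lower entropy and therefore the larger weight; with all weights equal to 1, `weighted_mismatch` reduces to the simple matching count.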

CLC number: TP391

Yudan CHEN, Cuifang GAO, Wanqiang SHEN, Ping YIN. Iterative intuitionistic fuzzy K-modes algorithm[J]. Journal of Computer Applications, 2022, 42(2): 375-381.

Tab. 1 Description of datasets

Dataset	Objects	Attributes	Classes
Lung-cancer	32	56	3
Zoo	101	16	7
Dermatology	366	33	6
Breast-cancer	699	9	2
Mushroom	8124	22	2
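Tabs. 2-4 report AC, PR, and RE, which this page does not define. A common convention in categorical clustering papers maps each cluster to its majority class; for cluster i, a_i counts members of that class, b_i the other members, and c_i the class members placed elsewhere, giving AC = sum(a_i)/n, PR = mean of a_i/(a_i + b_i), and RE = mean of a_i/(a_i + c_i). The sketch assumes that convention; `clustering_scores` is a hypothetical helper, and the paper may match clusters to classes differently.

```python
from collections import Counter


def clustering_scores(y_true, y_pred):
    """Return (AC, PR, RE), mapping every cluster to its majority class (assumed convention).

    For cluster i: a_i = members of its majority class, b_i = the other members,
    c_i = objects of that class placed in other clusters.
    """
    n = len(y_true)
    class_sizes = Counter(y_true)
    clusters = sorted(set(y_pred))
    correct, pr_sum, re_sum = 0, 0.0, 0.0
    for cid in clusters:
        members = [t for t, p in zip(y_true, y_pred) if p == cid]
        major, a = Counter(members).most_common(1)[0]
        b = len(members) - a
        c = class_sizes[major] - a
        correct += a
        pr_sum += a / (a + b)
        re_sum += a / (a + c)
    k = len(clusters)
    return correct / n, pr_sum / k, re_sum / k


ac, pr, rec = clustering_scores([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
print(f"AC={ac:.4f} PR={pr:.4f} RE={rec:.4f}")  # AC=0.8333 PR=0.8750 RE=0.8333
```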

Tab. 2 AC (accuracy), PR (precision), and RE (recall) of the IIFKM algorithm with different values of β

Dataset	β	AC	PR	RE
Lung-cancer	0.85	0.6211	0.6464	0.6356
	0.95	0.6319	0.6633	0.6435
	1.05	0.6115	0.6552	0.6207
	1.85	0.5959	0.6666	0.6005
	2.00	0.5781	0.6480	0.5849
	2.50	0.5630	0.6273	0.5710
Zoo	0.85	0.8719	0.8577	0.6772
	0.95	0.8731	0.8575	0.6820
	1.05	0.8690	0.8489	0.6845
	1.85	0.8596	0.8471	0.6622
	2.00	0.8567	0.8429	0.6611
	2.50	0.8446	0.8361	0.6330
Dermatology	0.85	0.7085	0.7325	0.5769
	0.95	0.7195	0.7722	0.5836
	1.05	0.7358	0.7994	0.6014
	1.85	0.7553	0.8245	0.6296
	2.00	0.7619	0.8295	0.6382
	2.50	0.7557	0.8203	0.6375
Breast-cancer	0.85	0.8885	0.9178	0.8445
	0.95	0.8886	0.9179	0.8446
	1.05	0.8881	0.9176	0.8439
	1.85	0.8883	0.9177	0.8441
	2.00	0.8875	0.9169	0.8432
	2.50	0.8873	0.9168	0.8430
Mushroom	0.85	0.7177	0.7233	0.7150
	0.95	0.7289	0.7369	0.7248
	1.05	0.7320	0.7423	0.7280
	1.85	0.7467	0.7580	0.7424
	2.00	0.7562	0.7665	0.7522
	2.50	0.7385	0.7488	0.7342
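Reading Tab. 2 column-wise shows which β gives the highest AC on each dataset and how sensitive AC is to β. The AC values below are copied from the table; what β controls inside IIFKM is not stated in this excerpt, and `best_beta` is a hypothetical helper.

```python
# AC values per dataset, in the beta order of Tab. 2.
BETAS = [0.85, 0.95, 1.05, 1.85, 2.00, 2.50]
AC = {
    "Lung-cancer":   [0.6211, 0.6319, 0.6115, 0.5959, 0.5781, 0.5630],
    "Zoo":           [0.8719, 0.8731, 0.8690, 0.8596, 0.8567, 0.8446],
    "Dermatology":   [0.7085, 0.7195, 0.7358, 0.7553, 0.7619, 0.7557],
    "Breast-cancer": [0.8885, 0.8886, 0.8881, 0.8883, 0.8875, 0.8873],
    "Mushroom":      [0.7177, 0.7289, 0.7320, 0.7467, 0.7562, 0.7385],
}


def best_beta(acs):
    """Return (beta with the highest AC, that AC, max-min spread of AC over beta)."""
    i = max(range(len(acs)), key=acs.__getitem__)
    return BETAS[i], acs[i], max(acs) - min(acs)


for name, acs in AC.items():
    beta, best, spread = best_beta(acs)
    print(f"{name:14s} best beta={beta:.2f} AC={best:.4f} spread={spread:.4f}")
```

On these numbers the best AC occurs at β = 0.95 for Lung-cancer, Zoo, and Breast-cancer and at β = 2.00 for Dermatology and Mushroom, while Breast-cancer varies by only about 0.001 across all β.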

Tab. 3 Experimental results of five algorithms

Dataset	Algorithm	AC	PR	RE
Lung-cancer	KM	0.5789	0.6270	0.5926
	FKM	0.6100	0.6485	0.6231
	IFKM	0.5930	0.6571	0.6039
	NDFKM	0.5941	0.6513	0.6046
	IIFKM	0.6319	0.6633	0.6435
Zoo	KM	0.8464	0.8475	0.6447
	FKM	0.8414	0.8458	0.6400
	IFKM	0.8437	0.8589	0.6448
	NDFKM	0.8612	0.8476	0.6807
	IIFKM	0.8731	0.8575	0.6820
Dermatology	KM	0.6656	0.7492	0.5510
	FKM	0.6741	0.7309	0.5544
	IFKM	0.6869	0.7774	0.5783
	NDFKM	0.7382	0.8207	0.5995
	IIFKM	0.7619	0.8295	0.6382
Breast-cancer	KM	0.8221	0.8499	0.7512
	FKM	0.8238	0.8495	0.7523
	IFKM	0.8316	0.8523	0.7649
	NDFKM	0.8682	0.9079	0.8145
	IIFKM	0.8886	0.9179	0.8446
Mushroom	KM	0.6898	0.7075	0.6859
	FKM	0.6921	0.7067	0.6878
	IFKM	0.7137	0.7392	0.7091
	NDFKM	0.7102	0.7255	0.7053
	IIFKM	0.7562	0.7665	0.7522
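The IFKM and IIFKM rows of Tab. 3 can be compared directly. This snippet computes the relative gain (new - old) / old, in percent, for each dataset and metric from the reported values; it does not re-run any algorithm, and `TABLE3` and `relative_gain` are names introduced here for illustration.

```python
# (AC, PR, RE) for IFKM and IIFKM, copied from Tab. 3.
TABLE3 = {
    "Lung-cancer":   ((0.5930, 0.6571, 0.6039), (0.6319, 0.6633, 0.6435)),
    "Zoo":           ((0.8437, 0.8589, 0.6448), (0.8731, 0.8575, 0.6820)),
    "Dermatology":   ((0.6869, 0.7774, 0.5783), (0.7619, 0.8295, 0.6382)),
    "Breast-cancer": ((0.8316, 0.8523, 0.7649), (0.8886, 0.9179, 0.8446)),
    "Mushroom":      ((0.7137, 0.7392, 0.7091), (0.7562, 0.7665, 0.7522)),
}


def relative_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100.0


gains = {
    name: [relative_gain(new, old) for old, new in zip(ifkm, iifkm)]
    for name, (ifkm, iifkm) in TABLE3.items()
}
for name, (ac, pr, rec) in gains.items():
    print(f"{name:14s} AC {ac:+6.2f}%  PR {pr:+6.2f}%  RE {rec:+6.2f}%")
```

The gain varies by dataset and metric: Dermatology shows the largest AC gain (about 10.9%), while on Zoo the PR of IIFKM (0.8575) is slightly below that of IFKM (0.8589).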

Fig. 1 AC box plots of each algorithm on the 5 datasets

Tab. 4 Comparison of experimental results of the IFKM and IIFKM0 algorithms

Dataset	Algorithm	AC	PR	RE
Lung-cancer	IFKM	0.5930	0.6571	0.6039
Lung-cancer	IIFKM0	0.6167	0.6541	0.6297
Zoo	IFKM	0.8437	0.8589	0.6448
Zoo	IIFKM0	0.8507	0.8514	0.6672
Dermatology	IFKM	0.6869	0.7774	0.5783
Dermatology	IIFKM0	0.7029	0.7941	0.5924
Breast-cancer	IFKM	0.8316	0.8523	0.7649
Breast-cancer	IIFKM0	0.8379	0.8611	0.7761
Mushroom	IFKM	0.7137	0.7392	0.7091
Mushroom	IIFKM0	0.7390	0.7641	0.7342
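Tab. 4 compares IFKM with a variant labelled IIFKM0, which is not defined in this excerpt. The snippet below subtracts the reported IFKM values from the IIFKM0 values and averages the change per metric; a negative entry means IIFKM0 scored lower. The dictionary names are introduced here for illustration.

```python
# (AC, PR, RE) copied from Tab. 4.
IFKM = {
    "Lung-cancer":   (0.5930, 0.6571, 0.6039),
    "Zoo":           (0.8437, 0.8589, 0.6448),
    "Dermatology":   (0.6869, 0.7774, 0.5783),
    "Breast-cancer": (0.8316, 0.8523, 0.7649),
    "Mushroom":      (0.7137, 0.7392, 0.7091),
}
IIFKM0 = {
    "Lung-cancer":   (0.6167, 0.6541, 0.6297),
    "Zoo":           (0.8507, 0.8514, 0.6672),
    "Dermatology":   (0.7029, 0.7941, 0.5924),
    "Breast-cancer": (0.8379, 0.8611, 0.7761),
    "Mushroom":      (0.7390, 0.7641, 0.7342),
}

# Per-dataset change IIFKM0 - IFKM for each metric, then the mean change per metric.
diffs = {name: [new - old for old, new in zip(IFKM[name], IIFKM0[name])] for name in IFKM}
mean_diff = [sum(d[m] for d in diffs.values()) / len(diffs) for m in range(3)]
for metric, value in zip(("AC", "PR", "RE"), mean_diff):
    print(f"mean {metric} change: {value:+.4f}")
```

On the reported numbers, AC and RE rise on all five datasets, while PR drops slightly on Lung-cancer and Zoo.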
