Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (2): 382-388. DOI: 10.11772/j.issn.1001-9081.2021071168

• Artificial intelligence •

Neighborhood decision tree construction algorithm based on variable-precision neighborhood equivalent granules

Xin XIE (1, 2), Xianyong ZHANG (1, 2), Xuanye WANG (1, 2), Pengfei TANG (1, 2)

  1. School of Mathematical Sciences, Sichuan Normal University, Chengdu, Sichuan 610068, China
  2. Institute of Intelligent Information and Quantum Information, Sichuan Normal University, Chengdu, Sichuan 610068, China
  • Received: 2021-07-07 Revised: 2021-08-09 Accepted: 2021-08-10 Online: 2021-08-09 Published: 2022-02-10
  • Contact: Xianyong ZHANG
  • About the authors: XIE Xin, born in 1996, a native of Zizhong, Sichuan, M.S. candidate. His research interests include machine learning under uncertainty.
    ZHANG Xianyong, born in 1978, a native of Yibin, Sichuan, Ph.D., professor and doctoral supervisor. His research interests include uncertainty analysis, intelligent computing, and machine learning.
    WANG Xuanye, born in 1995, a native of Dazhou, Sichuan, M.S. candidate. His research interests include machine learning under uncertainty.
    TANG Pengfei, born in 1996, a native of Chongqing, M.S. candidate. His research interests include rough sets and granular computing.
  • Supported by:
    National Natural Science Foundation of China (61673258); Sichuan Science and Technology Program (2021YJ0085)

Abstract:

To address the information loss and poor classification performance of existing decision tree algorithms on continuous data, a Neighborhood Decision Tree (NDT) construction algorithm is proposed. First, variable-precision neighborhood equivalent granules are mined from the neighborhood decision information system, and their properties are discussed. Second, a neighborhood Gini index measure is constructed from these granules to quantify the uncertainty of the neighborhood decision information system. Finally, the neighborhood Gini index measure is used to derive the node selection condition, and the variable-precision neighborhood equivalent granules serve as the splitting rule, yielding the NDT. Experimental results on UCI datasets show that the accuracy of NDT is on average about 20 percentage points higher than that of the information-entropy-based Iterative Dichotomiser 3 (ID3) algorithm, the Gini-index-based Classification And Regression Tree (CART) algorithm, the information-gain-ratio-based C4.5 algorithm, and the algorithm combining Information Gain and Gini Index (IGGI), which supports the effectiveness of NDT.

Key words: uncertainty measurement, Gini index, neighborhood decision information system, decision tree, machine learning
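
The abstract does not state the formal definitions, so the following is only a rough sketch of the general idea of measuring class impurity inside neighborhoods of continuous data. It assumes a standard δ-neighborhood under Euclidean distance and the classical Gini impurity; the function names, the `delta` parameter, and the averaging step are illustrative assumptions, and this is not the paper's variable-precision neighborhood equivalent granule or its neighborhood Gini index measure.

```python
from collections import Counter
from math import dist


def neighborhood(samples, i, delta):
    """Indices of all samples within Euclidean distance delta of sample i."""
    return [j for j, x in enumerate(samples) if dist(samples[i], x) <= delta]


def gini(labels):
    """Classical Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def mean_neighborhood_gini(samples, labels, delta):
    """Average Gini impurity of the labels found in each sample's neighborhood.

    Assumption: a plain average over samples, chosen only for illustration.
    """
    total = 0.0
    for i in range(len(samples)):
        members = neighborhood(samples, i, delta)
        total += gini([labels[j] for j in members])
    return total / len(samples)


# Hypothetical toy data: one continuous attribute, two classes.
samples = [(0.0,), (0.1,), (0.9,), (1.0,)]
labels = ["a", "a", "b", "b"]

small = mean_neighborhood_gini(samples, labels, 0.2)  # neighborhoods are pure: 0.0
large = mean_neighborhood_gini(samples, labels, 1.5)  # every neighborhood mixes both classes: 0.5
```

With a small radius each neighborhood contains a single class and the impurity is zero, while a large radius merges both classes into every neighborhood. This shows why the neighborhood radius matters when continuous attributes are used without discretization.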

CLC number: