Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 2785-2792. DOI: 10.11772/j.issn.1001-9081.2020122006

Special topic: Artificial intelligence

• Artificial intelligence •

Ordinal decision tree algorithm based on fuzzy advantage complementary mutual information
(基于模糊优势互补互信息的有序决策树算法)

WANG Yahui (王雅辉)1,2, QIAN Yuhua (钱宇华)1,2,3, LIU Guoqing (刘郭庆)1,2

  1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan Shanxi 030006, China;
  2. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China;
  3. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China
  • Received: 2020-12-21 Revised: 2021-04-29 Published: 2021-10-10 Online: 2021-07-14
  • Corresponding author: QIAN Yuhua
  • About the authors: WANG Yahui (born 1995), female, from Linfen, Shanxi, master's student, CCF member; main research interest: machine learning. QIAN Yuhua (born 1976), male, from Jincheng, Shanxi, professor, Ph.D., CCF member; main research interests: pattern recognition, feature selection, rough set theory, granular computing, artificial intelligence. LIU Guoqing (born 1994), female, from Linfen, Shanxi, master's student; main research interest: reinforcement learning.
  • Supported by:
    This work is partially supported by the General Program of the National Natural Science Foundation of China (61672332), the Top Innovative Talents Support Program of Shanxi Province (02150116072021), the Key Research and Development Program of Shanxi Province (International Science and Technology Cooperation) (201903D421003), the San Jin Scholars Project of Shanxi Province (2016769), and the Funded Scientific Research Project for Returned Overseas Students of Shanxi Province (2017-023).

Abstract: Traditional decision tree algorithms face two problems when applied to ordinal classification tasks. First, they do not incorporate an order relation, so they cannot learn or extract the order structure of a dataset. Second, real-world knowledge is often fuzzy rather than exact, yet traditional decision tree algorithms cannot handle data with fuzzy attribute values. To solve these problems, an ordinal decision tree algorithm based on fuzzy advantage complementary mutual information was proposed. Firstly, dominant sets were used to represent the order relations in the data, and fuzzy sets were introduced into the computation of dominant sets to form fuzzy dominant sets. A fuzzy dominant set not only reflects the order information in the data but also captures imprecise knowledge automatically. Then, complementary mutual information was generalized on the basis of fuzzy dominant sets, and fuzzy advantage complementary mutual information was proposed. Finally, fuzzy advantage complementary mutual information was used as the heuristic to design an ordinal decision tree algorithm. Experimental results on 5 synthetic datasets and 9 real-world datasets show that the proposed algorithm achieves lower classification error than classical decision tree algorithms on ordinal classification tasks.

Key words: machine learning, decision tree algorithm, ordinal classification, fuzzy mathematics, dominant set
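The abstract does not state the formulas behind dominant sets and fuzzy dominant sets, so the following is only a minimal sketch of the general idea, not the authors' definitions. It assumes a single numeric attribute, takes the crisp dominant set of a sample to be all samples whose value is at least as large, and uses a logistic membership function with a hypothetical steepness parameter `k` as one possible way to soften that relation. The function names are illustrative.

```python
import math


def dominant_set(values, i):
    """Crisp dominant set of sample i: indices of samples whose
    attribute value is greater than or equal to values[i]."""
    return {j for j, v in enumerate(values) if v >= values[i]}


def fuzzy_dominant_set(values, i, k=5.0):
    """Fuzzy dominant set of sample i, returned as a list of memberships.

    ASSUMPTION: a logistic membership in the value difference is used
    purely for illustration; the paper may define membership differently.
    A larger k makes the fuzzy set closer to the crisp dominant set.
    """
    return [1.0 / (1.0 + math.exp(-k * (v - values[i]))) for v in values]


def fuzzy_cardinality(memberships):
    """Cardinality of a fuzzy set: the sum of its membership degrees."""
    return sum(memberships)


# Four samples described by one ordinal attribute (made-up values).
scores = [0.2, 0.5, 0.5, 0.9]
crisp = dominant_set(scores, 1)
fuzzy = fuzzy_dominant_set(scores, 1)
size = fuzzy_cardinality(fuzzy)
```

In this sketch a sample slightly below the reference value still receives a small positive membership instead of being excluded outright, which is the sense in which a fuzzy dominant set can carry both order information and imprecise knowledge.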

CLC Number: