Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (05): 989-991. DOI: 10.3724/SP.J.1087.2005.0989

• Data mining •

New method of processing noise data in decision tree algorithm based on variable precision rough set model

QIAO Mei1,2, HAN Wen-xiu1   

  1. School of Management, Tianjin University, Tianjin 300072, China; 2. Department of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300191, China
  • Online: 2005-05-01 Published: 2005-05-01

New method for handling noisy data in VPRS-based decision tree algorithms

乔梅1,2,韩文秀1   

  1. School of Management, Tianjin University; 2. Department of Computer Science and Engineering, Tianjin University of Technology
  • Supported by:

    Science and Technology Development Fund for Universities of the Tianjin Municipal Education Commission (020714)

Abstract: Noisy data is a main factor affecting the training efficiency and quality of decision trees. Existing pruning methods cannot eliminate the effect of noisy data on the choice of the test attribute at a tree node. To solve this problem, a new method of processing noisy data, predictive pruning, is presented based on the Variable Precision Rough Set (VPRS) model. It eliminates the influence of noisy data by using the variable precision positive region before the test attribute of a tree node is calculated. The method is used to improve the ID3 algorithm, and experiments show that the improved algorithm generates a smaller decision tree and needs less training time than the algorithm using the pre-pruning method.

Key words: decision tree, noise data, Variable Precision Rough Set (VPRS), predictive pruning
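
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea it names: computing a variable precision positive region before attribute selection. It assumes the precision-threshold formulation of VPRS, in which an equivalence class belongs to the positive region when its majority decision value covers at least a fraction `beta` (with 0.5 < `beta` ≤ 1) of the class. The function name `beta_positive_region`, the parameter `beta`, and the toy weather records are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def beta_positive_region(rows, cond_attrs, decision, beta=0.8):
    """Return {row index: majority decision} for rows in the beta-positive region.

    rows       -- list of dicts, one per training record
    cond_attrs -- condition attribute names that define equivalence classes
    decision   -- name of the decision attribute
    beta       -- assumed precision threshold, 0.5 < beta <= 1
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must lie in (0.5, 1]")

    # Group record indices into equivalence classes: records that agree on
    # every condition attribute are indiscernible from each other.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in cond_attrs)].append(i)

    positive = {}
    for members in classes.values():
        counts = Counter(rows[i][decision] for i in members)
        label, hits = counts.most_common(1)[0]
        # Keep the class only if its majority decision is precise enough.
        # Minority records inside a kept class are treated as noise and
        # mapped to the majority decision (an assumption of this sketch).
        if hits / len(members) >= beta:
            for i in members:
                positive[i] = label
    return positive


# Hypothetical toy data: one "sunny" record disagrees with the other four.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "rain", "play": "no"},
]
region = beta_positive_region(rows, ["outlook"], "play", beta=0.75)
```

In this sketch an ID3-style learner would compute information gain on the relabeled records in `region` rather than on the raw data, so that the single dissenting "sunny" record no longer influences which attribute is chosen. How the paper itself uses the positive region inside ID3 may differ.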

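Abstract (from the Chinese original): Noisy data is an important factor affecting the training efficiency of decision trees and the quality of their results. Current tree pruning methods cannot eliminate the influence of noisy data on the selection of the test attribute at a decision tree node. To address this, a new method for handling noisy data in decision tree algorithms, predictive pruning, is proposed based on the variable precision rough set (VPRS) model. Before the attribute selection calculation is carried out, the method derives attribute-corrected classification patterns from the variable precision positive region, which eliminates the influence of noisy data on both attribute selection and leaf node generation. The method is used to improve the basic ID3 decision tree algorithm. Analysis and experiments show that, compared with pre-pruning, the method further reduces decision tree size and training time.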

Key words (from the Chinese original): decision tree, data noise, variable precision rough set (VPRS), predictive pruning
