Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (05): 1026-1028. DOI: 10.3724/SP.J.2005.1026

• Artificial Intelligence and Simulation •

A rough set method for extracting text classification rules based on CHI-value feature selection

王明春1,2,王正欧1,张楷2,郝玺龙3   

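  1. Institute of Systems Engineering, Tianjin University; 2. Department of Mathematics and Physics, Tianjin University of Education and Technology; 3. Tianjin Hylanda Software Corporation
  • Publication date: 2005-05-01 Release date: 2005-05-25
  • Funding:

    National Natural Science Foundation of China (60275020)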

Rough set text classification rule extraction based on CHI value

WANG Ming-chun1,2, WANG Zheng-ou1, ZHANG Kai2, HAO Xi-long3

  1. Institute of Systems Engineering, Tianjin University, Tianjin 300072, China; 2. Department of Mathematics and Physics, Tianjin University of Education and Technology, Tianjin 300222, China; 3. Tianjin Hylanda Software Corporation, Tianjin 300384, China
  • Online: 2005-05-01 Published: 2005-05-25

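Abstract: A definition of approximate rules is given that reflects the characteristics of text classification rule extraction. The method first uses CHI values to select features and to supply feature-importance information for the next selection step. It then applies rough sets to the discrete decision table for further feature selection. Finally, it uses rough sets to extract exact or approximate rules. By combining CHI-value feature selection with rough set theory, the method avoids rough-set feature reduction on a large-scale decision table and also avoids discretizing the decision table. It improves the efficiency of text rule extraction and makes it more practical. Experimental results demonstrate the effectiveness and practicality of the method.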

Key words: CHI value, feature selection, rough set, text classification rule
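
The CHI-value feature selection step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the standard chi-square statistic for a term/class pair computed from a 2x2 document-count table, and it assumes each term is scored by its maximum CHI value over all classes. The names `chi_value` and `select_features` and the toy corpus are hypothetical, not taken from the paper.

```python
from collections import Counter


def chi_value(a, b, c, d):
    """Chi-square statistic for one term/class pair (assumed standard form).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class carries no information
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Rank terms by their maximum CHI value over classes and keep the top k.

    docs: list of sets of terms; labels: one class label per document.
    Returns (top_k_terms, scores) so the scores can serve as importance
    information for a later selection step.
    """
    n = len(docs)
    class_sizes = Counter(labels)
    term_df = Counter()        # documents containing each term
    term_class_df = Counter()  # documents of a class containing each term
    for terms, label in zip(docs, labels):
        for t in terms:
            term_df[t] += 1
            term_class_df[(t, label)] += 1

    scores = {}
    for t, df in term_df.items():
        best = 0.0
        for cls, size in class_sizes.items():
            a = term_class_df[(t, cls)]
            b = df - a
            c = size - a
            d = n - size - b
            best = max(best, chi_value(a, b, c, d))
        scores[t] = best

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Hypothetical toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = ["sport", "sport", "finance", "finance"]
top, scores = select_features(docs, labels, k=2)
```

Because the selected terms are binary presence features, a decision table built from them is already discrete, which is consistent with the abstract's remark that the method avoids a separate discretization step.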

Abstract: A definition of approximate rules is proposed based on the characteristics of text classification rule extraction. First, the features of the text set are selected using CHI values, which also provide feature significance information for the subsequent feature selection. Rough sets are then used to further select attributes on the discrete decision table. Finally, exact or approximate rules are extracted using rough set theory. The method combines CHI-value feature selection with rough set theory, and so avoids both feature reduction on a large-scale decision table and discretization of the decision table. It improves the efficiency and practicality of text rule extraction. Experimental results demonstrate the effectiveness of the method.

Key words: CHI value, feature selection, rough set, text classification rule
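
The final step, extracting exact or approximate rules from a discrete decision table, can be sketched as follows. The abstract does not reproduce the paper's definition of an approximate rule, so this sketch assumes one plausible reading: rows with identical condition values form an equivalence class; a class whose rows all share one decision yields an exact rule, and a mixed class yields an approximate rule when its majority decision reaches a confidence threshold. The function `extract_rules`, the `min_confidence` parameter, and the sample table are hypothetical, and rough-set attribute reduction is not shown.

```python
from collections import Counter, defaultdict


def extract_rules(table, decisions, min_confidence=0.8):
    """Extract rules from a discrete decision table.

    table: list of tuples of discrete condition values (e.g. 0/1 term presence)
    decisions: one class label per row
    min_confidence: assumed threshold for accepting an approximate rule
    Returns a list of (conditions, decision, confidence, kind) tuples.
    """
    # Group rows into equivalence classes of the indiscernibility relation.
    classes = defaultdict(Counter)
    for row, dec in zip(table, decisions):
        classes[tuple(row)][dec] += 1

    rules = []
    for cond, counts in classes.items():
        decision, hits = counts.most_common(1)[0]
        confidence = hits / sum(counts.values())
        if confidence == 1.0:
            # Consistent class: lies in the lower approximation of the decision.
            rules.append((cond, decision, confidence, "exact"))
        elif confidence >= min_confidence:
            # Inconsistent class with a sufficiently dominant decision.
            rules.append((cond, decision, confidence, "approximate"))
    return rules


# Hypothetical table over two binary term features.
table = [(1, 0)] * 5 + [(0, 1)] * 2 + [(1, 1)] * 2
decisions = ["sport"] * 4 + ["finance"] + ["finance"] * 2 + ["sport", "finance"]
rules = extract_rules(table, decisions)
```

Under these assumptions, the class `(1, 1)` is split evenly between two decisions and produces no rule, while `(1, 0)` produces an approximate rule because four of its five rows agree.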
