Journal of Computer Applications, 2005, Vol. 25, Issue (07): 1634-1637. DOI: 10.3724/SP.J.1087.2005.01634

• Artificial intelligence

Text categorization rule extraction based on fuzzy decision tree

WANG Yu1,2, WANG Zheng-ou1   

  1. Institute of Systems Engineering, Tianjin University;
  2. School of Mathematics and Computer Science, Hebei University
  • Received: 2005-01-14  Revised: 2005-02-25  Online: 2011-04-22  Published: 2005-07-01

基于模糊决策树的文本分类规则抽取

王煜1,2,王正欧1   

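  1. Institute of Systems Engineering, Tianjin University, Tianjin 300072;
  2. School of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002
  • About the authors: WANG Yu (born 1971), female, from Baoding, Hebei; lecturer and Ph.D. candidate; main research interest: text mining. WANG Zheng-ou (born 1938), male, from Shanghai; professor and doctoral supervisor; main research interests: neural networks, data mining, and knowledge discovery.
  • Supported by:

    National Natural Science Foundation of China (60275020)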

Abstract:

A new method is presented that extracts categorization rules for similar text categories using a fuzzy decision tree with merged branches. The χ² statistic is analyzed and improved, and the feature terms of the texts are aggregated according to the improved χ² statistic, which greatly reduces the dimension of the vector space. The fuzzy decision tree is then applied to text categorization, and merging branches greatly reduces the number of categorization rules. The method yields understandable categorization rules while achieving better categorization accuracy.

Key words: similar text categorization, rule extraction, χ² statistic, fuzzy decision tree
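
The abstract does not give the improved χ² statistic, so it cannot be reproduced here. As background only, the sketch below computes the standard (unimproved) χ² score of a term for a category from a 2×2 document-count table: χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The function name `chi_square`, the example terms, and all counts are hypothetical and are not taken from the paper.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical counts for two terms over 100 documents (illustration only).
counts = {
    "goal": (40, 10, 10, 40),  # concentrated in the category
    "the": (25, 25, 25, 25),   # spread evenly, so uninformative
}
scores = {term: chi_square(*table) for term, table in counts.items()}

# Rank terms from most to least category-specific.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms with higher scores are more strongly associated with the category, which is why a statistic of this kind can guide which features to keep or group when reducing the dimension of the vector space.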

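Abstract (Chinese version):

A text categorization method based on a fuzzy decision tree with merged branches is proposed for classifying similar text categories, and it can extract fuzzy categorization rules with high classification accuracy. First, the χ² statistic is studied and improved, and the feature terms of the texts are aggregated according to the improved χ² statistic, which effectively reduces the dimension of the text vector space. A fuzzy decision tree with merged branches is then used for classification, which greatly reduces the number of extracted rules. The method thus preserves the accuracy and speed of decision tree classification while extracting understandable fuzzy categorization rules.

Key words (Chinese version): similar text categorization, rule extraction, χ² statistic, fuzzy decision tree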
