Journal of Computer Applications

• Typical Applications

A study on applying the Gini index to text feature selection

林永民 (Yong-Min LIN), 朱卫东 (Wei-Dong ZHU)

  1. Hebei Polytechnic University; Beijing Jiaotong University
  • Received: 2007-04-02 Revised: 1900-01-01 Online: 2007-10-01 Published: 2007-10-01
  • Corresponding author: 林永民 (Yong-Min LIN)

Using Gini-Index for feature selection in text categorization

Yong-Min LIN Wei-Dong ZHU   

  • Received: 2007-04-02 Revised: 1900-01-01 Online: 2007-10-01 Published: 2007-10-01
  • Contact: Yong-Min LIN

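Abstract (translated from the Chinese): This paper studies text feature selection based on the Gini index principle and constructs a Gini-index-based evaluation function suited to text feature selection. Using two different classification methods, fkNN and SVM, on two different corpora, the function is compared experimentally with other well-known text feature selection methods. The results show that its performance is on par with existing feature selection methods, while it also performs well in terms of algorithmic time complexity.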

Keywords (translated from the Chinese): text categorization, feature selection, Gini index, feature evaluation function

Abstract: This paper applies an improved Gini index to text feature selection and constructs a measure function based on it. The measure is compared with four other feature selection measures using two kinds of classifiers on two different document corpora. The experimental results show that its performance is comparable to that of the other text feature selection approaches, while it also performs well in terms of algorithmic time complexity.

Key words: text categorization, feature selection, Gini index, feature selection function
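
The abstract does not state the evaluation function itself, so the following is only a minimal sketch of the general idea of scoring terms with a Gini-style measure, not the function constructed in the paper. It assumes the plain purity form, score(t) = sum over classes c of P(c | t)^2 (one minus the Gini impurity of the class distribution among documents containing t), with probabilities estimated from document frequencies. The function names `gini_scores` and `select_top_k` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def gini_scores(docs, labels):
    """Score each term by sum_c P(c | t)^2, estimated from document frequencies.

    Assumed formulation for illustration only: this is 1 minus the Gini
    impurity of the class distribution over documents that contain the term.
    """
    # df[term][label] = number of documents with that label containing the term
    df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df[term][label] += 1

    scores = {}
    for term, per_class in df.items():
        total = sum(per_class.values())
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (hypothetical): two "sport" and two "finance" documents.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "win"],
    ["stock", "market", "win"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = gini_scores(docs, labels)
top = select_top_k(scores, 3)
print(scores["goal"], scores["win"], top)
```

In this toy corpus, "goal" occurs only in sport documents and gets the maximum score of 1.0, whereas "win" is split evenly across both classes and gets 0.5. Each score needs only a single pass over per-class document counts. Note that this plain form also gives a score of 1.0 to any term seen in a single document, so a minimum document-frequency cutoff would be one way to counter that.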