Journal of Computer Applications

• Typical applications

Using Gini-Index for feature selection in text categorization

Yong-Min LIN, Wei-Dong ZHU

  • Received: 2007-04-02 Online: 2007-10-01 Published: 2007-10-01
  • Contact: Yong-Min LIN

Research on the application of the Gini index to text feature selection

林永民 朱卫东   

  1. Hebei Polytechnic University; Beijing Jiaotong University
  • Corresponding author: 林永民 (Yong-Min LIN)

Abstract: This paper applies an improved Gini-Index to text feature selection and constructs a measure function based on it. The measure is compared with four other feature selection measures, using two kinds of classifiers on two different document corpora. The experimental results show that its performance is comparable to that of the other text feature selection approaches, and that it performs well in terms of algorithmic time complexity.

Key words: text categorization, feature selection, Gini-Index, feature selection function

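Abstract (translated from the Chinese): Text feature selection is studied using the Gini-Index principle, and an evaluation function based on the Gini-Index and suited to text feature selection is constructed. Using two different classification methods, fkNN and SVM, on two different corpora, the function is compared and analyzed experimentally against other well-known text feature selection methods. The results show that its performance is on par with existing feature selection methods, while it performs well in terms of algorithmic time complexity.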

Key words (translated from the Chinese): text categorization, feature selection, Gini-Index, feature evaluation function
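
To make the idea of a Gini-based term score concrete, here is a minimal Python sketch. The abstract does not state the authors' improved measure function, so the code uses the classic Gini purity of a term, sum over classes c of P(c | t)^2 (one minus the Gini impurity of the class distribution among documents containing t), as a stand-in. The function names `gini_purity_scores` and `select_top_k`, the token-list input format, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def gini_purity_scores(docs, labels):
    """Score each term t by sum_c P(c | t)^2, estimated from document counts.

    This is the classic Gini purity, used here as a stand-in; it is NOT
    necessarily the improved measure function proposed in the paper.

    docs   : list of token lists (hypothetical toy input format)
    labels : list of class labels, one per document
    """
    # term -> class -> number of documents of that class containing the term
    term_class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_class_df[term][label] += 1

    scores = {}
    for term, per_class in term_class_df.items():
        total = sum(per_class.values())
        # 1.0 means the term occurs in a single class; lower means more mixed
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented purely for illustration.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "team"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = gini_purity_scores(docs, labels)
top_terms = select_top_k(scores, 3)
```

In this sketch the scores come from a single counting pass over the corpus followed by one sum per term, so the cost grows linearly with the number of distinct (term, document) pairs. A term such as "team", which appears in both classes, scores below 1.0 and is ranked after terms confined to one class.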