Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (03): 661-663. DOI: 10.3724/SP.J.1087.2005.0661

• Artificial Intelligence •

A new feature selection method in text categorization

WANG Xiu-juan (王秀娟), GUO Jun (郭军), ZHENG Kang-feng (郑康锋)

  1. School of Information Engineering, Beijing University of Posts and Telecommunications
  • Online: 2005-03-01 Published: 2005-03-01
  • Supported by:

    National Natural Science Foundation of China (60475007)

Abstract: In automatic text categorization systems, feature selection is an effective way to reduce the dimensionality of text vectors. Based on an analysis of several commonly used evaluation functions for feature selection, a new evaluation function, the ratio of mutual information, is proposed. Experiments show that the method is simple and feasible, and that it helps improve the effectiveness of the selected feature subset.

Key words: text categorization, feature selection, evaluation function, ratio of mutual information
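
The abstract does not state the formula for the ratio of mutual information, so the sketch below does not attempt to reproduce it. It only illustrates classical pointwise mutual information, MI(t, c) = log(P(t, c) / (P(t) P(c))), which is one of the evaluation functions commonly used for feature selection in text categorization and is presumably among those the authors analyzed. The function names `mutual_information_scores` and `select_features`, the document-frequency estimates, and the max-over-classes aggregation are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information_scores(docs, labels):
    """Score each observed (term, class) pair with pointwise mutual information.

    docs: list of token lists; labels: list of class labels of the same length.
    Probabilities are estimated from document counts (an assumption; the
    paper's estimator is not given in the abstract).
    """
    n_docs = len(docs)
    class_count = Counter(labels)
    term_count = Counter()
    joint_count = Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_count[term] += 1
            joint_count[(term, label)] += 1
    scores = {}
    for (term, label), n_tc in joint_count.items():
        p_tc = n_tc / n_docs
        p_t = term_count[term] / n_docs
        p_c = class_count[label] / n_docs
        scores[(term, label)] = math.log(p_tc / (p_t * p_c))
    return scores


def select_features(docs, labels, k):
    """Keep the k terms with the highest maximum MI over all classes."""
    best = {}
    for (term, _), score in mutual_information_scores(docs, labels).items():
        best[term] = max(best.get(term, float("-inf")), score)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Toy corpus: "team" appears in both classes, so it carries no class information.
docs = [["goal", "match"], ["goal", "team"], ["stock", "market"], ["stock", "team"]]
labels = ["sport", "sport", "finance", "finance"]
selected = select_features(docs, labels, 4)
```

On this toy corpus the term shared by both classes receives a score of zero and is the one dropped when four of the five terms are kept, which is the dimensionality reduction the abstract describes.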

CLC number: