Journal of Computer Applications ›› 2009, Vol. 29 ›› Issue (07): 1755-1757.
• Pattern recognition and Software •
熊忠阳,蒋健,张玉芳
Funding: provincial and ministerial-level fund
Abstract:
Reducing the dimensionality of high-dimensional feature sets is an essential step in text categorization. After reviewing existing dimension reduction techniques and briefly analyzing several commonly used feature selection methods, this paper proposes a new feature selection approach, named CDF, that jointly considers a term's concentration among classes, its distribution within a class, and its average frequency within a class. The K-nearest neighbor (KNN) classifier is used to evaluate the approach. The experimental results indicate that CDF is simple and effective and achieves better dimension reduction performance than conventional feature selection methods.
Keywords: text categorization; dimension reduction; evaluation function
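The abstract names the three quantities that CDF combines but does not give their formulas. The sketch below is only an illustration of how such a score could be computed and used to rank terms. The definitions of `concentration`, `distribution`, and `avg_frequency`, their combination by multiplication, the function names, and the toy corpus are all assumptions made here, not the authors' published method.

```python
from collections import Counter, defaultdict


def cdf_like_scores(docs):
    """Score each term from (class_label, tokens) pairs.

    Hypothetical stand-in for a CDF-style evaluation function; the three
    factors and their product are assumptions, not the paper's formula.
    """
    class_sizes = Counter(label for label, _ in docs)
    doc_freq = defaultdict(Counter)   # doc_freq[term][label]: docs of the class containing the term
    term_freq = defaultdict(Counter)  # term_freq[term][label]: total occurrences in the class
    for label, tokens in docs:
        for term, n in Counter(tokens).items():
            doc_freq[term][label] += 1
            term_freq[term][label] += n

    scores = {}
    for term, per_class in doc_freq.items():
        total_df = sum(per_class.values())
        best = 0.0
        for label, df in per_class.items():
            # Share of the term's documents that fall in this class.
            concentration = df / total_df
            # Share of this class's documents that contain the term.
            distribution = df / class_sizes[label]
            # Mean occurrences per document of this class containing the term.
            avg_frequency = term_freq[term][label] / df
            best = max(best, concentration * distribution * avg_frequency)
        scores[term] = best
    return scores


def select_features(docs, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = cdf_like_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


toy_docs = [
    ("sports", ["goal", "match", "goal", "team"]),
    ("sports", ["goal", "team", "news"]),
    ("finance", ["stock", "market", "news"]),
    ("finance", ["stock", "stock", "bank", "news"]),
]
print(select_features(toy_docs, 3))
```

In this toy corpus, a term such as "news" that is spread across both classes scores lower than a term confined to one class, which is the behaviour a concentration factor is meant to reward. The reduced vocabulary would then be passed to a classifier such as KNN for evaluation, as the abstract describes.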
CLC Number: TP391
熊忠阳, 蒋健, 张玉芳. Research on a new CDF feature selection method for text categorization[J]. Journal of Computer Applications, 2009, 29(07): 1755-1757.
URL: http://www.joca.cn/EN/
http://www.joca.cn/EN/Y2009/V29/I07/1755