Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (07): 1632-1633. DOI: 10.3724/SP.J.1087.2005.01632

• Artificial Intelligence •

Text feature selection method based on pattern clustering and genetic algorithm

HAO Zhan-gang, WANG Zheng-ou

  1. Institute of Systems Engineering, Tianjin University
  • Received: 2004-12-21 Online: 2005-07-01 Published: 2005-07-01
  • About the authors: HAO Zhan-gang (born 1976), male, from Xingtai, Hebei, Ph.D. candidate; main research interests: text mining, genetic algorithms. WANG Zheng-ou (born 1938), male, from Shanghai, professor and doctoral supervisor; main research interests: neural networks, data mining, knowledge discovery.
  • Supported by:

    National Natural Science Foundation of China (60275020)

Abstract:

Text features are selected by pattern clustering combined with a genetic algorithm (GA), and the texts are then classified by a Kohonen network. Pattern clustering effectively reduces the dimensionality of the text features from several thousand to several hundred. Several hundred dimensions is still too high for a Kohonen network, so the GA is applied on top of the clustering result to reduce the dimensionality further, to tens. Experimental results indicate that combining the two methods greatly reduces the dimensionality of the text representation and improves classification accuracy.

Key words: feature selection, pattern clustering, genetic algorithm, Kohonen network
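
The abstract does not state the GA's encoding, operators, or fitness function, so the following is only a minimal sketch of GA-based feature selection in general, not the authors' implementation. It assumes a bit-mask chromosome (one bit per feature), tournament selection, one-point crossover, bit-flip mutation, and elitism. A simple nearest-centroid classifier stands in for the Kohonen network, and the size penalty, toy data, and all function names and parameters are hypothetical.

```python
import random


def make_toy_data(n_per_class=20, n_noise=4, seed=0):
    """Hypothetical toy data: 2 informative features followed by noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [label + rng.gauss(0, 0.2), 1 - label + rng.gauss(0, 0.2)]
            noise = [rng.random() for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features.

    Assumption: this stands in for the Kohonen network classifier.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def dist(row, c):
        return sum((row[j] - centroids[c][k]) ** 2 for k, j in enumerate(cols))

    correct = sum(min(centroids, key=lambda c: dist(row, c)) == label
                  for row, label in zip(X, y))
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return (best_mask, history), where history holds the best fitness per generation."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        history.append(fitness(X, y, scored[0]))
        nxt = [scored[0][:]]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            # tournament selection of two parents, 3 candidates each
            a, b = (max(rng.sample(scored, 3), key=lambda m: fitness(X, y, m))
                    for _ in range(2))
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=lambda m: fitness(X, y, m))
    history.append(fitness(X, y, best))
    return best, history


if __name__ == "__main__":
    X, y = make_toy_data()
    best, history = ga_select(X, y)
    print("selected feature mask:", best)
    print("best fitness per generation:", history)
```

In a pipeline like the one described, the input columns would be the cluster-level features left after pattern clustering rather than raw terms, and the fitness would be computed with whatever classifier is actually used downstream.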

CLC Number: