Abstract: The word-frequency matrix commonly used in text categorization is high-dimensional and extremely sparse, and both properties make computation difficult. To address this problem, a new text categorization method based on topic-word frequency features, drawing on search engine users' selections, is proposed. The approach first filters new-concept topic words by a statistical method and then applies the FCM (fuzzy c-means) clustering algorithm to the documents, using the frequency of topic words rather than the frequency of single words as the feature. The method performed well in experiments. It was also compared in several respects with a text categorization method based on keyword clusters, and some useful conclusions about implementation and application were reached.
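The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the topic words have already been selected (the statistical filtering step is not reproduced), counts them by plain substring matching, and runs a textbook fuzzy c-means update with NumPy. The names `topic_word_counts` and `fuzzy_c_means`, the fuzzifier `m = 2`, and the toy documents are all hypothetical.

```python
import numpy as np


def topic_word_counts(docs, topic_words):
    """Build a documents-by-topic-words frequency matrix.

    Assumption: topic words are given and are counted as plain substrings.
    """
    return np.array(
        [[doc.count(word) for word in topic_words] for doc in docs],
        dtype=float,
    )


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means; returns (memberships U, cluster centers)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the documents.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every document to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Standard update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Toy corpus (hypothetical): two documents per topic.
docs = [
    "search engine ranks; search engine crawls; search engine indexes",
    "a search engine answers what another search engine misses",
    "neural network layers, neural network weights, neural network loss",
    "train a neural network, then prune the neural network",
]
topic_words = ["search engine", "neural network"]

X = topic_word_counts(docs, topic_words)
U, centers = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment taken from the fuzzy memberships
```

With only two topic words, each document becomes a two-dimensional vector, whereas a single-word frequency matrix would have one column per distinct word in the corpus. This is the dimensionality reduction the abstract refers to.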
Kang Kai (康恺), Lin Kunhui (林坤辉), Zhou Changle (周昌乐). New text categorization method based on the frequency of topic words [J]. Journal of Computer Applications (计算机应用), 2006, 26(8): 1993-1995.