New feature weight calculation method for short text
MA Wenwen¹, DENG Yigui¹,²
1. College of Computer Science, Chongqing University, Chongqing 400044, China
2. Center of Information and Network, Chongqing University, Chongqing 400044, China
Abstract: Short texts have inherently sparse features and unbalanced samples, which makes it difficult to apply feature weighting schemes designed for long texts directly to them. To address this problem, a short-text feature weighting approach named Integrated Category (IC) is proposed. The approach draws on the concepts of inverse document frequency (IDF) and relevance frequency (RF), and it integrates the distribution of samples across the positive and negative categories. Experimental results show that, compared with other feature weighting methods, both the micro-average and the macro-average of IC exceed 90%. The method also strengthens the ability to distinguish sample categories within the negative category and improves the precision and recall of short text categorization.
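The abstract names IDF and RF as ingredients of IC but does not give the IC formula itself. The following is therefore only a minimal sketch of those two ingredients, assuming the commonly used definitions idf(t) = log2(N / df(t)) and rf(t) = log2(2 + a / max(1, c)), where N is the number of documents, df(t) is the number of documents containing t, and a and c count the positive- and negative-category documents containing t. The helper names (`doc_frequencies`, `idf`, `rf`) and the four-document corpus are hypothetical illustrations. This is not the authors' IC weight.

```python
import math
from typing import Dict, List, Set, Tuple

Counts = Dict[str, int]

def doc_frequencies(docs: List[Set[str]], labels: List[int]) -> Tuple[Counts, Counts, Counts]:
    """Per term, count documents containing it overall (df),
    in the positive category (a, label 1), and in the negative category (c, label 0)."""
    df: Counts = {}
    a: Counts = {}
    c: Counts = {}
    for terms, label in zip(docs, labels):
        for t in terms:
            df[t] = df.get(t, 0) + 1
            bucket = a if label == 1 else c
            bucket[t] = bucket.get(t, 0) + 1
    return df, a, c

def idf(term: str, df: Counts, n_docs: int) -> float:
    # Classic inverse document frequency; it ignores category labels entirely.
    return math.log2(n_docs / df[term])

def rf(term: str, a: Counts, c: Counts) -> float:
    # Assumed relevance-frequency definition: grows with the ratio of
    # positive to negative documents containing the term; floor value is log2(2) = 1.
    return math.log2(2 + a.get(term, 0) / max(1, c.get(term, 0)))

# Hypothetical short texts as token sets: label 1 = positive (sports), 0 = negative.
docs = [{"goal", "match"}, {"goal", "team"}, {"market", "stock"}, {"market", "team"}]
labels = [1, 1, 0, 0]
N = len(docs)
df, a, c = doc_frequencies(docs, labels)

for t in ("goal", "market", "team", "stock"):
    print(f"{t:<7} idf={idf(t, df, N):.3f} rf={rf(t, a, c):.3f}")
# goal    idf=1.000 rf=2.000
# market  idf=1.000 rf=1.000
# team    idf=1.000 rf=1.585
# stock   idf=2.000 rf=1.000
```

In this toy corpus IDF gives "goal" and "market" the same weight because it ignores labels, while RF separates them. However, every term seen only in the negative category falls to the RF floor of 1.0, so RF alone cannot tell "market" from "stock". That weakness on the negative side is consistent with the abstract's stated aim of improving discrimination within the negative category.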
MA Wenwen, DENG Yigui. New feature weight calculation method for short text[J]. Journal of Computer Applications, 2013, 33(08): 2280-2282.