Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (8): 2317-2321. DOI: 10.11772/j.issn.1001-9081.2014.08.2317

• Artificial Intelligence •

Sentiment analysis for goods evaluation based on text classification

ZHONG Jiang, YANG Siyuan, SUN Qigan

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Received: 2014-02-12 Revised: 2014-03-25 Online: 2014-08-01 Published: 2014-08-10
  • Contact: YANG Siyuan
  • About the authors: ZHONG Jiang (1974-), male, born in Jiangjin, Chongqing, professor, Ph.D.; research interests: text analysis, data mining, knowledge management. YANG Siyuan (1988-), female, born in Tongzhou, Beijing, master's student; research interests: text classification, data mining. SUN Qigan (1986-), male, born in Linyi, Shandong, master's student; research interests: text classification, data mining.
  • Supported by:

    National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities

Abstract:

To judge the sentiment orientation of goods evaluations (product reviews) accurately while improving recognition efficiency, this paper proposes a text classification algorithm based on Matrix Projection (MP) and Normalized Vector (NLV) for sentiment analysis of goods evaluations. First, matrix projection is used to extract the feature words of the evaluations. Then, the average Feature Frequency (FF) of the feature words in each category is computed and normalized with a Normalized Function (NLF), which yields one normalized vector per category. Finally, the sentiment orientation of an evaluation is predicted by comparing the similarity between its feature vector and the normalized vector of each category. In a comparison with the k-Nearest Neighbor (kNN), Naive Bayes (NB) and Support Vector Machine (SVM) algorithms, the experimental results show that the proposed algorithm achieves high prediction accuracy and classification speed. The advantage over kNN is especially clear: the macro-average F1 value is more than 12% higher than that of kNN, and classification time is reduced by 11/12. Compared with SVM, classification speed is also greatly improved.
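The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the matrix projection step, the Normalized Function, or the similarity measure. The sketch therefore assumes a fixed feature vocabulary in place of matrix projection, L2 normalization in place of NLF, and cosine similarity for the comparison. All function names (`feature_vector`, `l2_normalize`, `train`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def feature_vector(tokens, vocab):
    """Count how often each feature word occurs in one tokenized evaluation."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def l2_normalize(vec):
    """Assumed stand-in for the paper's Normalized Function (NLF)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def train(reviews, labels, vocab):
    """Build one normalized vector per category from average feature frequencies."""
    sums = {}
    sizes = Counter(labels)
    for tokens, label in zip(reviews, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(feature_vector(tokens, vocab)):
            acc[i] += x
    return {
        label: l2_normalize([x / sizes[label] for x in acc])
        for label, acc in sums.items()
    }


def predict(tokens, class_vectors, vocab):
    """Return the category whose vector is most similar (cosine, assumed)."""
    vec = l2_normalize(feature_vector(tokens, vocab))

    def similarity(label):
        return sum(a * b for a, b in zip(vec, class_vectors[label]))

    return max(class_vectors, key=similarity)


# Toy data: the vocabulary is assumed to be given, not extracted by MP.
vocab = ["good", "fast", "bad", "broken"]
reviews = [["good", "fast"], ["good", "good"], ["bad", "broken"], ["bad", "slow"]]
labels = ["pos", "pos", "neg", "neg"]

model = train(reviews, labels, vocab)
print(predict(["fast", "good", "delivery"], model, vocab))
```

Because each category is reduced to a single vector, prediction needs one similarity computation per category, whereas kNN compares a new evaluation against every training sample. This is consistent with the speed advantage the abstract reports, although the sketch itself makes no claim about the measured figures.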

CLC Number: