Journal of Computer Applications, 2018, Vol. 38, Issue (11): 3084-3088. DOI: 10.11772/j.issn.1001-9081.2018041245

• The 7th China Conference on Data Mining (CCDM 2018) •

Sentiment analysis of movie reviews based on dictionary and weak tagging information

FAN Zhen1, GUO Yi1,2, ZHANG Zhenhao1, HAN Meiqi1

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
    2. College of Information Science and Technology, Shihezi University, Shihezi Xinjiang 832003, China
  • Received:2018-04-23 Revised:2018-05-30 Online:2018-11-10 Published:2018-11-10
  • Corresponding author: GUO Yi
  • About the authors: FAN Zhen (1994-), male, native of Xiantao, Hubei, M.S. candidate; research interests: natural language processing, data mining. GUO Yi (1975-), male, native of Wuxi, Jiangsu, Ph.D., professor, doctoral supervisor, CCF member; research interests: natural language processing, intelligent information processing, ontology engineering. ZHANG Zhenhao (1993-), male, native of Hangzhou, Zhejiang, M.S. candidate; research interests: natural language processing, data mining. HAN Meiqi (1994-), female, native of Jilin, Jilin, M.S. candidate; research interests: natural language processing, data mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61462073) and the Science and Technology Commission of Shanghai Municipality (17DZ1101003, 18511106602).

Abstract: To address the time-consuming and labor-intensive manual annotation of data in sentiment analysis of review texts, a new automatic data annotation method was proposed. Firstly, the sentiment orientation of each review text was calculated with a sentiment-dictionary-based method. Secondly, the review texts were annotated automatically by combining the weak tagging information provided by user ratings with the dictionary-based sentiment orientation. Finally, a Support Vector Machine (SVM) was used to classify the sentiment of the review texts. On two types of data sets, the proposed method achieved sentiment classification accuracies of 77.2% and 77.8% respectively, which were 1.7 and 2.1 percentage points higher than those of the method that annotates data using user ratings alone. The experimental results show that the proposed automatic annotation method can improve the classification performance of sentiment analysis of movie reviews.
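The three-step pipeline summarized in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption for illustration, not the authors' implementation. The toy lexicon and one-word negation rule stand in for the paper's sentiment dictionary. The rule that keeps a review only when its star rating and its lexicon polarity agree (with hypothetical thresholds `pos_min=4` and `neg_max=2`) stands in for the fusion of weak tagging information, whose exact form is not given here. The classifier is a plain linear SVM trained with the Pegasos sub-gradient method on bag-of-words features rather than any particular SVM library. Reviews are assumed to be already word-segmented.

```python
import random

# Toy lexicon (hypothetical): the paper's actual sentiment dictionary is not reproduced here.
POSITIVE_WORDS = {"great", "moving", "brilliant", "enjoyable"}
NEGATIVE_WORDS = {"boring", "bad", "dull", "awful"}
NEGATORS = {"not", "never", "no"}


def dictionary_score(tokens):
    """Lexicon score: +1 per positive word, -1 per negative word; a negator
    immediately before a sentiment word flips its sign."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE_WORDS:
            polarity = 1
        elif tok in NEGATIVE_WORDS:
            polarity = -1
        else:
            continue
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def auto_label(tokens, rating, pos_min=4, neg_max=2):
    """Hypothetical fusion rule: keep a review only when its star rating (the weak
    label) and its lexicon polarity agree. Returns +1, -1, or None (discarded)."""
    score = dictionary_score(tokens)
    if rating >= pos_min and score > 0:
        return 1
    if rating <= neg_max and score < 0:
        return -1
    return None


def features(tokens):
    """Binary bag-of-words features plus a constant bias feature."""
    x = {tok: 1.0 for tok in tokens}
    x["__bias__"] = 1.0
    return x


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(xs, ys, lam=0.1, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    L2-regularized hinge loss). The bias is an ordinary, regularized feature."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1  # margin uses the weights before this step
            for f in w:  # regularization step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for f, v in xs[i].items():
                    w[f] = w.get(f, 0.0) + eta * ys[i] * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Made-up, pre-tokenized reviews with star ratings (Chinese text would first need word segmentation).
reviews = [
    ("a great and moving film".split(), 5),
    ("brilliant acting , enjoyable plot".split(), 4),
    ("boring and dull story".split(), 1),
    ("awful , not great at all".split(), 2),
    ("boring and awful".split(), 5),  # rating and lexicon disagree -> discarded
]

labeled = []
for toks, rating in reviews:
    label = auto_label(toks, rating)
    if label is not None:
        labeled.append((features(toks), label))

xs = [x for x, _ in labeled]
ys = [y for _, y in labeled]
w = train_linear_svm(xs, ys)
train_acc = sum(predict(w, x) == y for x, y in labeled) / len(labeled)
print(len(labeled), train_acc)
print(predict(w, features("a moving and enjoyable story".split())))
```

Dropping reviews whose two signals disagree trades training-set size for cleaner labels; the baseline the abstract compares against would instead label every review by its rating alone.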

Key words: movie review, sentiment dictionary, weak tagging information, Support Vector Machine (SVM), sentiment classification

中图分类号: