Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (8): 2252-2256. DOI: 10.11772/j.issn.1001-9081.2016.08.2252

• Artificial Intelligence •

Multidimensional topic model for sentiment tendency analysis based on long short-term memory

TENG Fei, ZHENG Chaomei, LI Wen

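  1. College of Information and Engineering, Nanchang University, Nanchang 330031
  • Received: 2016-01-27 Revised: 2016-04-23 Online: 2016-08-10 Published: 2016-08-10
  • Corresponding author: ZHENG Chaomei
  • About the authors: TENG Fei (born 1990), female, from Tianjin, M.S. candidate; research interests: artificial intelligence and data analysis. ZHENG Chaomei (born 1959), female, from Fuzhou, Jiangxi, professor; research interests: artificial intelligence, data analysis, and computer networks. LI Wen (1980-2016), female, from Yifeng, Jiangxi, associate professor and Ph.D. candidate; research interest: text information processing.
  • Supported by:
    Science and Technology Support Program of Jiangxi Province (20112BBE50045).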

Multidimensional topic model for oriented sentiment analysis based on long short-term memory

TENG Fei, ZHENG Chaomei, LI Wen   

  1. College of Information and Engineering, Nanchang University, Nanchang, Jiangxi 330031, China
  • Received: 2016-01-27 Revised: 2016-04-23 Online: 2016-08-10 Published: 2016-08-10
  • Supported by:
    This work was partially supported by the Science and Technology Plan Project of Jiangxi Province (20112BBE50045).

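Abstract: To address the low accuracy of global sentiment tendency classification for Chinese microblogs, a multidimensional topic model based on long short-term memory (MT-LSTM) is proposed. The model is a multi-layer, multidimensional sequence computation model built from multidimensional Long Short-Term Memory (LSTM) cell networks, and it is suited to processing vectors, arrays, and higher-dimensional data. The model first analyzes a microblog sentence at several levels: vertically, a three-dimensional LSTM (3D-LSTM) processes the sentiment tendency of words and sense groups; horizontally, a multidimensional LSTM (MD-LSTM) processes the sentiment tendency of the whole microblog several times. It then judges sentiment tendency from the Gaussian distribution of topic signs. Finally, the separate judgments are weighted to obtain the final classification. Experimental results show that the algorithm achieves an average precision of 91%, with a maximum of 96.5%, and a recall above 50% for neutral microblogs. Compared with the Recursive Neural Network (RNN) model, the F-measure improves by more than 40%; compared with a method without topic division, fine-grained topic division raises the F-measure by 11.9%. The proposed algorithm has good overall performance: it effectively improves the accuracy of sentiment tendency analysis for Chinese microblogs while reducing the amount of training data and the complexity of matching computation.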

Keywords: Chinese microblog, sentiment tendency analysis, long short-term memory, multi-layer multidimensional model, topic sign
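The abstract reports precision, recall, and F-measure without defining them. Assuming the standard per-class definitions and the balanced F-measure (an assumption; the abstract does not say which variant was used), with TP, FP, and FN denoting the true positives, false positives, and false negatives for one sentiment class, the quantities would be:

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R}
\]
```

Under these definitions, F is the harmonic mean of P and R, so a class with high precision but low recall (as can happen for neutral microblogs) still receives a modest F-measure.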

Abstract: To address the low accuracy of global sentiment classification for Chinese microblogs, a Multi-dimensional Topic model based on Long Short-Term Memory (MT-LSTM) was proposed. The model is a hierarchical, multidimensional sequence computation model composed of Long Short-Term Memory (LSTM) cell networks, and it is suitable for processing vectors, arrays, and higher-dimensional data. Firstly, each microblog was divided into multiple levels for analysis: vertically, the sentiment tendencies of words and phrases were analyzed by three-Dimensional Long Short-Term Memory (3D-LSTM); horizontally, the sentiment tendencies of the whole microblog were analyzed several times by Multi-Dimensional Long Short-Term Memory (MD-LSTM). Secondly, sentiment tendencies were judged according to the Gaussian distribution of topic signs. Finally, the classification result was obtained by weighting the results of these analyses. The experimental results show that the average precision of the proposed model reached 91%, with a maximum of 96.5%, and that the recall for neutral microblogs exceeded 50%. Compared with the Recursive Neural Network (RNN) model, the F-measure of MT-LSTM improved by more than 40%; compared with a method without topic division, the F-measure of MT-LSTM improved by 11.9% owing to fine-grained topic division. The proposed model has good overall performance: it effectively improves the accuracy of sentiment tendency analysis for Chinese microblogs while reducing the amount of training data and the complexity of matching calculation.

Key words: Chinese microblog, oriented sentiment analysis, Long Short-Term Memory (LSTM), hierarchical multidimensional model, topic sign
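The final step described in the abstract, weighting several sentiment judgments into one classification, can be illustrated with a minimal Python sketch. The abstract does not give the weighting scheme, so the label set, the function name `fuse_predictions`, the score vectors, and the weights below are all hypothetical; this is a plain normalized weighted average, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set; the abstract mentions neutral microblogs,
# so three classes are assumed here.
LABELS = ("negative", "neutral", "positive")


def fuse_predictions(
    scores: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[str, List[float]]:
    """Combine per-pass class scores by a normalized weighted average.

    scores[k][i] is the score that analysis pass k assigns to LABELS[i];
    weights[k] is the (assumed) importance of pass k.
    Returns the winning label and the fused score vector.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("need one weight per score vector")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(LABELS)
    for row, w in zip(scores, weights):
        if len(row) != len(LABELS):
            raise ValueError("each score vector needs one entry per label")
        for i, s in enumerate(row):
            fused[i] += w * s / total

    # Pick the label with the highest fused score.
    best = max(range(len(LABELS)), key=fused.__getitem__)
    return LABELS[best], fused


if __name__ == "__main__":
    # Three hypothetical passes over the same microblog; the third
    # pass is trusted twice as much as the other two.
    passes = [
        [0.2, 0.3, 0.5],
        [0.6, 0.3, 0.1],
        [0.1, 0.2, 0.7],
    ]
    label, fused = fuse_predictions(passes, weights=[1, 1, 2])
    print(label, fused)
```

Normalizing by the sum of the weights keeps the fused scores on the same scale as the inputs, so the weights only need to express relative trust in each pass.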
