Journal of Computer Applications, 2015, Vol. 35, Issue (10): 2721-2726. DOI: 10.11772/j.issn.1001-9081.2015.10.2721

• Paper from the 15th China Conference on Machine Learning (CCML2015)

Simple multi-label ranking for Chinese microblog sentiment classification

SHI Shaoliang1, WEN Yimin1,2, MIAO Yuqing1,2

  1. School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin Guangxi 541004, China;
  2. Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin Guangxi 541004, China
  • Received: 2015-06-15 Revised: 2015-06-22 Online: 2015-10-10 Published: 2015-10-14
  • Corresponding author: WEN Yimin (1969-), male, born in Yiyang, Hunan; professor, Ph.D., CCF senior member; research interests: machine learning, data mining, recommender systems, smart tourism. E-mail: ymwen2004@aliyun.com
  • About the authors: SHI Shaoliang (1989-), male, born in Shaoyang, Hunan; master's student, CCF member; research interests: social media mining, sentiment analysis. MIAO Yuqing (1966-), female, born in Taizhou, Zhejiang; associate professor, Ph.D.; research interests: data mining, cloud computing, parallel computing.
  • Funding:
    National Natural Science Foundation of China (61363029, 71340025); Guangxi Scientific Research and Technology Development Program (Guikegong 14124005-2-1); Project of Guangxi Key Laboratory of Trusted Software (KX201311).

Abstract: For emotion classification of Chinese microblog texts in which each sample carries at most two ordered emotion labels, a simple multi-label ranking algorithm named TSMLR was proposed. TSMLR employed a two-stage learning and two-stage classification strategy: by learning the dominant/secondary relation between emotion labels, it both classified the emotion expressed by each microblog text and ranked its emotion labels. First, the multi-label ranking problem was transformed into eight multi-class single-label classification problems: one model was trained for the dominant emotion label and seven models were trained for the secondary emotion label. Then, the learned models classified each microblog text in two steps: the dominant emotion label was predicted first, and the classification model corresponding to that label was then used to predict the secondary emotion label. Experiments were conducted on the Chinese Weibo text emotion analysis evaluation dataset provided by NLP&CC2014. Compared with Calibrated Label Ranking (CLR), TSMLR improved accuracy and average precision by 8.59% and 9.28% respectively, and reduced one-error by 9.77%. In addition, TSMLR required relatively little training time, and its running time was lower than those of the two baseline methods. These results indicate that learning the order relations between labels effectively improves the accuracy of emotion classification for Chinese microblogs.

Key words: sentiment analysis, Chinese microblog, multi-label ranking, emotion classification, two-stage strategy
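The two-stage strategy summarized in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract names neither the base classifier nor the text features, so a nearest-centroid classifier over pre-computed feature vectors stands in for both, and the `NO_SECONDARY` class (for posts that carry only one emotion) is a hypothetical device. With seven emotion labels, the sketch trains one dominant-emotion model plus one secondary-emotion model per dominant emotion, i.e. the eight single-label problems described above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
LabelPair = Tuple[str, Optional[str]]  # (dominant emotion, secondary emotion or None)

NO_SECONDARY = "<none>"  # hypothetical class meaning "no secondary emotion"


class NearestCentroid:
    """Stand-in multi-class learner (assumption); any single-label classifier fits here."""

    def fit(self, X: List[Vector], y: List[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x: Vector) -> str:
        # Label whose centroid has the smallest squared Euclidean distance to x.
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


class TwoStageRanker:
    """Two-stage learning and two-stage classification, as outlined in the abstract."""

    def fit(self, X: List[Vector], pairs: List[LabelPair]) -> "TwoStageRanker":
        # Stage 1 learning: one multi-class model for the dominant emotion.
        self.dominant_model = NearestCentroid().fit(X, [d for d, _ in pairs])
        # Stage 2 learning: one model per dominant emotion, trained only on the
        # samples having that dominant emotion, predicting the secondary emotion.
        groups: Dict[str, Tuple[List[Vector], List[str]]] = defaultdict(lambda: ([], []))
        for x, (d, s) in zip(X, pairs):
            xs, ys = groups[d]
            xs.append(x)
            ys.append(s if s is not None else NO_SECONDARY)
        self.secondary_models = {d: NearestCentroid().fit(xs, ys) for d, (xs, ys) in groups.items()}
        return self

    def predict(self, x: Vector) -> LabelPair:
        # Stage 1 classification: pick the dominant emotion.
        d = self.dominant_model.predict(x)
        # Stage 2 classification: use the secondary model tied to that dominant emotion.
        s = self.secondary_models[d].predict(x)
        return (d, None if s == NO_SECONDARY else s)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
    pairs = [("happiness", "like"), ("happiness", None),
             ("sadness", "anger"), ("sadness", "anger")]
    model = TwoStageRanker().fit(X, pairs)
    print(model.predict([0.0, 0.1]))   # ('happiness', 'like')
    print(model.predict([0.25, 0.0]))  # ('happiness', None)
```

The toy data has only two dominant emotions, so it trains 1 + 2 models; with seven emotion labels the same code would train 1 + 7 = 8. Routing each sample to a secondary model chosen by its predicted dominant label is what lets the second stage exploit the dominant/secondary relation.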

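The abstract reports results as accuracy, average precision, and one-error. As a hedged illustration, the sketch below computes one-error and average precision using the definitions common in the multi-label ranking literature; the NLP&CC2014 evaluation and the paper itself may define them differently, and accuracy is omitted because the abstract does not give its definition. Each sample's ranking is assumed to list every candidate label, best first, and each relevant-label set is assumed to be non-empty. The function names are illustrative.

```python
from typing import List, Sequence, Set


def one_error(rankings: List[Sequence[str]], relevant: List[Set[str]]) -> float:
    """Fraction of samples whose top-ranked label is not relevant (lower is better)."""
    misses = sum(1 for ranking, rel in zip(rankings, relevant) if ranking[0] not in rel)
    return misses / len(rankings)


def average_precision(rankings: List[Sequence[str]], relevant: List[Set[str]]) -> float:
    """Mean over samples of the precision at each relevant label's rank (higher is better)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        rank_of = {label: i + 1 for i, label in enumerate(ranking)}  # 1-based ranks
        rel_ranks = sorted(rank_of[y] for y in rel)
        # The k-th best-ranked relevant label sits at rank r_k, and exactly k
        # relevant labels are ranked at or above it, so its precision is k / r_k.
        total += sum(k / r for k, r in enumerate(rel_ranks, start=1)) / len(rel_ranks)
    return total / len(rankings)


if __name__ == "__main__":
    rankings = [["happiness", "like", "sadness"], ["sadness", "anger", "like"]]
    relevant = [{"happiness", "like"}, {"anger"}]
    print(one_error(rankings, relevant))          # 0.5
    print(average_precision(rankings, relevant))  # 0.75
```

In the example, the first sample ranks both relevant labels on top (precision 1.0), while the second places its only relevant label second (precision 0.5) and also misses at rank 1, which is what one-error counts.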