Journal of Computer Applications, 2014, Vol. 34, Issue (8): 2188-2191. DOI: 10.11772/j.issn.1001-9081.2014.08.2188

• Paper from the 5th China Conference on Data Mining (CCDM 2014)

Automatic annotation methods for Chinese micro-blog corpus with sentiment class

YANG Aimin1, ZHOU Yongmei1, ZHOU Jianfeng2

  1. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou Guangdong 510006, China;
  2. Library, Guangdong University of Foreign Studies, Guangzhou Guangdong 510006, China
  • Received: 2014-04-29 Revised: 2014-05-09 Online: 2014-08-01 Published: 2014-08-10
  • Corresponding author: YANG Aimin
  • About the authors: YANG Aimin (1970-), male, born in Yongzhou, Hunan; professor, Ph.D., senior member of CCF; research interests: machine learning, text sentiment analysis. ZHOU Yongmei (1971-), female, born in Yongzhou, Hunan; professor, member of CCF; research interests: natural language processing, text sentiment analysis. ZHOU Jianfeng (1986-), male, born in Zhuzhou, Hunan; assistant librarian, M.S.; research interests: machine learning, text sentiment analysis.
  • Supported by:

    National Social Science Fund of China; Program for New Century Excellent Talents in University of the Ministry of Education

Abstract:

Manually annotating large-scale micro-blog corpora is difficult. To address this, the paper proposes three methods for automatically annotating the sentiment class of Chinese micro-blog corpora:
- a keyword-based method,
- a probability-summation-based method, and
- a probability-product-based method.

It also proposes an integrated method that combines these three by voting. During annotation, the corpus is first labeled separately by each of the three methods, which yields three sets of annotation results. An ensemble strategy then determines the final label by voting over these three results. Experiments were run on an automatic annotation system designed for this purpose, and the results verify the feasibility and effectiveness of the proposed methods. Every single annotation method achieves an accuracy above 70%, and the voting method achieves an accuracy above 90%.
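The abstract names the three single annotators and the voting ensemble but does not give their formulas. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:
- the class names, the keyword lists and the word probabilities in `WORD_PROBS`;
- the smoothing value `UNSEEN`;
- reading "summation" as summing P(w | c) and "product" as a naive-Bayes-style product of P(w | c);
- the rule of returning `None` when no label wins two of the three votes.

Posts are assumed to be word-segmented already. Chinese micro-blog text needs a segmenter first, and that step is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical toy resources: the abstract gives neither the paper's
# sentiment classes, keyword lists nor word probabilities.
CLASSES = ["positive", "negative"]
KEYWORDS: Dict[str, Set[str]] = {
    "positive": {"开心", "喜欢", "赞"},
    "negative": {"难过", "讨厌", "失望"},
}
# Assumed estimates of P(word | class), e.g. from a small seed-labeled corpus.
WORD_PROBS: Dict[str, Dict[str, float]] = {
    "开心": {"positive": 0.30, "negative": 0.02},
    "喜欢": {"positive": 0.25, "negative": 0.05},
    "赞": {"positive": 0.20, "negative": 0.01},
    "难过": {"positive": 0.02, "negative": 0.30},
    "讨厌": {"positive": 0.03, "negative": 0.25},
    "失望": {"positive": 0.01, "negative": 0.20},
}
UNSEEN = 1e-3  # assumed smoothing probability for out-of-lexicon words


def _argmax(scores: Dict[str, float]) -> Optional[str]:
    """Return the top-scoring class, or None if several classes tie."""
    best = max(scores.values())
    winners = [c for c, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


def keyword_label(tokens: List[str]) -> Optional[str]:
    """Keyword method (assumed form): class whose keywords occur most often."""
    return _argmax({c: float(sum(t in KEYWORDS[c] for t in tokens)) for c in CLASSES})


def prob_sum_label(tokens: List[str]) -> Optional[str]:
    """Summation method (assumed form): argmax over c of sum_w P(w | c)."""
    return _argmax({
        c: sum(WORD_PROBS.get(t, {}).get(c, 0.0) for t in tokens) for c in CLASSES
    })


def prob_product_label(tokens: List[str]) -> Optional[str]:
    """Product method (assumed form): argmax over c of prod_w P(w | c),
    computed as a sum of logarithms so long posts do not underflow."""
    return _argmax({
        c: sum(math.log(WORD_PROBS.get(t, {}).get(c, UNSEEN)) for t in tokens)
        for c in CLASSES
    })


def vote(labels: List[Optional[str]]) -> Optional[str]:
    """Majority vote: a label needs at least two of the three votes;
    otherwise return None (assumed: leave the post for manual annotation)."""
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    return label if n >= 2 else None


def annotate(tokens: List[str]) -> Optional[str]:
    """Run the three annotators on one word-segmented post and vote."""
    return vote([keyword_label(tokens), prob_sum_label(tokens), prob_product_label(tokens)])


if __name__ == "__main__":
    post = ["今天", "很", "开心", "喜欢"]  # assumed already word-segmented
    print(annotate(post))  # positive
```

Requiring two agreeing votes trades coverage for precision. Posts on which the methods disagree are left unlabeled rather than guessed.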
