Journal of Computer Applications, 2014, Vol. 34, Issue (4): 1099-1104. DOI: 10.11772/j.issn.1001-9081.2014.04.1099

• Artificial Intelligence •

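Appraisal expression recognition based on Tri-training

JIANG Run, GU Chunhua, RUAN Tong

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2013-10-08 Revised: 2013-12-08 Online: 2014-04-01 Published: 2014-04-29
  • Corresponding author: JIANG Run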
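  • About the authors: JIANG Run (1989-), male, born in Shanghai, M.S. candidate; research interests: natural language processing, data mining, sentiment analysis;
    GU Chunhua (1970-), male, born in Changshu, Jiangsu, professor, Ph.D.; research interests: information security, software engineering, intelligent logistics;
    RUAN Tong (1973-), female, born in Nanjing, Jiangsu, associate professor, Ph.D., CCF member; research interests: natural language processing, knowledge ontology, text mining.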
  • Funding:

    National Key Technology R&D Program of China

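Abstract:

Appraisal expression recognition is an important step in sentiment orientation analysis. Because labeled corpora are scarce, most existing work recognizes appraisal expressions with manually constructed rules and templates. To reduce the effort of labeling training corpora and to extract more information from unlabeled samples, an appraisal expression recognition algorithm based on co-training is proposed, which uses a small number of labeled samples together with a large number of unlabeled samples to improve recognition performance. Following the idea of Tri-training, the algorithm combines three different classifiers, Support Vector Machine (SVM), Maximum Entropy (MaxEnt) and Conditional Random Field (CRF), into a single classification system that classifies the generated set of candidate appraisal expressions. Experiments comparing the Tri-training based algorithm with single-classifier methods show that it can effectively recognize appraisal expressions in subjective sentences.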

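The abstract describes the method only at a high level. As a rough illustration of the general Tri-training loop it builds on, here is a minimal Python sketch under several assumptions. Candidate appraisal expressions are assumed to be already encoded as numeric feature vectors with binary labels (1 = appraisal expression, 0 = not). The three hypothetical toy learners `nearest_centroid`, `one_nn` and `mean_threshold_stump` stand in for SVM, MaxEnt and CRF. The error-rate checks of the original Tri-training algorithm (Zhou and Li, 2005) are omitted, and a stratified bootstrap is an assumed choice. It is a sketch, not the authors' implementation.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Predictor = Callable[[Vector], int]
Learner = Callable[[List[Vector], List[int]], Predictor]


def nearest_centroid(xs: List[Vector], ys: List[int]) -> Predictor:
    """Hypothetical toy learner: predict the class whose mean vector is closest."""
    cents = {label: [mean(col) for col in zip(*[x for x, y in zip(xs, ys) if y == label])]
             for label in set(ys)}
    def predict(x: Vector) -> int:
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
    return predict


def one_nn(xs: List[Vector], ys: List[int]) -> Predictor:
    """Hypothetical toy learner: 1-nearest-neighbour over the training points."""
    data = list(zip(xs, ys))
    def predict(x: Vector) -> int:
        return min(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0])))[1]
    return predict


def mean_threshold_stump(xs: List[Vector], ys: List[int]) -> Predictor:
    """Hypothetical toy learner (binary labels only): threshold feature 0
    halfway between the two class means."""
    a, b = sorted(set(ys))
    ma = mean(x[0] for x, y in zip(xs, ys) if y == a)
    mb = mean(x[0] for x, y in zip(xs, ys) if y == b)
    t = (ma + mb) / 2
    low, high = (a, b) if ma <= mb else (b, a)
    return lambda x: low if x[0] < t else high


def stratified_bootstrap(xs, ys, rng):
    """Resample with replacement inside each class, so every class is kept."""
    out_x, out_y = [], []
    for label in sorted(set(ys)):
        idx = [i for i, y in enumerate(ys) if y == label]
        for _ in idx:
            i = rng.choice(idx)
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


def tri_train(learners: Sequence[Learner], lx, ly, ux, max_rounds=10, seed=0):
    """Simplified Tri-training (no error-rate checks); returns three predictors."""
    if len(learners) != 3:
        raise ValueError("Tri-training needs exactly three learners")
    rng = random.Random(seed)
    # Initial classifiers are trained on bootstrap samples of the labeled data.
    models = [learn(*stratified_bootstrap(lx, ly, rng)) for learn in learners]
    prev = None
    for _ in range(max_rounds):
        # For classifier i, pseudo-label the unlabeled points on which the
        # other two classifiers (from the previous round) agree.
        pseudo = []
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            agreed = []
            for x in ux:
                yj = models[j](x)
                if yj == models[k](x):
                    agreed.append((x, yj))
            pseudo.append(agreed)
        if pseudo == prev:
            break  # pseudo-labeled sets stopped changing: converged
        # Retrain classifier i on the labeled data plus its pseudo-labeled set.
        models = [learn(list(lx) + [x for x, _ in p], list(ly) + [y for _, y in p])
                  for learn, p in zip(learners, pseudo)]
        prev = pseudo
    return models


def vote(models: Sequence[Predictor], x: Vector) -> int:
    """Final decision: majority vote of the three classifiers."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Toy data: two well-separated clusters of hypothetical candidate features.
LX = [(0.0, 0.2), (0.3, 0.1), (5.0, 5.1), (4.8, 5.3)]
LY = [0, 0, 1, 1]
UX = [(0.1, 0.4), (0.5, 0.0), (5.2, 4.9), (4.9, 5.0)]
models = tri_train([nearest_centroid, one_nn, mean_threshold_stump], LX, LY, UX)
print(vote(models, (0.2, 0.3)), vote(models, (5.1, 5.2)))  # 0 1
```

Each classifier is retrained on the labeled set plus the unlabeled points its two peers agree on. In this sketch the loop stops once no pseudo-labeled set changes, or after `max_rounds` (an assumed cap).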

CLC number: