Appraisal expression recognition based on Tri-training

doi:10.11772/j.issn.1001-9081.2014.04.1099

Journal of Computer Applications, 2014, Vol. 34, Issue (4): 1099-1104. DOI: 10.11772/j.issn.1001-9081.2014.04.1099

JIANG Run, GU Chunhua, RUAN Tong

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Received: 2013-10-08 Revised: 2013-12-08 Online: 2014-04-29 Published: 2014-04-01
Corresponding author: JIANG Run
About the authors: JIANG Run (1989-), male, from Shanghai, master's student; main research interests: natural language processing, data mining, sentiment analysis.
GU Chunhua (1970-), male, from Changshu, Jiangsu, professor, Ph.D.; main research interests: information security, software engineering, intelligent logistics.
RUAN Tong (1973-), female, from Nanjing, Jiangsu, associate professor, Ph.D., CCF member; main research interests: natural language processing, knowledge ontology, text mining.
Supported by:
National Key Technology R&D Program of China

Abstract:

Appraisal expression recognition is an important step in sentiment orientation analysis. Because labeled corpora are scarce, most previous work has relied on manually constructed rules and templates to recognize appraisal expressions. To reduce the effort of labeling training data and to further exploit the information in unlabeled samples, an appraisal expression recognition algorithm based on a co-training mechanism is proposed, which uses a small number of labeled samples together with a large number of unlabeled samples to improve recognition performance. Following the idea of Tri-training, the algorithm combines three different classifiers, Support Vector Machine (SVM), Maximum Entropy (MaxEnt) and Conditional Random Field (CRF), into one classification system that classifies the generated candidate set of appraisal expressions. Experiments comparing the Tri-training based algorithm with single-classifier methods show that the proposed algorithm can effectively recognize appraisal expressions in subjective sentences.
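The Tri-training idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three toy classifiers below (`NearestCentroid`, 1-nearest-neighbor and 3-nearest-neighbor) are hypothetical stand-ins for SVM, MaxEnt and CRF, the two-dimensional feature vectors are invented, and the loop is a simplified variant that omits the error-rate checks of the original Tri-training algorithm. Label 1 is assumed to mean "valid appraisal expression candidate" and 0 "invalid candidate".

```python
from collections import Counter
import math


class NearestCentroid:
    """Toy stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))
                for x in X]


class KNN:
    """Toy stand-in classifier: majority label among the k nearest samples."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            nearest = order[:self.k]
            out.append(Counter(self.y[i] for i in nearest).most_common(1)[0][0])
        return out


def tri_train(classifiers, X_lab, y_lab, X_unlab, max_rounds=10):
    """Simplified Tri-training: each classifier is retrained on the labeled
    data plus the unlabeled samples on which the other two classifiers agree.
    Assumption: the error-rate conditions of the original algorithm are omitted."""
    for clf in classifiers:
        clf.fit(X_lab, y_lab)
    previous = None
    for _ in range(max_rounds):
        preds = [clf.predict(X_unlab) for clf in classifiers]
        if preds == previous:  # pseudo-labels stopped changing
            break
        previous = preds
        for i, clf in enumerate(classifiers):
            a, b = [p for j, p in enumerate(preds) if j != i]
            agreed = [(x, la) for x, la, lb in zip(X_unlab, a, b) if la == lb]
            clf.fit(X_lab + [x for x, _ in agreed],
                    y_lab + [label for _, label in agreed])
    return classifiers


def vote(classifiers, X):
    """Final prediction by majority vote of the three classifiers."""
    preds = [clf.predict(X) for clf in classifiers]
    return [Counter(col).most_common(1)[0][0] for col in zip(*preds)]


# Invented feature vectors for candidate appraisal expressions.
X_lab = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.4], [4.0, 4.0], [4.2, 3.8], [3.7, 4.1]]
y_lab = [0, 0, 0, 1, 1, 1]
X_unlab = [[0.3, 0.1], [0.1, 0.6], [3.9, 4.3], [4.4, 4.1], [1.0, 0.8], [3.2, 3.5]]

models = tri_train([NearestCentroid(), KNN(1), KNN(3)], X_lab, y_lab, X_unlab)
predicted = vote(models, [[0.4, 0.3], [4.1, 3.9]])
print(predicted)  # [0, 1]
```

Using three different learner types instead of three bootstrap copies of one learner mirrors the combination described in the abstract; any object with `fit` and `predict` methods could be plugged into `tri_train` in place of the toy classes.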

CLC Number:

TP391.4

JIANG Run, GU Chunhua, RUAN Tong. Appraisal expression recognition based on Tri-training[J]. Journal of Computer Applications, 2014, 34(4): 1099-1104.

