Journal of Computer Applications, 2012, Vol. 32, Issue (07): 2033-2037. DOI: 10.3724/SP.J.1087.2012.02033

• Typical applications

Construction of Chinese polarity lexicon by integration of morpheme features

CHANG Xiao-long, ZHANG Hui

  1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
  • Received: 2012-01-17 Revised: 2012-02-29 Online: 2012-07-05 Published: 2012-07-01
  • Contact: ZHANG Hui

融合语素特征的中文褒贬词典构建

常晓龙,张晖   

  1. 西南科技大学 计算机科学与技术学院,四川 绵阳621000
  • 通讯作者: 张晖
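  • About the authors: CHANG Xiao-long (常晓龙, born 1989), male, from Fuyang, Anhui; master's degree; CCF member; main research interests: sentiment analysis, Web information extraction, search engines. ZHANG Hui (张晖, born 1972), male, from Susong, Anhui; professor, Ph.D.; main research interests: data mining, knowledge engineering.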

Abstract: To address the dependence of traditional morpheme-based methods on the number of seed words, and the low recall of traditional graph-based methods, the authors proposed a method that integrates the morpheme relations between Chinese words into a graph model and combines them with the synonymy relations between words to build a Chinese polarity lexicon by semi-supervised learning on the graph. First, a morpheme model was used to compute the morpheme similarity between two Chinese words. Second, the Tongyici Cilin thesaurus (同义词林) and a bilingual dictionary were used to build the synonymy relations between words. Finally, the two relations were combined, and the Label Propagation (LP) algorithm was run on the resulting relation graph to classify the polarity (positive or negative) of the words. The experimental results show that the proposed method achieves high accuracy and recall, with a micro-averaged F1 (MicroF1) of up to 92.8%. The dependence on the number of seed words is also reduced: with only 100 seed words, MicroF1 still reaches 84.1%. In addition, the proposed method converges quickly.
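
The pipeline summarized in the abstract (two word-to-word relations merged into one graph, then label propagation from a few seed words) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the linear blend with weight `alpha`, the clamped weighted-average update, the function names `combine_relations` and `propagate_labels`, and all similarity values below are assumptions chosen only for illustration.

```python
def combine_relations(morph_sim, syn_sim, alpha=0.5):
    """Blend two edge-weight dicts into one (hypothetical linear blend).

    Keys are word pairs; pairs are sorted so (a, b) and (b, a) coincide.
    """
    morph = {tuple(sorted(k)): v for k, v in morph_sim.items()}
    syn = {tuple(sorted(k)): v for k, v in syn_sim.items()}
    return {
        pair: alpha * morph.get(pair, 0.0) + (1 - alpha) * syn.get(pair, 0.0)
        for pair in set(morph) | set(syn)
    }


def propagate_labels(words, edges, seeds, max_iter=100, tol=1e-6):
    """Iterative label propagation with clamped seeds.

    Scores lie in [-1, 1]: +1 is positive polarity, -1 is negative.
    Each unlabeled word takes the weighted average of its neighbours'
    scores; seed words are reset to their known label every round.
    """
    neighbours = {w: {} for w in words}
    for (a, b), weight in edges.items():
        if weight > 0:
            neighbours[a][b] = weight
            neighbours[b][a] = weight

    score = {w: float(seeds.get(w, 0.0)) for w in words}
    for _ in range(max_iter):
        updated = {}
        delta = 0.0
        for w in words:
            if w in seeds:
                updated[w] = float(seeds[w])  # clamp seed labels
                continue
            total = sum(neighbours[w].values())
            if total == 0:
                updated[w] = 0.0  # isolated word stays unclassified
            else:
                updated[w] = sum(
                    wt * score[n] for n, wt in neighbours[w].items()
                ) / total
            delta = max(delta, abs(updated[w] - score[w]))
        score = updated
        if delta < tol:
            break
    return score


# Toy data: every similarity value here is made up for illustration.
words = ["好", "美好", "良好", "坏", "败坏", "恶劣"]
morph_sim = {("好", "美好"): 0.8, ("好", "良好"): 0.8,
             ("美好", "良好"): 0.5, ("坏", "败坏"): 0.8}
syn_sim = {("美好", "良好"): 1.0, ("坏", "恶劣"): 1.0,
           ("败坏", "恶劣"): 0.6}
seeds = {"好": 1, "坏": -1}  # one positive and one negative seed word

edges = combine_relations(morph_sim, syn_sim)
scores = propagate_labels(words, edges, seeds)
polarity = {w: "positive" if s > 0 else "negative" if s < 0 else "unknown"
            for w, s in scores.items()}
print(polarity)
```

In this sketch a word with no path to any seed keeps a score of 0 and is reported as unknown, which is one simple way to see why adding morpheme edges to a sparse synonymy graph could raise recall.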

Key words: polarity lexicon, morpheme model, synonymy relation, graph model, Label Propagation (LP)

摘要: 针对传统语素方法对于种子词语数量的依赖和传统图方法召回率较低的问题,提出一种将词语间语素关系融入到图模型中,并结合词语同义关系进行中文褒贬词典半监督构建的方法。首先利用语素模型计算词语间语素相似度;然后利用同义词林和双语词典资源,构建词语间同义关系;最后将二种关系结合,并利用标签传播(LP)算法进行词语的褒贬分类。实验结果表明,所提方法具有较高的准确率和召回率,微平均F1值最高可达92.8%;并降低了对种子词语数量的依赖,当种子词语数量仅为100时,微平均F1值依然可达到84.1%。除此之外,所提方法还具有快速收敛的特性。

关键词: 极性词典, 语素模型, 同义关系, 图模型, 标签传播
