Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (4): 1011-1014.

• Artificial Intelligence •

Research of sentiment classification for netnews comments by machine learning

Zhou Jie1, Lin Chen2, Li Bicheng2

  1. Institute of Information Engineering, Information Engineering University
    2.
  • Received: 2009-09-08 Revised: 2009-11-30 Online: 2010-04-15 Published: 2010-04-01
  • Contact: Zhou Jie
  • Supported by:
    the National High Technology Research and Development Program of China (863 Program)

Abstract: Netnews comments have become an important channel for ordinary people to express personal opinions, and sentiment analysis can reveal the public's overall attitude toward news events. This paper first summarizes the characteristics of netnews comment data, then builds classifiers with different feature sets, feature dimensions, feature-weighting methods, and parts of speech, and compares and analyzes the experimental results. The comparison shows that features combining sentiment words with argument words outperform features that use sentiment words alone. In addition, feature dimension has less influence on classification accuracy for this kind of data, and TF-IDF weighting still outperforms Boolean weighting. For part-of-speech selection, nouns and verbs as features achieve better classification performance than adjectives and adverbs.

Key words: netnews comments, Chinese information processing, sentiment classification, machine learning, colloquial comments
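
The abstract compares Boolean and TF-IDF feature weighting but does not state the formulas used. The sketch below illustrates the difference on a toy example. It assumes one common TF-IDF variant (raw term frequency times log(N / df)); the function names, the vocabulary, and the already-segmented toy comments are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def boolean_weights(doc_tokens, vocabulary):
    """Boolean weighting: 1.0 if the term occurs in the document, else 0.0."""
    present = set(doc_tokens)
    return [1.0 if term in present else 0.0 for term in vocabulary]


def tfidf_weights(doc_tokens, vocabulary, doc_freq, n_docs):
    """TF-IDF weighting (assumed variant): raw count * log(N / df)."""
    counts = Counter(doc_tokens)
    return [counts[term] * math.log(n_docs / doc_freq[term]) for term in vocabulary]


# Hypothetical, already-segmented toy comments (not data from the paper).
docs = [
    ["support", "policy", "good", "good"],
    ["oppose", "policy", "bad"],
    ["support", "reform", "good"],
]

# Vocabulary and document frequencies computed over the toy corpus.
vocabulary = sorted({term for doc in docs for term in doc})
doc_freq = Counter(term for doc in docs for term in set(doc))

bool_vec = boolean_weights(docs[0], vocabulary)
tfidf_vec = tfidf_weights(docs[0], vocabulary, doc_freq, len(docs))

for term, b, w in zip(vocabulary, bool_vec, tfidf_vec):
    print(f"{term:8s} boolean={b:.1f} tfidf={w:.3f}")
```

Under Boolean weighting, "good" and "policy" receive the same weight in the first comment, whereas TF-IDF gives "good" a higher weight because it occurs twice there. This kind of distinction is one plausible reason a frequency-aware weighting could help, though the page itself gives no explanation.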