Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (12): 3563-3566.

• Artificial intelligence •

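Identification method of spam comments in microblog based on AdaBoost

HUANG Ling, LI Xueming

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Received: 2013-06-14 Revised: 2013-08-02 Published: 2013-12-01 Online: 2013-12-31
  • Corresponding author: HUANG Ling
  • About the authors: HUANG Ling (born 1988), male, from Chongqing, master's student; main research interests: data mining and e-commerce;
    LI Xueming (born 1967), male, from Chongqing, professor, Ph.D.; main research interests: data mining and grid computing.
  • Supported by:
    National Natural Science Foundation of China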

Abstract: To address the large number of spam comments on microblogs, a method based on AdaBoost is proposed for identifying them. The method first extracts a feature vector of eight feature values to represent each comment. It then uses the AdaBoost algorithm to train several weak classifiers on these features, each performing better than random prediction. Finally, it combines the weak classifiers, with weights, into a high-accuracy strong classifier. Experiments on comment datasets extracted from popular Sina microblog posts show that the eight selected features are effective and that the method achieves a high recognition rate for microblog spam comments.

Key words: microblog, spam comment identification, feature vector, AdaBoost algorithm, weak classifier
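
The boosting procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract names neither the weak learner nor the eight features, so everything specific here is an assumption made for illustration: the weak classifiers are single-feature threshold rules (decision stumps), labels are +1 for spam and -1 for non-spam, and the toy data uses two unnamed numeric features instead of the paper's eight. The function names `train_stump`, `adaboost_fit`, and `adaboost_predict` are hypothetical and do not come from the paper.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] >= threshold, else -sign.
    Decision stumps are an assumed weak learner, not taken from the paper.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, rounds=3):
    """Train up to `rounds` weighted stumps; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # guard against log(1/0)
        if err >= 0.5:                     # no better than random: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Strong classifier: sign of the alpha-weighted sum of stump votes."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): a comment is spam (+1) only when both features are 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]
model = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(model, row) for row in X])  # prints [-1, -1, -1, 1]
```

On this toy data no single stump classifies all four samples correctly, but the weighted combination of three stumps does, which is the weak-to-strong step the abstract describes.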
