Journal of Computer Applications, 2018, Vol. 38, Issue (2): 410-414. DOI: 10.11772/j.issn.1001-9081.2017082368

Micro-blog misinformation detection based on gradient boost decision tree

DUAN Dagao1,2, GAI Xinxin1, HAN Zhongming1,2, LIU Bingxin3   

    1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
    2. Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China;
    3. Department of Mathematical Sciences, University of Liverpool, Liverpool, GB L69 7ZX
  • Received: 2017-08-28 Revised: 2017-10-10 Online: 2018-02-10 Published: 2018-02-10
  • Supported by:
    This work is partially supported by the Humanities and Social Sciences Foundation of the Ministry of Education (13YJC860006), the Beijing Municipal Natural Science Foundation (4172016), and the Beijing Science and Technology Project (Z161100001616004).

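  • Corresponding author: HAN Zhongming
  • About the authors: DUAN Dagao (born 1976), male, from Shaoyang, Hunan, associate professor, Ph.D., CCF member; research interests: multimedia information processing, modern network communication, embedded systems, intelligent data analysis. GAI Xinxin (born 1990), female, from Xingtai, Hebei, master's student; research interest: data mining. HAN Zhongming (born 1972), male, from Wenshui, Shanxi, associate professor, Ph.D., CCF member; research interests: massive data analysis and mining, Internet mining, bioinformatics. LIU Bingxin (born 1996), female, from Beijing; research interest: data mining.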

Abstract: Micro-blogs have become an important platform for information sharing, but they are also one of the main channels through which misinformation spreads. To detect micro-blog misinformation quickly and effectively, a method based on Gradient Boost Decision Tree (GBDT) is proposed. First, classification features covering content, user properties, information dissemination and temporal characteristics are extracted from the comments on micro-blog posts. Then, an identification model based on the GBDT algorithm is built to detect misinformation. Finally, two real micro-blog datasets are used to verify the efficiency and effectiveness of the model. The experimental results show that the proposed model can effectively improve the accuracy of micro-blog misinformation detection.

Key words: micro-blog, social network, misinformation, gradient boost decision tree, comment

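Abstract (translated from the Chinese): Micro-blogs are an important platform for information sharing, but they have also become an important platform for producing and promoting misinformation, whose spread seriously disrupts social order. To identify micro-blog misinformation quickly and effectively, a detection method based on Gradient Boost Decision Tree (GBDT) is proposed. First, the differences between false and true micro-blog messages are analyzed from the perspective of comments, and on this basis classification features covering text content, user attributes, information dissemination and temporal characteristics are extracted from the comments. Then, based on these features, the GBDT algorithm is used to build a misinformation identification model. Finally, the model is validated on two real micro-blog datasets. The experimental results show that the GBDT-based identification model can effectively improve the accuracy of micro-blog misinformation detection.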

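
The pipeline summarized in the abstract (comment-derived features fed to a gradient-boosted tree classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the four feature columns, the toy values, the use of depth-1 trees (stumps), and the hyperparameters are all assumptions made for illustration, since the abstract does not specify them. The sketch uses only the Python standard library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Pick the one-feature threshold split with the lowest squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_gbdt(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting on the logistic loss with regression stumps as weak learners."""
    pos = sum(y) / len(y)
    f0 = math.log(pos / (1.0 - pos))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the current scores.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_predict(s, row) for s in stumps))

# Hypothetical per-post features aggregated from comments (assumed, not from the paper):
# [share of doubting comments, log10 mean commenter followers,
#  share of repost-only comments, median comment delay in hours]
X = [
    [0.60, 2.1, 0.70, 1.0],
    [0.55, 2.4, 0.65, 0.8],
    [0.70, 1.9, 0.80, 0.5],
    [0.50, 2.6, 0.60, 1.5],
    [0.10, 3.5, 0.30, 6.0],
    [0.05, 3.8, 0.20, 8.0],
    [0.15, 3.2, 0.35, 5.0],
    [0.20, 3.0, 0.25, 7.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = misinformation, 0 = genuine (synthetic toy labels)

model = fit_gbdt(X, y)
new_post = [0.65, 2.0, 0.75, 0.7]
print("P(misinformation) for the new post:", predict_proba(model, new_post))
```

Each boosting round fits a small tree to the gradient of the loss and adds it to the running score, which is the core idea behind GBDT. A practical system would use deeper trees, held-out evaluation, and features computed from real comment data rather than the synthetic rows above.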
