Journal of Computer Applications

• Artificial Intelligence and Simulation •

BIGDATA-133 Micro-blog Misinformation Detection Based on Gradient Boosting Decision Tree

DUAN Dagao<sup>1</sup>, GAI Xinxin<sup>1</sup>, HAN Zhongming<sup>1</sup>, LIU Bingxin<sup>2</sup>

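  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
    2. Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China;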
    3. University of Liverpool,Liverpool
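  • Received: 2017-10-09 Online: 2017-10-09 Published: 2017-10-18
  • Corresponding author: GAI Xinxin
  • About the authors: DUAN Dagao, male, born in 1976 in Shaoyang, Hunan. Ph.D., associate professor, CCF member. His research interests include multimedia information processing, modern network communication, embedded systems and intelligent data analysis. GAI Xinxin, female, born in 1990 in Xingtai, Hebei. M.S. candidate. Her research interests include data mining. HAN Zhongming, male, born in 1972 in Shanxi. Ph.D., associate professor, CCF member. His research interests include massive data analysis and mining, Internet mining and bioinformatics. LIU Bingxin, female, born in 1996 in Beijing. Undergraduate student. Her research interests include data mining.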
  • Supported by:

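    Humanities and Social Sciences Research Fund of the Ministry of Education (13YJC860006); Beijing Municipal Natural Science Foundation (4172016); Beijing Science and Technology Program (Z161100001616004).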

BIGDATA-133 Micro-blog misinformation detection based on gradient boosting decision tree

DUAN Dagao<sup>1,2</sup>, GAI Xinxin<sup>1</sup>, HAN Zhongming<sup>1,2*</sup>, LIU Bingxin<sup>3</sup>

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
    2. Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048, China;
    3. University of Liverpool, Liverpool
  • Received:2017-10-09 Online:2017-10-09 Published:2017-10-18
  • Contact: GAI Xinxin
  • About author:DUAN Dagao, born in 1976. Ph.D., associate professor. His research interests include data mining, multi-media information processing. GAI Xinxin, born in 1990. M. S. candidate. Her research interests include data mining. HAN Zhongming, born in 1972. Ph.D., associate professor. His research interests include data mining, web mining, natural language. LIU Bingxin, born in 1996. B. S. candidate. Her research interests include data mining.
  • Supported by:

    This work is partially supported by the Humanities and Social Sciences Planning Fund of the Ministry of Education (13YJC860006), the Beijing Municipal Natural Science Foundation (4172016), and the Beijing Science and Technology Project (Z161100001616004).

Abstract:

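Micro-blogs are an important platform for information sharing, but they have also become a major platform for creating and spreading misinformation, and the spread of misinformation seriously disrupts social order. To identify micro-blog misinformation quickly and effectively, a misinformation detection method based on Gradient Boosting Decision Tree (GBDT) was proposed. First, the differences between micro-blog misinformation and true information were analyzed from the perspective of comments, and on this basis classification features covering text content, user attributes, information dissemination and temporal characteristics were extracted from the comments. Then, a micro-blog misinformation identification model was built on these features using the GBDT algorithm. Finally, the model was verified on two real micro-blog datasets. Experimental results show that the GBDT-based model effectively improves the accuracy of micro-blog misinformation detection.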
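The abstract groups the comment-derived features into four families: text content, user attributes, dissemination and time. The sketch below shows one way to turn a post's comments into a fixed-length feature vector along those lines. It is illustrative only. The `Comment` fields, the `DOUBT_WORDS` cue list and every individual feature are hypothetical choices, not the paper's actual feature set.

```python
from dataclasses import dataclass
from math import log1p
from statistics import mean

# Hypothetical cue words signalling doubt about a post; NOT the paper's lexicon.
DOUBT_WORDS = ("谣言", "假的", "辟谣", "fake", "rumor")


@dataclass
class Comment:
    """Hypothetical comment record; field names are assumptions."""
    text: str
    user_verified: bool   # commenter has a verified account
    user_followers: int   # commenter's follower count
    is_repost: bool       # comment was made while reposting the post
    delay_s: float        # seconds between the original post and this comment


def comment_features(comments: list[Comment]) -> dict[str, float]:
    """Aggregate one post's comments into a fixed-length feature dict.

    The four groups mirror the feature families named in the abstract;
    the concrete features are illustrative assumptions.
    """
    if not comments:
        raise ValueError("a post needs at least one comment")
    n = len(comments)
    return {
        # text content
        "mean_len": mean(len(c.text) for c in comments),
        "doubt_ratio": sum(
            any(w in c.text.lower() for w in DOUBT_WORDS) for c in comments
        ) / n,
        # user attributes
        "verified_ratio": sum(c.user_verified for c in comments) / n,
        "mean_log_followers": mean(log1p(c.user_followers) for c in comments),
        # information dissemination
        "repost_ratio": sum(c.is_repost for c in comments) / n,
        # temporal characteristics
        "mean_delay_h": mean(c.delay_s for c in comments) / 3600,
        "first_hour_ratio": sum(c.delay_s <= 3600 for c in comments) / n,
    }
```

Aggregating per post, rather than per comment, yields one row per micro-blog, which is the shape a tree-based classifier expects as input.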

Key words: micro-blog, social network, misinformation, gradient boosting decision tree, comment

Abstract:

Weibo has become an important platform for information sharing. Meanwhile, it is also one of the main channels for spreading misinformation. To detect micro-blog misinformation quickly and effectively, a method based on Gradient Boosting Decision Tree (GBDT) was proposed. Firstly, classification features of content, user properties, information dissemination and temporal characteristics were extracted from the comments of micro-blogs. Then an identification model based on the GBDT algorithm was built to detect misinformation. Finally, two real Weibo datasets were used to verify the efficiency and effectiveness of the model. The experimental results show that the model can effectively improve the classification performance.
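To make the GBDT step concrete, here is a minimal from-scratch sketch of gradient boosting for binary classification under logistic loss. Each round fits a depth-1 regression tree (a stump) by least squares to the negative gradient `y - p`. This is a simplified illustration, not the authors' implementation: the paper presumably used deeper trees and a library GBDT, and the class names and hyperparameters (`n_estimators`, `learning_rate`) here are assumptions.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class Stump:
    """Depth-1 regression tree: left_value if x[feature] <= threshold else right_value."""

    def __init__(self, feature: int, threshold: float, left_value: float, right_value: float):
        self.feature, self.threshold = feature, threshold
        self.left_value, self.right_value = left_value, right_value

    def predict(self, x: list[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: list[list[float]], r: list[float]) -> Stump:
    """Pick the single split that minimises squared error against residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, Stump(j, t, lv, rv))
    if best is None:  # every feature is constant: fall back to a constant leaf
        m = sum(r) / len(r)
        return Stump(0, math.inf, m, m)
    return best[1]


class GBDTClassifier:
    """Gradient boosting with stumps on binary log-loss (illustrative sketch)."""

    def __init__(self, n_estimators: int = 50, learning_rate: float = 0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X: list[list[float]], y: list[int]) -> "GBDTClassifier":
        pos = sum(y) / len(y)
        if not 0.0 < pos < 1.0:
            raise ValueError("training data must contain both classes")
        self.f0 = math.log(pos / (1.0 - pos))  # initial prediction: prior log-odds
        self.trees: list[Stump] = []
        F = [self.f0] * len(X)
        for _ in range(self.n_estimators):
            # negative gradient of log-loss with respect to the raw score F
            residuals = [yi - _sigmoid(fi) for yi, fi in zip(y, F)]
            tree = fit_stump(X, residuals)
            self.trees.append(tree)
            F = [fi + self.learning_rate * tree.predict(x) for fi, x in zip(F, X)]
        return self

    def decision_function(self, x: list[float]) -> float:
        return self.f0 + self.learning_rate * sum(t.predict(x) for t in self.trees)

    def predict_proba(self, x: list[float]) -> float:
        return _sigmoid(self.decision_function(x))

    def predict(self, x: list[float]) -> int:
        return int(self.predict_proba(x) >= 0.5)
```

In use, each row of `X` would be one post's comment-feature vector and `y` its label (1 = misinformation). Shallow trees combined with a small learning rate are the usual trade-off in GBDT: each tree corrects only part of the remaining error, which tends to reduce overfitting.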

Key words: micro-blog, social network, misinformation, Gradient Boosting Decision Tree, comment

CLC number: