Journal of Computer Applications, 2016, Vol. 36, Issue (8): 2071-2075. DOI: 10.11772/j.issn.1001-9081.2016.08.2071


Credibility evaluating method of Chinese microblog based on information fusion

GAO Mingxia1, CHEN Furong2   

  1. College of Computer Science, Beijing University of Technology, Beijing 100124, China;
  2. TravelSky Technology Limited, Beijing 100127, China
  • Received: 2015-03-01 Revised: 2015-05-09 Online: 2016-08-10 Published: 2016-08-10
  • Supported by:
    This work is partly supported by the National Natural Science Foundation of China (61375059), the Specialized Research Fund for the Doctoral Program of Higher Education (20121103110031), the Beijing Municipal Education Research Plan Key Project (KZ201410005004), the Opening Project of State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Corp.

基于信息融合的中文微博可信度评估方法

高明霞1, 陈福荣2   

  1. College of Computer Science, Beijing University of Technology, Beijing 100124, China;
  2. TravelSky Technology Limited, Beijing 100127, China
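  • Corresponding author: GAO Mingxia
  • About the authors: GAO Mingxia (born 1973), female, from Zhangbei, Hebei; engineer, Ph.D., CCF member; main research interests: data mining, semantic networks, and knowledge engineering. CHEN Furong (born 1978), male, from Yudu, Jiangxi; engineer, M.S.; main research interests: data mining and machine learning.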
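  • Funding:
    General Program of the National Natural Science Foundation of China (61375059); Specialized Research Fund for the Doctoral Program of Higher Education, doctoral supervisor category (20121103110031); Key Project of the Beijing Municipal Education Commission Research Plan (KZ201410005004); Open Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Co., Ltd.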

Abstract: To measure the credibility of Chinese microblogs, a framework named Credibility of Chinese Microblog based on Information Fusion (CCM-IF) was proposed by analyzing the factors that affect microblog credibility and organizing them into a pedigree. Firstly, separate evaluation methods were implemented for three heterogeneous features: text content, user (author), and information propagation. Secondly, because credibility judgments are fuzzy in nature, a method based on Dempster-Shafer (D-S) evidence theory was proposed to fuse the three features. Thirdly, a series of experiments was conducted on two real datasets from Sina Weibo. Experimental results show that the accuracy of CCM-IF is 10%-20% higher than that of the classical ranking algorithm LMJM (Language Modeling with Jelinek-Mercer smoothing). Therefore, as a static indicator of quality assessment, CCM-IF can be used for microblog retrieval ranking and spam microblog filtering.

Key words: Chinese microblog, credibility, information fusion, four quadrant principle, evidence theory
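
The abstract states that the three feature-level assessments are fused with D-S evidence theory but gives no formulas. As a rough illustration only, the following Python sketch applies the standard Dempster rule of combination to three mass functions. The frame of discernment (`credible`, `not_credible`), the function name `dempster_combine`, and all mass values are assumptions made up for this example; they are not taken from the paper and do not reproduce the authors' fusion method.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses
    to its mass; the masses in each dict are expected to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm < 1e-12:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    return {hypothesis: mass / norm for hypothesis, mass in combined.items()}


# Hypothetical frame of discernment (an assumption for this sketch).
C = frozenset({"credible"})
N = frozenset({"not_credible"})
U = C | N  # mass left on the whole frame expresses uncertainty

# Made-up evidence from the three features named in the abstract.
text_evidence = {C: 0.6, N: 0.1, U: 0.3}
user_evidence = {C: 0.5, N: 0.2, U: 0.3}
propagation_evidence = {C: 0.4, N: 0.3, U: 0.3}

pairwise = dempster_combine(text_evidence, user_evidence)
fused = dempster_combine(pairwise, propagation_evidence)

for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

Keeping some mass on the whole frame `U` lets each feature express partial ignorance instead of forcing a binary verdict, which is one common reason for choosing evidence theory when judgments are fuzzy.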

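Abstract (translated from the Chinese): Based on the characteristics of Chinese microblog information, the measurability of those characteristics, and practical tasks, the indicators for measuring the credibility of Chinese microblog information were systematically organized and analyzed as a pedigree, and CCM-IF, a credibility evaluation framework for Chinese microblogs based on information fusion, was proposed. First, different measurement methods were used for three intrinsically different heterogeneous features: text content, information author, and information propagation. Second, in view of the fuzzy cognitive nature of credibility at the decision level, multi-dimensional evidence theory was adopted for feature fusion. Finally, two real datasets were collected from Sina Weibo and a series of experiments was conducted. The experimental results show that, compared with the traditional information retrieval ranking method, the smoothed language model (LMJM), CCM-IF raised the proportion of information meeting user needs by 10% to 20%. Therefore, as a static quality assessment indicator, CCM-IF can be used directly for practical tasks such as microblog retrieval ranking and spam microblog filtering.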

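Key words (translated from the Chinese): Chinese microblog, credibility, information fusion, four quadrant principle, evidence theory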
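
The baseline named in the abstract, LMJM, ranks documents with a query-likelihood language model using Jelinek-Mercer smoothing. The sketch below shows the textbook form of that scoring, P(w|d) = (1 - λ)·c(w,d)/|d| + λ·P(w|collection), under stated assumptions: the function name `lmjm_score`, the value λ = 0.5, and the toy pre-tokenized documents are all hypothetical, and the paper's actual baseline configuration is not given in this record.

```python
import math
from collections import Counter


def lmjm_score(query, doc, collection_tf, collection_len, lam=0.5):
    """Query log-likelihood under a Jelinek-Mercer smoothed document model.

    query and doc are lists of tokens; collection_tf is a Counter of
    term frequencies over the whole collection. lam is the weight given
    to the collection model (an assumed value, not from the paper).
    """
    doc_tf = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_collection = collection_tf[term] / collection_len
        p = (1.0 - lam) * p_doc + lam * p_collection
        if p == 0.0:
            # The term never occurs in the collection at all.
            return float("-inf")
        score += math.log(p)
    return score


# Toy, pre-tokenized documents (made up for illustration).
docs = [
    ["weibo", "rumor", "news"],
    ["weibo", "weather", "today"],
]
collection_tf = Counter(term for doc in docs for term in doc)
collection_len = sum(len(doc) for doc in docs)

query = ["rumor", "weibo"]
scores = [lmjm_score(query, doc, collection_tf, collection_len) for doc in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Such a score depends only on term overlap with the query, which is the contrast the abstract draws with a static, query-independent credibility indicator.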
