Credibility evaluating method of Chinese microblog based on information fusion

Journal of Computer Applications, 2016, Vol. 36, Issue 8: 2071-2075. DOI: 10.11772/j.issn.1001-9081.2016.08.2071

GAO Mingxia¹, CHEN Furong²

1. College of Computer Science, Beijing University of Technology, Beijing 100124, China;
2. TravelSky Technology Limited, Beijing 100127, China

Received: 2015-03-01; Revised: 2015-05-09; Online: 2016-08-10; Published: 2016-08-10
Supported by:
This work is partly supported by the National Natural Science Foundation of China (General Program, 61375059), the Specialized Research Fund for the Doctoral Program of Higher Education (20121103110031), the Beijing Municipal Education Research Plan Key Project (KZ201410005004), and the Opening Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Corp.

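Corresponding author: GAO Mingxia
About the authors: GAO Mingxia (born 1973), female, from Zhangbei, Hebei; engineer, Ph.D., CCF member; research interests: data mining, semantic web, knowledge engineering. CHEN Furong (born 1978), male, from Yudu, Jiangxi; engineer, M.S.; research interests: data mining, machine learning.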

Abstract

To evaluate the credibility of Chinese microblog posts, a framework named Credibility of Chinese Microblog based on Information Fusion (CCM-IF) was proposed. Credibility indicators for Chinese microblogs were systematically collected according to the characteristics of microblog information, the measurability of those characteristics, and the practical tasks they serve, and were then organized into a pedigree. Firstly, different measures were applied to three heterogeneous features: text content, author, and information propagation. Secondly, because credibility at the decision level is a fuzzy judgment, the features were fused using Dempster-Shafer (D-S) evidence theory. Thirdly, a series of experiments was conducted on two real datasets collected from Sina Weibo. Experimental results show that the accuracy of CCM-IF, measured as the proportion of returned microblogs that meet user needs, is 10%-20% higher than that of the classical retrieval ranking method LMJM (Language Modeling with Jelinek-Mercer smoothing). As a static quality indicator, CCM-IF can therefore be used directly for tasks such as microblog retrieval ranking and spam microblog filtering.

Key words: Chinese microblog, credibility, information fusion, four quadrant principle, evidence theory
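
The fusion step named in the abstract can be illustrated with a minimal sketch of Dempster's rule of combination. This is not the authors' implementation: the two-element frame (`credible`, `not_credible`), the function name `combine`, and the mass values assigned to the text, user, and propagation evidence are all hypothetical choices made for illustration. The sketch only shows how three independent mass functions might be merged into one belief assignment.

```python
from functools import reduce
from itertools import product

# Hypothetical frame of discernment: a post is either credible or not.
CREDIBLE = frozenset({"credible"})
NOT_CREDIBLE = frozenset({"not_credible"})
UNKNOWN = CREDIBLE | NOT_CREDIBLE  # mass on the whole frame models ignorance


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset (a subset of the frame) to a mass;
    the masses of one function are assumed to sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            fused[intersection] = fused.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b
    normalizer = 1.0 - conflict
    if normalizer <= 1e-12:
        raise ValueError("evidence sources are in total conflict")
    # Renormalize so that the remaining masses sum to 1.
    return {subset: mass / normalizer for subset, mass in fused.items()}


# Illustrative (made-up) evidence from the three feature groups.
text_evidence = {CREDIBLE: 0.6, NOT_CREDIBLE: 0.1, UNKNOWN: 0.3}
user_evidence = {CREDIBLE: 0.5, NOT_CREDIBLE: 0.2, UNKNOWN: 0.3}
propagation_evidence = {CREDIBLE: 0.4, NOT_CREDIBLE: 0.3, UNKNOWN: 0.3}

fused = reduce(combine, [text_evidence, user_evidence, propagation_evidence])
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 3))
```

Because all three hypothetical sources lean toward `credible`, the fused mass on `credible` ends up higher than any single source's, while the mass left on the whole frame (ignorance) shrinks. How the paper actually derives mass values from each feature is not stated in the abstract.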

CLC Number: TP391

GAO Mingxia, CHEN Furong. Credibility evaluating method of Chinese microblog based on information fusion[J]. Journal of Computer Applications, 2016, 36(8): 2071-2075.
