Journal of Computer Applications, 2022, Vol. 42, Issue (11): 3371-3378. DOI: 10.11772/j.issn.1001-9081.2021122148
• The 9th CCF Conference on Big Data •
Bin ZHANG, Li WANG, Yanjie YANG
Received: 2021-12-21
Revised: 2022-02-28
Accepted: 2022-03-04
Online: 2022-04-08
Published: 2022-11-10
Contact: Li WANG
About author: ZHANG Bin, born in 1995 in Yongji, Shanxi, M. S. candidate, CCF member. His research interests include natural language processing and rumor detection.
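Abstract:
Social media platforms have become the main channel through which people publish and obtain information, but the ease of posting also lets rumors spread quickly, so verifying whether a piece of information is a rumor and stopping its spread has become an urgent problem. Previous studies have shown that people's stances toward a piece of information can help determine whether it is a rumor. On this basis, to address the proliferation of rumors, a process tracking multi-task rumor verification model combined with stance (JSP-MRVM) was proposed. First, three propagation processes of the information were represented with Graph Convolutional Networks (GCN) over the topology graph, over the feature graph, and with a common GCN, respectively. Then, an attention mechanism was used to obtain the stance features of the information, and the stance features were fused with the tweet features. Finally, a multi-task objective function was designed so that the stance classification task better assists rumor verification. Experimental results show that on the RumorEval dataset the accuracy and Macro-F1 of the proposed model are 10.7 and 11.2 percentage points higher, respectively, than those of the baseline model RV-ML, so the model verifies rumors more effectively and helps reduce their spread.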
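The abstract names three building blocks: GCN-based representation of propagation, fusion of stance and tweet features, and a multi-task objective. As a rough illustration only, the sketch below shows a standard GCN layer (symmetric normalization with self-loops) and a joint loss that adds a weighted stance-classification term to the veracity term. The function names, the fixed weight `lam`, and the toy graph are assumptions made for this example; the page does not give the paper's actual layer definitions or objective function.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the normalization of the standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops, so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weight):
    """One graph convolution followed by ReLU."""
    return np.maximum(normalize_adjacency(adj) @ features @ weight, 0.0)


def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def joint_loss(veracity_logits, veracity_labels, stance_logits, stance_labels, lam=0.5):
    """Hypothetical multi-task objective: veracity loss plus lam times stance loss."""
    return (cross_entropy(veracity_logits, veracity_labels)
            + lam * cross_entropy(stance_logits, stance_labels))


# Toy conversation: a source tweet (node 0) with two replies (nodes 1 and 2).
rng = np.random.default_rng(0)
toy_adj = np.array([[0.0, 1.0, 1.0],
                    [1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0]])
toy_features = rng.normal(size=(3, 4))
toy_hidden = gcn_layer(toy_adj, toy_features, rng.normal(size=(4, 2)))
```

Here `toy_hidden` holds one 2-dimensional representation per tweet in the conversation. A fixed `lam` is the simplest way to weight the two tasks; other weighting schemes are possible and may be what the paper uses.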
Cite this article:
Bin ZHANG, Li WANG, Yanjie YANG. Process tracking multi-task rumor verification model combined with stance[J]. Journal of Computer Applications, 2022, 42(11): 3371-3378.
Set | Threads | Tweets | Veracity: True | Veracity: False | Veracity: Unverified |
---|---|---|---|---|---|
Total | 325 | 5 568 | 145 | 74 | 106 |
Training set | 272 | 4 238 | 127 | 50 | 95 |
Validation set | 25 | 281 | 10 | 12 | 3 |
Test set | 28 | 1 049 | 8 | 12 | 8 |

Tab. 1 RumorEval dataset distribution
Event | Abbreviation | Threads | Tweets | Veracity: True | Veracity: False | Veracity: Unverified |
---|---|---|---|---|---|---|
Total | | 2 402 | 21 382 | 1 067 | 638 | 697 |
Charliehebdo | Cha | 458 | 4 779 | 193 | 116 | 149 |
Germanwings-crash | Ger | 238 | 1 633 | 94 | 111 | 33 |
Ferguson | Fer | 284 | 3 881 | 10 | 8 | 266 |
Ottawashooting | Ott | 470 | 4 369 | 329 | 72 | 69 |
Sydneysiege | Syd | 522 | 5 628 | 382 | 86 | 54 |
Putinmissing | Put | 126 | 316 | 0 | 9 | 117 |
Prince-toronto | Pri | 229 | 538 | 0 | 222 | 7 |
Gurlitt | Gur | 61 | 76 | 59 | 0 | 2 |
Ebola-essien | Ebo | 14 | 126 | 0 | 14 | 0 |

Tab. 2 PHEME dataset distribution
Model | RumorEval Acc | RumorEval Macro-F1 | PHEME Acc | PHEME Macro-F1 |
---|---|---|---|---|
BranchLSTM | 44.1 | 45.3 | 32.9 | 30.8 |
MTL-2 | 51.9 | 52.4 | 35.1 | 31.2 |
MT-ES | 55.6 | 54.5 | 35.3 | 31.8 |
RV-ML | 59.3 | 59.6 | 38.5 | 35.7 |
JSP-MRVM (O) | 70.0 | 70.8 | 39.5 | 36.1 |

Tab. 3 Comparison of proposed model and baseline models on RumorEval and PHEME datasets (%)
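The improvement figures quoted in the abstract can be checked directly against Table 3. The short sketch below copies the RV-ML and JSP-MRVM (O) rows from the table and computes the differences in percentage points; the dictionary layout and the helper name `point_gain` are illustrative choices, not part of the paper.

```python
# (Acc, Macro-F1) in percent, copied from Table 3.
results = {
    "RV-ML": {"RumorEval": (59.3, 59.6), "PHEME": (38.5, 35.7)},
    "JSP-MRVM (O)": {"RumorEval": (70.0, 70.8), "PHEME": (39.5, 36.1)},
}


def point_gain(model, baseline, dataset):
    """Return the (Acc, Macro-F1) gain of model over baseline in percentage points."""
    acc_m, f1_m = results[model][dataset]
    acc_b, f1_b = results[baseline][dataset]
    return round(acc_m - acc_b, 1), round(f1_m - f1_b, 1)


rumoreval_gain = point_gain("JSP-MRVM (O)", "RV-ML", "RumorEval")  # (10.7, 11.2)
pheme_gain = point_gain("JSP-MRVM (O)", "RV-ML", "PHEME")          # (1.0, 0.4)
```

The RumorEval gain matches the 10.7 and 11.2 percentage points stated in the abstract, while the gain on PHEME is much smaller.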
Dataset | Event | Acc | Macro-F1 | Subclass Acc: True | Subclass Acc: False | Subclass Acc: Unverified |
---|---|---|---|---|---|---|
PHEME | Charliehebdo | 46.0 | 39.0 | 80 | 24 | 18 |
PHEME | Germanwings-crash | 34.7 | 33.1 | 35 | 34 | 32 |
PHEME | Ferguson | 49.0 | 23.8 | 40 | 0 | 50 |
PHEME | Ottawashooting | 61.0 | 29.0 | 86 | 1 | 5 |
PHEME | Sydneysiege | 55.0 | 36.0 | 68 | 21 | 15 |
RumorEval | | 70.0 | 70.8 | 87 | 57 | 75 |

Tab. 4 Subclass accuracy of proposed model on RumorEval and PHEME datasets (%)
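The tables report both accuracy and Macro-F1, and the two can diverge sharply when the classes are imbalanced. The sketch below uses the standard definitions of both metrics on hypothetical labels (the label lists are invented for illustration and are not taken from either dataset); the page does not say how the paper's metrics were implemented. Assigning an F1 of 0 to a class with no true or predicted instances is a convention chosen here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced event: a classifier that always answers "unverified".
classes = ["true", "false", "unverified"]
gold = ["unverified"] * 8 + ["true", "false"]
pred = ["unverified"] * 10

toy_acc = accuracy(gold, pred)
toy_f1 = macro_f1(gold, pred, classes)
```

In this toy case the accuracy is 0.8 while the Macro-F1 is below 0.3, because two of the three classes are never predicted correctly and each class counts equally in the average.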
Model | Acc | Macro-F1 |
---|---|---|
JSP-MRVM-G(UO) | 46.6 | 33.3 |
JSP-MRVM-G(O) | 46.6 | 33.3 |
JSP-MRVM-C(UO) | 53.3 | 56.8 |
JSP-MRVM-C(O) | 63.3 | 65.8 |
JSP-MRVM-F(UO) | 60.0 | 58.0 |
JSP-MRVM-F(O) | 66.0 | 67.0 |
JSP-MRVM(UO) | 66.0 | 64.0 |
JSP-MRVM(O) | 70.0 | 70.8 |

Tab. 5 Ablation experimental results on RumorEval dataset (%)