Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2667-2673.DOI: 10.11772/j.issn.1001-9081.2021071330
Special Issue: Artificial Intelligence
• Artificial intelligence •
Xujian ZHAO, Chongwei WANG, Junli WANG
Received: 2021-07-26
Revised: 2021-09-14
Accepted: 2021-09-15
Online: 2021-09-18
Published: 2022-09-10
Contact: Xujian ZHAO
About author: WANG Chongwei, born in 1995 in Luzhou, Sichuan, M. S. candidate. His research interests include information extraction and machine learning.
Xujian ZHAO, Chongwei WANG, Junli WANG. Key event extraction method from microblog by integrating social influence and temporal distribution[J]. Journal of Computer Applications, 2022, 42(9): 2667-2673.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071330
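Category | Fine-grained class | Example |
---|---|---|
Explicit time expression | Complete explicit time expression | 2020年7月3日 (3 July 2020) |
Explicit time expression | Fuzzy explicit time expression | 7月3日 (3 July) |
Implicit time expression | - | 昨天 (yesterday), 上周五 (last Friday) |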
Tab. 1 Classification of time expressions
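The three categories in Table 1 (complete explicit, fuzzy explicit, implicit) can be told apart with simple pattern matching. The following is a minimal sketch only: the regular expressions, the keyword list, and the function name `classify_time_expression` are illustrative assumptions built from the table's examples, not the rules used in the paper.

```python
import re

# Illustrative patterns derived from the examples in Table 1.
# They are assumptions, not the paper's actual recognition rules.
FULL_EXPLICIT = re.compile(r"\d{4}年\d{1,2}月\d{1,2}日")  # e.g. 2020年7月3日
FUZZY_EXPLICIT = re.compile(r"\d{1,2}月\d{1,2}日")        # e.g. 7月3日 (year missing)
IMPLICIT_WORDS = ("昨天", "昨日", "今天", "今日", "上周五")   # hypothetical keyword list


def classify_time_expression(text: str) -> str:
    """Return the most specific Table 1 category matched anywhere in text."""
    # Check the complete form first, because the fuzzy pattern also
    # matches the month/day tail of a complete date.
    if FULL_EXPLICIT.search(text):
        return "complete explicit"
    if FUZZY_EXPLICIT.search(text):
        return "fuzzy explicit"
    if any(word in text for word in IMPLICIT_WORDS):
        return "implicit"
    return "none"
```

The order of the checks matters: testing the fuzzy pattern first would label every complete date as fuzzy.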
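Implicit time expression | Time mapping | Implicit time expression | Time mapping |
---|---|---|---|
今日/今天 (today) | Microblog posting time | 次日/第二天 (the next day) | Reference time ( |
当日/当天 (that day) | Reference time | 一月前 (one month ago) | Reference time ( |
昨日/昨天 (yesterday) | Reference time ( | 去年今日 (this day last year) | Reference time ( |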
Tab. 2 Implicit time expression mapping
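A mapping like the one in Table 2 can be applied as a lookup from an implicit expression to a date. The sketch below is hedged accordingly: only the rules "today maps to the microblog posting time" and "that day maps to the reference time" come from the table. The day offsets for the other expressions follow the ordinary meaning of the words and are assumptions, since the offsets in this copy of the table are cut off. The function name, the `DAY_OFFSETS` dictionary, and the idea of passing the reference date in as an argument are likewise hypothetical. Month- and year-based expressions (一月前, 去年今日) are left out because their exact arithmetic is not recoverable here.

```python
from datetime import date, timedelta

# Day offsets relative to the reference date.
# ASSUMPTION: values reflect the everyday meaning of each word,
# not offsets read from Table 2 (which are truncated in this copy).
DAY_OFFSETS = {
    "当日": 0, "当天": 0,    # that day -> reference time (stated in Table 2)
    "昨日": -1, "昨天": -1,  # yesterday (assumed offset)
    "次日": 1, "第二天": 1,  # the next day (assumed offset)
}


def resolve_implicit_time(expression: str, post_date: date, reference_date: date) -> date:
    """Map an implicit time expression to a concrete date."""
    # Table 2 maps "today" to the posting time of the microblog itself.
    if expression in ("今日", "今天"):
        return post_date
    # All other supported expressions are offsets from the reference date;
    # an unsupported expression raises KeyError.
    return reference_date + timedelta(days=DAY_OFFSETS[expression])
```

For example, `resolve_implicit_time("昨天", date(2020, 7, 3), date(2020, 7, 1))` resolves against the reference date, not the posting date.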
Dataset | Title | Date range | Count |
---|---|---|---|
Dataset1 | Vaccine incident | 2018-07-01 to 2019-09-01 | 73,614 |
Dataset2 | ZTE incident | 2018-04-16 to 2018-07-14 | 45,113 |
Tab. 3 Details of datasets
Dataset | Metric | Random | TF-IDF | MWDS | DCCI | Proposed method |
---|---|---|---|---|---|---|
Dataset1 | ROUGE-1 | 0.6781 | 0.6931 | 0.6524 | 0.6306 | 0.8206 |
Dataset1 | ROUGE-L | 0.2183 | 0.2106 | 0.3208 | 0.3456 | 0.5142 |
Dataset1 | Redundancy | 28.1520 | 20.8241 | 36.5769 | 47.9528 | 23.3409 |
Dataset2 | ROUGE-1 | 0.7289 | 0.8177 | 0.6847 | 0.6751 | 0.8308 |
Dataset2 | ROUGE-L | 0.4387 | 0.4670 | 0.4437 | 0.4319 | 0.7041 |
Dataset2 | Redundancy | 31.8062 | 39.2014 | 41.4847 | 44.3561 | 27.8077 |
Tab. 4 Comparison of event extraction performance
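For readers unfamiliar with the two ROUGE metrics in Table 4, the sketch below shows their standard recall forms: ROUGE-1 counts clipped unigram overlap, and ROUGE-L uses the longest common subsequence. This excerpt does not say whether the paper reports recall, precision, or F-score, nor how the text is tokenized, so both choices here are assumptions, and the function names are hypothetical. The Redundancy metric is not defined in this excerpt and is not sketched.

```python
from collections import Counter
from typing import Sequence


def rouge_1_recall(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams."""
    if not reference:
        return 0.0
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = Counter(candidate) & Counter(reference)
    return sum(overlap.values()) / len(reference)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """LCS length divided by the number of reference tokens."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)
```

Passing a Python string treats each character as a token, which is one simple option for Chinese text. A word segmenter could be substituted by passing token lists instead.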