Journal of Computer Applications, 2023, Vol. 43, Issue 8: 2376-2381. DOI: 10.11772/j.issn.1001-9081.2022091377
The 19th International Conference on Web Information Systems and Applications (WISA 2022)
Dong LIU1,2 (刘东), Chuan LIN1,2 (林川), Lina REN1,2 (任丽娜), Ruizhang HUANG1,2 (黄瑞章)
Received: 2022-09-06
Revised: 2022-10-26
Accepted: 2022-10-28
Online: 2022-12-12
Published: 2023-08-10
Contact: Chuan LIN
About author: LIU Dong (刘东), born in 1997 in Chengdu, Sichuan, M.S. candidate, CCF member. His research interests include natural language processing and text mining.
Dong LIU, Chuan LIN, Lina REN, Ruizhang HUANG. Hierarchical storyline generation method for hot news events[J]. Journal of Computer Applications, 2023, 43(8): 2376-2381.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091377
Dataset | Relevant documents | Irrelevant documents | Time period |
---|---|---|---|
Dataset1 | 536 | 1000 | 2021-04 to 2021-06 |
Dataset2 | 320 | 600 | 2022-02 to 2022-04 |
Tab. 1 Dataset details
Parameter | Value | Parameter | Value |
---|---|---|---|
Topic association threshold | 0.5 | Branch association threshold | 0.6 |
Tab. 2 Algorithm parameter setting
Dataset | HSGM precision | HSGM recall | HSGM F-score | Single-pass precision | Single-pass recall | Single-pass F-score | k-means precision | k-means recall | k-means F-score |
---|---|---|---|---|---|---|---|---|---|
Dataset1 | 97.06 | 98.15 | 97.51 | 97.96 | 97.10 | 93.30 | 92.53 | 94.14 | 91.64 |
Dataset2 | 90.69 | 73.61 | 82.95 | 87.41 | 56.95 | 68.72 | 85.11 | 61.10 | 73.40 |
Tab. 3 Experimental results on each dataset
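For readers unfamiliar with the three metrics reported in Tab. 3, the following is a minimal sketch of their standard definitions. It assumes the values are percentages and that the F-score is the usual F1 (harmonic mean of precision and recall); the page states neither. The function name and the counts are hypothetical and are not taken from the paper. Note that the tabulated F-scores are not the harmonic mean of the tabulated precision and recall (for the single-pass method on Dataset1, 93.30 is reported while the harmonic mean of 97.96 and 97.10 is about 97.53), which would be consistent with scores averaged over several topics or runs, though the page does not say how they were aggregated.

```python
def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall, F1) as percentages.

    Any ratio whose denominator is zero is reported as 0.0.
    """
    predicted = true_positives + false_positives   # documents the method assigned
    actual = true_positives + false_negatives      # documents that truly belong
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return 100 * precision, 100 * recall, 100 * f1


# Hypothetical counts for illustration only (not from the paper).
p, r, f = precision_recall_f1(true_positives=90, false_positives=10, false_negatives=30)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

With these illustrative counts, precision is 90%, recall is 75%, and F1 falls between the two, closer to the lower value.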
Metric | Dataset | HSGM | Story Forest | Story Graph |
---|---|---|---|---|
Accuracy | Dataset1 | 3.94 | 3.94 | 3.83 |
Accuracy | Dataset2 | 3.83 | 3.77 | 3.72 |
Comprehensibility | Dataset1 | 4.33 | 4.17 | 4.06 |
Comprehensibility | Dataset2 | 4.28 | 4.06 | 4.00 |
Completeness | Dataset1 | 4.22 | 4.11 | 3.94 |
Completeness | Dataset2 | 4.17 | 4.06 | 3.94 |
Tab. 4 User experience-based score