Journal of Computer Applications, 2023, Vol. 43, Issue (8): 2376-2381. DOI: 10.11772/j.issn.1001-9081.2022091377
• The 19th International Conference on Web Information Systems and Applications (WISA 2022) •
					
Dong LIU1,2, Chuan LIN1,2, Lina REN1,2, Ruizhang HUANG1,2
Received: 2022-09-06
Revised: 2022-10-26
Accepted: 2022-10-28
Online: 2022-12-12
Published: 2023-08-10
Contact: Chuan LIN
About author: LIU Dong, born in 1997, from Chengdu, Sichuan, M.S. candidate, CCF member. His research interests include natural language processing and text mining.
Dong LIU, Chuan LIN, Lina REN, Ruizhang HUANG. Hierarchical storyline generation method for hot news events[J]. Journal of Computer Applications, 2023, 43(8): 2376-2381.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091377
| Dataset | Relevant documents | Irrelevant documents | Time period |
|---|---|---|---|
| Dataset1 | 536 | 1000 | 2021-04 to 2021-06 |
| Dataset2 | 320 | 600 | 2022-02 to 2022-04 |
Tab. 1 Dataset details
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Topic association threshold | 0.5 | Branch association threshold | 0.6 |
Tab. 2 Algorithm parameter setting
| Dataset | HSGM: precision | HSGM: recall | HSGM: F-score | singlePass-based method: precision | singlePass-based method: recall | singlePass-based method: F-score | k-means-based method: precision | k-means-based method: recall | k-means-based method: F-score |
|---|---|---|---|---|---|---|---|---|---|
| Dataset1 | 97.06 | 98.15 | 97.51 | 97.96 | 97.10 | 93.30 | 92.53 | 94.14 | 91.64 |
| Dataset2 | 90.69 | 73.61 | 82.95 | 87.41 | 56.95 | 68.72 | 85.11 | 61.10 | 73.40 |
Tab. 3 Experimental results on each dataset
| Metric | Dataset | HSGM | Story Forest | Story Graph |
|---|---|---|---|---|
| Accuracy | Dataset1 | 3.94 | 3.94 | 3.83 |
| Accuracy | Dataset2 | 3.83 | 3.77 | 3.72 |
| Comprehensibility | Dataset1 | 4.33 | 4.17 | 4.06 |
| Comprehensibility | Dataset2 | 4.28 | 4.06 | 4.00 |
| Completeness | Dataset1 | 4.22 | 4.11 | 3.94 |
| Completeness | Dataset2 | 4.17 | 4.06 | 3.94 |
Tab. 4 User experience-based score