Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 345-353. DOI: 10.11772/j.issn.1001-9081.2024030281
• Artificial intelligence •
Yalun WANG, Yangsen ZHANG, Siwen ZHU
Received: 2024-03-18
Revised: 2024-04-30
Accepted: 2024-05-31
Online: 2024-07-22
Published: 2025-02-10
Contact: Yangsen ZHANG
About author: WANG Yalun, born in 2000 in Beijing, M. S. candidate, CCF member. Her research interests include natural language processing.
Yalun WANG, Yangsen ZHANG, Siwen ZHU. Headline generation model with position embedding for knowledge reasoning[J]. Journal of Computer Applications, 2025, 45(2): 345-353.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024030281
Tab. 1 Experimental parameter setting

| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Batch size | 128 | Dropout rate | 0.15 |
| Training epochs | 15 | Gradient clipping | 5 |
| Learning rate | 0.001 | Optimizer | Adam |
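To make the setup in Tab. 1 concrete, below is a minimal PyTorch-style training sketch. It is illustrative only: `model` is a placeholder (the paper's Tran-A-SDLM architecture is not reproduced here), and the clipping value of 5 is assumed to bound the global gradient norm.

```python
import torch
import torch.nn as nn

# Hyperparameters from Tab. 1.
BATCH_SIZE = 128      # batch size
EPOCHS = 15           # training epochs
LEARNING_RATE = 1e-3  # learning rate
DROPOUT = 0.15        # dropout rate
GRAD_CLIP = 5.0       # gradient clipping (assumed: global-norm bound)

# Placeholder network; stands in for the actual headline-generation model.
model = nn.Transformer(dropout=DROPOUT)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

def training_step(src, tgt, loss_fn):
    """One Adam step with gradient clipping, per the Tab. 1 configuration."""
    optimizer.zero_grad()
    loss = loss_fn(model(src, tgt), tgt)
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()
```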
Tab. 2 Comparison of experimental results of different models on LCSTS dataset

| Model | ROUGE-1/% | ROUGE-2/% | ROUGE-L/% | Parameters/10⁶ |
| --- | --- | --- | --- | --- |
| RNN-context | 29.9 | 17.4 | 27.2 | 2.0 |
| ASPM | 32.8 | 16.8 | 32.8 | 2.0 |
| T5 PEGASUS | 34.1 | 22.2 | 31.7 | 275.0 |
| CopyNet | 34.4 | 21.6 | 31.3 | 5.0 |
| DQN | 35.7 | 22.6 | 32.8 | 62.0 |
| BERTSUM | 37.0 | 17.8 | 32.7 | 110.5 |
| Transformer-XL | 37.0 | 19.6 | 34.2 | 41.0 |
| CBART | 37.1 | 21.5 | 35.8 | 121.0 |
| PGN+2T+IF | 37.4 | 23.8 | 34.2 | 39.0 |
| RNN-context-SDLM | 38.8 | 26.2 | 36.1 | 32.0 |
| Tran-A-SDLM | 39.0 | 26.9 | 36.6 | 46.0 |
Tab. 3 Results of ablation study

| Model | ROUGE-1/% | ROUGE-2/% | ROUGE-L/% |
| --- | --- | --- | --- |
| Tran-A-SDLM | 39.0 | 26.9 | 36.6 |
| -A | 38.9 | 26.6 | 36.3 |
| -Tran | 38.8 | 26.2 | 36.1 |
| -SDLM | 38.2 | 25.7 | 35.4 |
Tab. 4 Headlines generated by different models for the same source text (headlines are shown in the original Chinese, as LCSTS is a Chinese-language corpus)

| No. | Reference headline | Model | Generated headline |
| --- | --- | --- | --- |
| 1 | 男生高考作弊追打监考老师:你知道我爸是谁? | RNN-context | 高考作弊事件中男生动手伤害女监考官 |
| | | CopyNet | 男考生不满没收作弊手机,踹女监考老师 |
| | | BERTSUM | 阜新高考男生作弊被抓后攻击监考老师 |
| | | RNN-context-SDLM(-Tran) | 高考生作弊被抓:你知道我爸是谁啊? |
| | | Tran-A-SDLM | 高考生作弊被抓踹监考老师:你知道我爸是谁啊? |
| 2 | 教育部原发言人:现在语文课至少一半不该学 | RNN-context | 教育部发言人:语文教材修订稿 |
| | | CopyNet | 前教育部发言人:语文课至少一半不该学,应增加传统文化的比例 |
| | | BERTSUM | 专家:语文课至少一半不该学,应修订 |
| | | RNN-context-SDLM(-Tran) | 教育部发言人:语文课至少一半不该学内容 |
| | | Tran-A-SDLM | 教育部原发言人:语文课至少一半不该学 |
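The ROUGE-1, ROUGE-2, and ROUGE-L values in Tabs. 2-3 follow the standard definitions of Lin [34]. Below is a minimal sketch, assuming character-level matching (a common choice for Chinese text, since it sidesteps word segmentation), applied to the reference headline and Tran-A-SDLM output for sample 2 in Tab. 4.

```python
from collections import Counter

def rouge_n_f1(ref: str, hyp: str, n: int) -> float:
    """ROUGE-N F1 over character n-grams (clipped n-gram overlap)."""
    ref_ngrams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
    hyp_ngrams = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
    overlap = sum((ref_ngrams & hyp_ngrams).values())  # clipped match count
    if overlap == 0:
        return 0.0
    p = overlap / sum(hyp_ngrams.values())
    r = overlap / sum(ref_ngrams.values())
    return 2 * p * r / (p + r)

def rouge_l_f1(ref: str, hyp: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence (LCS)."""
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, cr in enumerate(ref):
        for j, ch in enumerate(hyp):
            dp[i + 1][j + 1] = dp[i][j] + 1 if cr == ch else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(ref)][len(hyp)]
    if lcs == 0:
        return 0.0
    p, r = lcs / len(hyp), lcs / len(ref)
    return 2 * p * r / (p + r)

# Sample 2 from Tab. 4: reference headline vs. Tran-A-SDLM output.
ref = "教育部原发言人:现在语文课至少一半不该学"
hyp = "教育部原发言人:语文课至少一半不该学"
print(f"ROUGE-1: {rouge_n_f1(ref, hyp, 1):.3f}")
print(f"ROUGE-2: {rouge_n_f1(ref, hyp, 2):.3f}")
print(f"ROUGE-L: {rouge_l_f1(ref, hyp):.3f}")
```

Note that published ROUGE implementations differ in details (tokenization, precision/recall weighting), so numbers from this sketch are not directly comparable to Tab. 2 unless the original evaluation settings are matched.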
[1] XIA W J, HUANG H M, GENGZANGCUOMAO, et al. Survey of extractive text summarization based on unsupervised learning and supervised learning[J]. Journal of Computer Applications, 2024, 44(4): 1035-1048.
[2] ZHU Y Q, ZHAO P, ZHAO F F, et al. Survey on abstractive text summarization technologies based on deep learning[J]. Computer Engineering, 2021, 47(11): 11-21, 28.
[3] ZHENG C, CAI Y, ZHANG G, et al. Controllable abstractive sentence summarization with guiding entities[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 5668-5678.
[4] XU P, ZHU X, CLIFTON D A. Multimodal learning with transformers: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12113-12132.
[5] DAI Z, YANG Z, YANG Y, et al. Transformer-XL: attentive language models beyond a fixed-length context[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2978-2988.
[6] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 5753-5763.
[7] SHI L, RUAN X M, WEI R B, et al. Abstractive summarization based on sequence to sequence models: a review[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(10): 1102-1116.
[8] RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 379-389.
[9] NALLAPATI R, ZHOU B, DOS SANTOS C, et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond[C]// Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg: ACL, 2016: 280-290.
[10] CHOPRA S, AULI M, RUSH A M. Abstractive sentence summarization with attentive recurrent neural networks[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 93-98.
[11] GU J, LU Z, LI H, et al. Incorporating copying mechanism in sequence-to-sequence learning[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2016: 1631-1640.
[12] MAO X J, WEI Y, YANG Y R, et al. KHGAS: keywords guided heterogeneous graph for abstractive summarization[J]. Computer Science, 2024, 51(7): 278-286.
[13] ZHANG Z Y, XIAO R. Text summarization method combining global coding and subject decoding[J]. Computer Applications and Software, 2023, 40(4): 134-140, 183.
[14] CUI Z, LI H L, ZHANG L, et al. A Chinese summary generation method incorporating sememes[J]. Journal of Chinese Information Processing, 2022, 36(6): 146-154.
[15] SUN G, WANG Z, ZHAO J. Automatic text summarization using deep reinforcement learning and beyond[J]. Information Technology and Control, 2021, 50(3): 458-469.
[16] ZHANG Y, YANG C, ZHOU Z, et al. Enhancing Transformer with sememe knowledge[C]// Proceedings of the 5th Workshop on Representation Learning for NLP. Stroudsburg: ACL, 2020: 177-184.
[17] GU Y, YAN J, ZHU H, et al. Language modeling with sparse product of sememe experts[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2018: 4642-4651.
[18] SU M H, WU C H, CHENG H T. A two-stage Transformer-based approach for variable-length abstractive summarization[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020, 28: 2061-2072.
[19] LI X J, WANG J, YU M. Research on automatic Chinese summarization combining pre-training and attention enhancement[J]. Computer Engineering and Applications, 2023, 59(14): 134-141.
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[21] SHAW P, USZKOREIT J, VASWANI A. Self-attention with relative position representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg: ACL, 2018: 464-468.
[22] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1243-1252.
[23] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
[24] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. [2023-12-07].
[25] ZHENG Z C, CHEN J D, ZHANG J. Sentiment classification model based on non-negative sinusoidal positional encoding and hybrid attention mechanism[J]. Computer Engineering and Applications, 2024, 60(15): 101-110.
[26] HE P, LIU X, GAO J, et al. DeBERTa: decoding-enhanced BERT with disentangled attention[EB/OL]. [2024-01-12].
[27] CHU X, TIAN Z, ZHANG B, et al. Conditional positional encodings for vision transformers[EB/OL]. [2023-10-24].
[28] ABDU-AGUYE M G, GOMAA W, MAKIHARA Y, et al. Adaptive pooling is all you need: an empirical study on hyperparameter-insensitive human action recognition using wearable sensors[C]// Proceedings of the 2020 International Joint Conference on Neural Networks. Piscataway: IEEE, 2020: 1-6.
[29] ZHAO S, ZHANG T, HU M, et al. AP-BERT: enhanced pre-trained model through average pooling[J]. Applied Intelligence, 2022, 52(14): 15929-15937.
[30] LOCHTER J V, SILVA R M, ALMEIDA T A. Deep learning models for representing out-of-vocabulary words[C]// Proceedings of the 2020 Brazilian Conference on Intelligent Systems, LNCS 12319. Cham: Springer, 2020: 418-434.
[31] BENAMAR A, GROUIN C, BOTHUA M, et al. Evaluating tokenizers impact on OOVs representation with Transformers models[C]// Proceedings of the 13th Language Resources and Evaluation Conference. Paris: European Language Resources Association, 2022: 4193-4204.
[32] SUN M S, CHEN X X. Embedding for words and word senses based on human annotated knowledge base: a case study on HowNet[J]. Journal of Chinese Information Processing, 2016, 30(6): 1-6, 14.
[33] HU B, CHEN Q, ZHU F. LCSTS: a large scale Chinese short text summarization dataset[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 1967-1972.
[34] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]// Proceedings of the ACL-04 Workshop: Text Summarization Branches Out. Stroudsburg: ACL, 2004: 74-81.
[35] XUE L, CONSTANT N, ROBERTS A, et al. mT5: a massively multilingual pre-trained text-to-text transformer[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 483-498.
[36] ZHANG J, ZHAO Y, SALEH M, et al. PEGASUS: pre-training with extracted gap-sentences for abstractive summarization[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 11328-11339.
[37] HE X. Parallel refinements for lexically constrained text generation with BART[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 8653-8666.