Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 345-353. DOI: 10.11772/j.issn.1001-9081.2024030281
• Artificial intelligence •
Yalun WANG, Yangsen ZHANG, Siwen ZHU
Received: 2024-03-18
Revised: 2024-04-30
Accepted: 2024-05-31
Online: 2024-07-22
Published: 2025-02-10
Contact: Yangsen ZHANG
About author: WANG Yalun, born in 2000 in Beijing, M. S. candidate, CCF member. Her research interests include natural language processing.
Yalun WANG, Yangsen ZHANG, Siwen ZHU. Headline generation model with position embedding for knowledge reasoning[J]. Journal of Computer Applications, 2025, 45(2): 345-353.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024030281
Tab. 1 Experimental parameter setting

| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Batch size | 128 | Dropout rate | 0.15 |
| Training epochs | 15 | Gradient clipping | 5 |
| Learning rate | 0.001 | Optimizer | Adam |
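To make the setup in Tab. 1 concrete, below is a minimal PyTorch-style training sketch. It is illustrative only: `model` is a placeholder (the paper's Tran-A-SDLM architecture is not reproduced here), and the clipping value of 5 is assumed to bound the global gradient norm.

```python
import torch
import torch.nn as nn

# Hyperparameters from Tab. 1.
BATCH_SIZE = 128      # batch size
EPOCHS = 15           # training epochs
LEARNING_RATE = 1e-3  # learning rate
DROPOUT = 0.15        # dropout rate
GRAD_CLIP = 5.0       # gradient clipping (assumed: global-norm bound)

# Placeholder network; stands in for the actual headline-generation model.
model = nn.Transformer(dropout=DROPOUT)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

def training_step(src, tgt, loss_fn):
    """One Adam step with gradient clipping, per the Tab. 1 configuration."""
    optimizer.zero_grad()
    loss = loss_fn(model(src, tgt), tgt)
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()
```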
Tab. 2 Comparison of experimental results of different models on LCSTS dataset

| Model | ROUGE-1/% | ROUGE-2/% | ROUGE-L/% | Parameters/10⁶ |
| --- | --- | --- | --- | --- |
| RNN-context | 29.9 | 17.4 | 27.2 | 2.0 |
| ASPM | 32.8 | 16.8 | 32.8 | 2.0 |
| T5 PEGASUS | 34.1 | 22.2 | 31.7 | 275.0 |
| CopyNet | 34.4 | 21.6 | 31.3 | 5.0 |
| DQN | 35.7 | 22.6 | 32.8 | 62.0 |
| BERTSUM | 37.0 | 17.8 | 32.7 | 110.5 |
| Transformer-XL | 37.0 | 19.6 | 34.2 | 41.0 |
| CBART | 37.1 | 21.5 | 35.8 | 121.0 |
| PGN+2T+IF | 37.4 | 23.8 | 34.2 | 39.0 |
| RNN-context-SDLM | 38.8 | 26.2 | 36.1 | 32.0 |
| Tran-A-SDLM | 39.0 | 26.9 | 36.6 | 46.0 |
Tab. 3 Results of ablation study

| Model | ROUGE-1/% | ROUGE-2/% | ROUGE-L/% |
| --- | --- | --- | --- |
| Tran-A-SDLM | 39.0 | 26.9 | 36.6 |
| -A | 38.9 | 26.6 | 36.3 |
| -Tran | 38.8 | 26.2 | 36.1 |
| -SDLM | 38.2 | 25.7 | 35.4 |
Tab. 4 Headlines generated by different models for the same source text (headlines are shown in the original Chinese, as LCSTS is a Chinese-language corpus)

| No. | Reference headline | Model | Generated headline |
| --- | --- | --- | --- |
| 1 | 男生高考作弊追打监考老师:你知道我爸是谁? | RNN-context | 高考作弊事件中男生动手伤害女监考官 |
| | | CopyNet | 男考生不满没收作弊手机,踹女监考老师 |
| | | BERTSUM | 阜新高考男生作弊被抓后攻击监考老师 |
| | | RNN-context-SDLM(-Tran) | 高考生作弊被抓:你知道我爸是谁啊? |
| | | Tran-A-SDLM | 高考生作弊被抓踹监考老师:你知道我爸是谁啊? |
| 2 | 教育部原发言人:现在语文课至少一半不该学 | RNN-context | 教育部发言人:语文教材修订稿 |
| | | CopyNet | 前教育部发言人:语文课至少一半不该学,应增加传统文化的比例 |
| | | BERTSUM | 专家:语文课至少一半不该学,应修订 |
| | | RNN-context-SDLM(-Tran) | 教育部发言人:语文课至少一半不该学内容 |
| | | Tran-A-SDLM | 教育部原发言人:语文课至少一半不该学 |
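The ROUGE-1, ROUGE-2, and ROUGE-L values in Tabs. 2-3 follow the standard definitions of Lin [34]. Below is a minimal sketch, assuming character-level matching (a common choice for Chinese text, since it sidesteps word segmentation), applied to the reference headline and Tran-A-SDLM output for sample 2 in Tab. 4.

```python
from collections import Counter

def rouge_n_f1(ref: str, hyp: str, n: int) -> float:
    """ROUGE-N F1 over character n-grams (clipped n-gram overlap)."""
    ref_ngrams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
    hyp_ngrams = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
    overlap = sum((ref_ngrams & hyp_ngrams).values())  # clipped match count
    if overlap == 0:
        return 0.0
    p = overlap / sum(hyp_ngrams.values())
    r = overlap / sum(ref_ngrams.values())
    return 2 * p * r / (p + r)

def rouge_l_f1(ref: str, hyp: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence (LCS)."""
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, cr in enumerate(ref):
        for j, ch in enumerate(hyp):
            dp[i + 1][j + 1] = dp[i][j] + 1 if cr == ch else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(ref)][len(hyp)]
    if lcs == 0:
        return 0.0
    p, r = lcs / len(hyp), lcs / len(ref)
    return 2 * p * r / (p + r)

# Sample 2 from Tab. 4: reference headline vs. Tran-A-SDLM output.
ref = "教育部原发言人:现在语文课至少一半不该学"
hyp = "教育部原发言人:语文课至少一半不该学"
print(f"ROUGE-1: {rouge_n_f1(ref, hyp, 1):.3f}")
print(f"ROUGE-2: {rouge_n_f1(ref, hyp, 2):.3f}")
print(f"ROUGE-L: {rouge_l_f1(ref, hyp):.3f}")
```

Note that published ROUGE implementations differ in details (tokenization, precision/recall weighting), so numbers from this sketch are not directly comparable to Tab. 2 unless the original evaluation settings are matched.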
[1] XIA W J, HUANG H M, GENGZANGCUOMAO, et al. Survey of extractive text summarization based on unsupervised learning and supervised learning[J]. Journal of Computer Applications, 2024, 44(4): 1035-1048.
[2] ZHU Y Q, ZHAO P, ZHAO F F, et al. Survey on abstractive text summarization technologies based on deep learning[J]. Computer Engineering, 2021, 47(11): 11-21, 28.
[3] ZHENG C, CAI Y, ZHANG G, et al. Controllable abstractive sentence summarization with guiding entities[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 5668-5678.
[4] XU P, ZHU X, CLIFTON D A. Multimodal learning with transformers: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12113-12132.
[5] DAI Z, YANG Z, YANG Y, et al. Transformer-XL: attentive language models beyond a fixed-length context[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2978-2988.
[6] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 5753-5763.
[7] SHI L, RUAN X M, WEI R B, et al. Abstractive summarization based on sequence to sequence models: a review[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(10): 1102-1116.
[8] RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 379-389.
[9] NALLAPATI R, ZHOU B, DOS SANTOS C, et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond[C]// Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg: ACL, 2016: 280-290.
[10] CHOPRA S, AULI M, RUSH A M. Abstractive sentence summarization with attentive recurrent neural networks[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 93-98.
[11] GU J, LU Z, LI H, et al. Incorporating copying mechanism in sequence-to-sequence learning[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2016: 1631-1640.
[12] MAO X J, WEI Y, YANG Y R, et al. KHGAS: keywords guided heterogeneous graph for abstractive summarization[J]. Computer Science, 2024, 51(7): 278-286.
[13] ZHANG Z Y, XIAO R. Text summarization method combining global coding and subject decoding[J]. Computer Applications and Software, 2023, 40(4): 134-140, 183.
[14] CUI Z, LI H L, ZHANG L, et al. A Chinese summary generation method incorporating sememes[J]. Journal of Chinese Information Processing, 2022, 36(6): 146-154.
[15] SUN G, WANG Z, ZHAO J. Automatic text summarization using deep reinforcement learning and beyond[J]. Information Technology and Control, 2021, 50(3): 458-469.
[16] ZHANG Y, YANG C, ZHOU Z, et al. Enhancing Transformer with sememe knowledge[C]// Proceedings of the 5th Workshop on Representation Learning for NLP. Stroudsburg: ACL, 2020: 177-184.
[17] GU Y, YAN J, ZHU H, et al. Language modeling with sparse product of sememe experts[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2018: 4642-4651.
[18] SU M H, WU C H, CHENG H T. A two-stage Transformer-based approach for variable-length abstractive summarization[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020, 28: 2061-2072.
[19] LI X J, WANG J, YU M. Research on automatic Chinese summarization combining pre-training and attention enhancement[J]. Computer Engineering and Applications, 2023, 59(14): 134-141.
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[21] SHAW P, USZKOREIT J, VASWANI A. Self-attention with relative position representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg: ACL, 2018: 464-468.
[22] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1243-1252.
[23] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
[24] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. [2023-12-07].
[25] ZHENG Z C, CHEN J D, ZHANG J. Sentiment classification model based on non-negative sinusoidal positional encoding and hybrid attention mechanism[J]. Computer Engineering and Applications, 2024, 60(15): 101-110.
[26] HE P, LIU X, GAO J, et al. DeBERTa: decoding-enhanced BERT with disentangled attention[EB/OL]. [2024-01-12].
[27] CHU X, TIAN Z, ZHANG B, et al. Conditional positional encodings for vision transformers[EB/OL]. [2023-10-24].
[28] ABDU-AGUYE M G, GOMAA W, MAKIHARA Y, et al. Adaptive pooling is all you need: an empirical study on hyperparameter-insensitive human action recognition using wearable sensors[C]// Proceedings of the 2020 International Joint Conference on Neural Networks. Piscataway: IEEE, 2020: 1-6.
[29] ZHAO S, ZHANG T, HU M, et al. AP-BERT: enhanced pre-trained model through average pooling[J]. Applied Intelligence, 2022, 52(14): 15929-15937.
[30] LOCHTER J V, SILVA R M, ALMEIDA T A. Deep learning models for representing out-of-vocabulary words[C]// Proceedings of the 2020 Brazilian Conference on Intelligent Systems, LNCS 12319. Cham: Springer, 2020: 418-434.
[31] BENAMAR A, GROUIN C, BOTHUA M, et al. Evaluating tokenizers impact on OOVs representation with Transformers models[C]// Proceedings of the 13th Language Resources and Evaluation Conference. Paris: European Language Resources Association, 2022: 4193-4204.
[32] SUN M S, CHEN X X. Embedding for words and word senses based on human annotated knowledge base: a case study on HowNet[J]. Journal of Chinese Information Processing, 2016, 30(6): 1-6, 14.
[33] HU B, CHEN Q, ZHU F. LCSTS: a large scale Chinese short text summarization dataset[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 1967-1972.
[34] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]// Proceedings of the ACL-04 Workshop: Text Summarization Branches Out. Stroudsburg: ACL, 2004: 74-81.
[35] XUE L, CONSTANT N, ROBERTS A, et al. mT5: a massively multilingual pre-trained text-to-text transformer[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 483-498.
[36] ZHANG J, ZHAO Y, SALEH M, et al. PEGASUS: pre-training with extracted gap-sentences for abstractive summarization[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 11328-11339.
[37] HE X. Parallel refinements for lexically constrained text generation with BART[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 8653-8666.