Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 345-353. DOI: 10.11772/j.issn.1001-9081.2024030281
• Artificial Intelligence •
Headline generation model with position embedding for knowledge reasoning
Yalun WANG, Yangsen ZHANG, Siwen ZHU
Received: 2024-03-18
Revised: 2024-04-30
Accepted: 2024-05-31
Online: 2024-07-22
Published: 2025-02-10
Contact: Yangsen ZHANG
About author: WANG Yalun, born in 2000, is a native of Beijing, an M. S. candidate and a CCF member. Her research interests include natural language processing.
Supported by:
Abstract:
Sememes, as the smallest semantic units, are crucial to the headline generation task. Although the Sememe-Driven Language Model (SDLM) is one of the mainstream models, its encoding capability is limited when processing long text sequences: it does not fully consider positional relations and is prone to introducing noisy knowledge, which degrades the quality of the generated headlines. To address these problems, a Transformer-based generative headline model, Tran-A-SDLM (Transformer Adaption based Sememe-Driven Language Model with positional embedding and knowledge reasoning), was proposed, which fully combines the advantages of adaptive positional encoding and a knowledge reasoning mechanism. First, the Transformer model was introduced to strengthen the encoding of text sequences. Second, an adaptive positional encoding mechanism was used to improve the model's positional awareness, thereby enhancing its learning of contextual sememe knowledge. In addition, a knowledge reasoning module was introduced to represent sememe knowledge and guide the model toward generating accurate headlines. Finally, to verify the superiority of Tran-A-SDLM, experiments were conducted on the Large-scale Chinese Short Text Summarization (LCSTS) dataset. Experimental results show that, compared with RNN-context-SDLM, Tran-A-SDLM improves the ROUGE-1, ROUGE-2 and ROUGE-L scores by 0.2, 0.7 and 0.5 percentage points, respectively. Results of the ablation study further verify the effectiveness of the proposed model.
CLC number:
王雅伦, 张仰森, 朱思文. 面向知识推理的位置编码标题生成模型[J]. 计算机应用, 2025, 45(2): 345-353.
Yalun WANG, Yangsen ZHANG, Siwen ZHU. Headline generation model with position embedding for knowledge reasoning[J]. Journal of Computer Applications, 2025, 45(2): 345-353.
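The page itself carries no code, but the core idea summarized in the abstract — a Transformer encoder whose position information is learned adaptively rather than fixed — can be illustrated with a short PyTorch sketch. Everything below (module names, dimensions, the use of a standard nn.TransformerEncoder, a learnable position table as one reading of "adaptive positional encoding") is an assumption made for illustration, not the authors' Tran-A-SDLM implementation, and the sememe knowledge reasoning module is omitted.

```python
import torch
import torch.nn as nn

class AdaptivePositionalEncoding(nn.Module):
    """Learnable positional embeddings added to token embeddings (illustrative only)."""
    def __init__(self, d_model: int, max_len: int = 512, dropout: float = 0.15):
        super().__init__()
        self.pos_embedding = nn.Embedding(max_len, d_model)  # trained jointly with the rest of the model
        self.dropout = nn.Dropout(dropout)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        return self.dropout(token_embeddings + self.pos_embedding(positions))

# Toy usage: encode a batch of token ids with a standard Transformer encoder.
vocab_size, d_model = 10000, 256
embed = nn.Embedding(vocab_size, d_model)
pos_enc = AdaptivePositionalEncoding(d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
)
tokens = torch.randint(0, vocab_size, (2, 50))   # (batch=2, seq_len=50) random token ids
memory = encoder(pos_enc(embed(tokens)))         # (2, 50, 256) contextual representations
```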
Tab. 1 Experimental parameter settings

Parameter | Value | Parameter | Value
---|---|---|---
Batch size | 128 | Dropout rate | 0.15
Training epochs | 15 | Gradient clipping | 5
Learning rate | 0.001 | Optimizer | Adam
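As a minimal sketch of how the settings in Tab. 1 could be wired into a training loop (only the hyperparameter values come from the table; the model and data-loader interfaces below are hypothetical, and the dropout rate of 0.15 is assumed to be set inside the model's modules):

```python
import torch
from torch import nn

def train(model: nn.Module, train_loader, device: str = "cuda") -> None:
    """Training loop reflecting Tab. 1; model/data interfaces are assumptions."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam, learning rate 0.001
    loss_fn = nn.CrossEntropyLoss(ignore_index=0)               # assumes 0 is the padding token id
    model.to(device).train()
    for epoch in range(15):                                     # 15 training epochs
        for src, tgt in train_loader:                           # batch size 128 set in the DataLoader
            src, tgt = src.to(device), tgt.to(device)
            logits = model(src, tgt[:, :-1])                    # teacher forcing on shifted targets
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # gradient clipping at 5
            optimizer.step()
```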
Tab. 2 Comparison of experimental results of different models on the LCSTS dataset

Model | ROUGE-1/% | ROUGE-2/% | ROUGE-L/% | Parameters/10⁶
---|---|---|---|---
RNN-context | 29.9 | 17.4 | 27.2 | 2.0
ASPM | 32.8 | 16.8 | 32.8 | 2.0
T5 PEGASUS | 34.1 | 22.2 | 31.7 | 275.0
CopyNet | 34.4 | 21.6 | 31.3 | 5.0
DQN | 35.7 | 22.6 | 32.8 | 62.0
BERTSUM | 37.0 | 17.8 | 32.7 | 110.5
Transformer-XL | 37.0 | 19.6 | 34.2 | 41.0
CBART | 37.1 | 21.5 | 35.8 | 121.0
PGN+2T+IF | 37.4 | 23.8 | 34.2 | 39.0
RNN-context-SDLM | 38.8 | 26.2 | 36.1 | 32.0
Tran-A-SDLM | 39.0 | 26.9 | 36.6 | 46.0
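The ROUGE-1/2/L scores in Tab. 2 are the standard n-gram and longest-common-subsequence overlap measures[34]. The following self-contained sketch computes character-level ROUGE F1 for a pair of Chinese headlines; it is an illustrative re-implementation, not the official ROUGE toolkit, so absolute values may differ slightly from those reported.

```python
from collections import Counter

def _ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 between two token lists (clipped n-gram overlap)."""
    c, r = _ngrams(candidate, n), _ngrams(reference, n)
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    m, n = len(candidate), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if candidate[i] == reference[j] else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / n
    return 2 * precision * recall / (precision + recall)

# Chinese headlines are typically scored at the character level.
cand = list("高考生作弊被抓踹监考老师")
ref = list("男生高考作弊追打监考老师")
print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```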
Tab. 3 Results of ablation study (unit: %)

Model | ROUGE-1 | ROUGE-2 | ROUGE-L
---|---|---|---
Tran-A-SDLM | 39.0 | 26.9 | 36.6
-A | 38.9 | 26.6 | 36.3
-Tran | 38.8 | 26.2 | 36.1
-SDLM | 38.2 | 25.7 | 35.4
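One way to read the variant names in Tab. 3 is that each row disables a single component of the full model ("-Tran" reduces to RNN-context-SDLM, as Tab. 4 also indicates). The configuration flags below are hypothetical and serve only to make this reading explicit:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    use_transformer: bool = True        # "Tran": Transformer encoder instead of an RNN encoder
    adaptive_position: bool = True      # "A": adaptive (learnable) positional encoding
    use_sememe_knowledge: bool = True   # "SDLM": sememe knowledge reasoning module

ABLATIONS = {
    "Tran-A-SDLM": ModelConfig(),                      # full model
    "-A":    ModelConfig(adaptive_position=False),     # fall back to fixed positional encoding
    "-Tran": ModelConfig(use_transformer=False),       # reduces to RNN-context-SDLM
    "-SDLM": ModelConfig(use_sememe_knowledge=False),  # no sememe knowledge
}
```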
Tab. 4 Different headlines generated by different models for the same text

No. | Reference headline | Model | Generated headline
---|---|---|---
1 | Male student caught cheating in the gaokao chases and beats the invigilator: "Do you know who my father is?" | RNN-context | Male student injures female invigilator in gaokao cheating incident
 | | CopyNet | Male examinee, angry that his cheating phone was confiscated, kicks female invigilator
 | | BERTSUM | Male student in Fuxin attacks invigilator after being caught cheating in the gaokao
 | | RNN-context-SDLM (-Tran) | Gaokao examinee caught cheating: "Do you know who my father is?"
 | | Tran-A-SDLM | Gaokao examinee caught cheating kicks invigilator: "Do you know who my father is?"
2 | Former Ministry of Education spokesperson: at least half of today's Chinese language classes should not be taught | RNN-context | Ministry of Education spokesperson: revised draft of Chinese language textbooks
 | | CopyNet | Former Ministry of Education spokesperson: at least half of Chinese language classes should not be taught; the share of traditional culture should be increased
 | | BERTSUM | Expert: at least half of Chinese language classes should not be taught and should be revised
 | | RNN-context-SDLM (-Tran) | Ministry of Education spokesperson: at least half of the content in Chinese language classes should not be taught
 | | Tran-A-SDLM | Former Ministry of Education spokesperson: at least half of Chinese language classes should not be taught
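For completeness, the sketch below shows one way headlines such as those in Tab. 4 could be produced at inference time with greedy decoding; the model interface and special-token ids are assumptions, and the paper's actual decoding strategy (e.g., beam search) is not specified here.

```python
import torch

@torch.no_grad()
def generate_headline(model, src_ids, bos_id=1, eos_id=2, max_len=30):
    """Greedy decoding sketch; token ids and the model interface are assumptions."""
    model.eval()
    tgt = torch.tensor([[bos_id]], device=src_ids.device)    # start with the BOS token
    for _ in range(max_len):
        logits = model(src_ids, tgt)                          # (1, tgt_len, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # most probable next token
        tgt = torch.cat([tgt, next_id], dim=1)
        if next_id.item() == eos_id:                          # stop at end-of-sequence
            break
    return tgt[0, 1:].tolist()                                # drop BOS; map ids back to characters downstream
```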
References

[1] XIA W J, HUANG H M, GENGZANGCUOMAO, et al. Survey of extractive text summarization based on unsupervised learning and supervised learning[J]. Journal of Computer Applications, 2024, 44(4): 1035-1048.
[2] ZHU Y Q, ZHAO P, ZHAO F F, et al. Survey on abstractive text summarization technologies based on deep learning[J]. Computer Engineering, 2021, 47(11): 11-21, 28.
[3] ZHENG C, CAI Y, ZHANG G, et al. Controllable abstractive sentence summarization with guiding entities[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 5668-5678.
[4] XU P, ZHU X, CLIFTON D A. Multimodal learning with transformers: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12113-12132.
[5] DAI Z, YANG Z, YANG Y, et al. Transformer-XL: attentive language models beyond a fixed-length context[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2978-2988.
[6] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 5753-5763.
[7] SHI L, RUAN X M, WEI R B, et al. Abstractive summarization based on sequence to sequence models: a review[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(10): 1102-1116.
[8] RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 379-389.
[9] NALLAPATI R, ZHOU B, DOS SANTOS C, et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond[C]// Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg: ACL, 2016: 280-290.
[10] CHOPRA S, AULI M, RUSH A M. Abstractive sentence summarization with attentive recurrent neural networks[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 93-98.
[11] GU J, LU Z, LI H, et al. Incorporating copying mechanism in sequence-to-sequence learning[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2016: 1631-1640.
[12] MAO X J, WEI Y, YANG Y R, et al. KHGAS: keywords guided heterogeneous graph for abstractive summarization[J]. Computer Science, 2024, 51(7): 278-286.
[13] ZHANG Z Y, XIAO R. Text summarization method combining global coding and subject decoding[J]. Computer Applications and Software, 2023, 40(4): 134-140, 183.
[14] CUI Z, LI H L, ZHANG L, et al. A Chinese summary generation method incorporating sememes[J]. Journal of Chinese Information Processing, 2022, 36(6): 146-154.
[15] SUN G, WANG Z, ZHAO J. Automatic text summarization using deep reinforcement learning and beyond[J]. Information Technology and Control, 2021, 50(3): 458-469.
[16] ZHANG Y, YANG C, ZHOU Z, et al. Enhancing Transformer with sememe knowledge[C]// Proceedings of the 5th Workshop on Representation Learning for NLP. Stroudsburg: ACL, 2020: 177-184.
[17] GU Y, YAN J, ZHU H, et al. Language modeling with sparse product of sememe experts[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2018: 4642-4651.
[18] SU M H, WU C H, CHENG H T. A two-stage Transformer-based approach for variable-length abstractive summarization[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020, 28: 2061-2072.
[19] LI X J, WANG J, YU M. Research on automatic Chinese summarization combining pre-training and attention enhancement[J]. Computer Engineering and Applications, 2023, 59(14): 134-141.
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[21] SHAW P, USZKOREIT J, VASWANI A. Self-attention with relative position representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg: ACL, 2018: 464-468.
[22] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1243-1252.
[23] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
[24] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. [2023-12-07].
[25] ZHENG Z C, CHEN J D, ZHANG J. Sentiment classification model based on non-negative sinusoidal positional encoding and hybrid attention mechanism[J]. Computer Engineering and Applications, 2024, 60(15): 101-110.
[26] HE P, LIU X, GAO J, et al. DeBERTa: decoding-enhanced BERT with disentangled attention[EB/OL]. [2024-01-12].
[27] CHU X, TIAN Z, ZHANG B, et al. Conditional positional encodings for vision transformers[EB/OL]. [2023-10-24].
[28] ABDU-AGUYE M G, GOMAA W, MAKIHARA Y, et al. Adaptive pooling is all you need: an empirical study on hyperparameter-insensitive human action recognition using wearable sensors[C]// Proceedings of the 2020 International Joint Conference on Neural Networks. Piscataway: IEEE, 2020: 1-6.
[29] ZHAO S, ZHANG T, HU M, et al. AP-BERT: enhanced pre-trained model through average pooling[J]. Applied Intelligence, 2022, 52(14): 15929-15937.
[30] LOCHTER J V, SILVA R M, ALMEIDA T A. Deep learning models for representing out-of-vocabulary words[C]// Proceedings of the 2020 Brazilian Conference on Intelligent Systems, LNCS 12319. Cham: Springer, 2020: 418-434.
[31] BENAMAR A, GROUIN C, BOTHUA M, et al. Evaluating tokenizers impact on OOVs representation with Transformers models[C]// Proceedings of the 13th Language Resources and Evaluation Conference. Paris: European Language Resources Association, 2022: 4193-4204.
[32] SUN M S, CHEN X X. Embedding for words and word senses based on human annotated knowledge base: a case study on HowNet[J]. Journal of Chinese Information Processing, 2016, 30(6): 1-6, 14.
[33] HU B, CHEN Q, ZHU F. LCSTS: a large scale Chinese short text summarization dataset[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 1967-1972.
[34] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]// Proceedings of the ACL-04 Workshop: Text Summarization Branches Out. Stroudsburg: ACL, 2004: 74-81.
[35] XUE L, CONSTANT N, ROBERTS A, et al. mT5: a massively multilingual pre-trained text-to-text transformer[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 483-498.
[36] ZHANG J, ZHAO Y, SALEH M, et al. PEGASUS: pre-training with extracted gap-sentences for abstractive summarization[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 11328-11339.
[37] HE X. Parallel refinements for lexically constrained text generation with BART[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 8653-8666.