Journal of Computer Applications, 2023, Vol. 43, Issue (4): 1021-1028. DOI: 10.11772/j.issn.1001-9081.2022030460

• Artificial intelligence •

User granularity-level personalized social text generation model

Yongbing GAO, Juntian GAO, Rong MA, Lidong YANG

  1. School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China
  • Received: 2022-04-11  Revised: 2022-06-12  Accepted: 2022-06-22  Online: 2023-04-11  Published: 2023-04-10
  • Contact: Yongbing GAO
  • About author: GAO Juntian, born in 1996 in Lüliang, Shanxi, M.S. candidate. His research interests include automatic generation of personalized text.
    MA Rong, born in 1997 in Yuncheng, Shanxi, M.S. candidate. Her research interests include automatic generation of personalized text.
    YANG Lidong, born in 1978 in Baotou, Inner Mongolia, Ph.D., professor. His research interests include speech signal processing.
  • Supported by:
    National Natural Science Foundation of China(62161040);Natural Science Foundation of Inner Mongolia Autonomous Region(2021LHMS06004)

Abstract:

In the open-domain social text field, the content produced by existing text generation techniques lacks personalized features. To address this problem, a user-level fine-grained controllable generation model, PTG-GPT2-Chinese (Personalized Text Generation Generative Pre-trained Transformer 2-Chinese), was proposed. On the basis of the GPT2 (Generative Pre-trained Transformer 2.0) structure, an Encoder-Decoder framework was designed. First, the user's static personalized information was modeled and encoded on the Encoder side, a bidirectional independent attention module was added on the Decoder side to receive the static personalized feature vector, and the attention module in the original GPT2 structure was used to capture the dynamic personalized features in the user's text. Then, the scores of the different attention modules were dynamically weighted, fused, and fed into the subsequent decoding, so that social text constrained by the user's personalized feature attributes was generated automatically. In addition, because the semantic sparsity of the user's basic information may cause the generated text to conflict with some personalized features, the BERT (Bidirectional Encoder Representations from Transformers) model was used to perform a secondary, consistency-enhanced generation between the Decoder output and the user's personalized features, finally realizing personalized social text generation. Experimental results show that, compared with the GPT2 model, the proposed model improves fluency by 0.36% to 0.72%, and, without loss of language fluency, the secondary generation raises the personalization and consistency metrics by 10.27% and 13.24% respectively. These results demonstrate that the proposed model can effectively assist users' creation and generate social text that is fluent and consistent with the user's personality.
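The dynamic weighted fusion of the two attention branches can be illustrated with a minimal PyTorch-style sketch. This is an assumption-laden illustration, not the paper's released implementation: the module names, the sigmoid gate, and all dimensions below are hypothetical.

    import torch
    import torch.nn as nn

    class PersonaFusedDecoderBlock(nn.Module):
        """One decoder block combining GPT2-style self-attention over the user's
        text (dynamic personalized features) with an independent cross-attention
        over encoded static personalized features, fused by a learned gate."""

        def __init__(self, d_model: int = 768, n_heads: int = 12):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.persona_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.gate = nn.Linear(2 * d_model, 1)   # per-token fusion weight (assumed form)
            self.ln1 = nn.LayerNorm(d_model)
            self.ln2 = nn.LayerNorm(d_model)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))

        def forward(self, x, persona, causal_mask=None):
            # x:       (batch, seq_len, d_model) hidden states of the user's text
            # persona: (batch, p_len, d_model)   Encoder output for static persona attributes
            text_ctx, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
            persona_ctx, _ = self.persona_attn(x, persona, persona)
            # Dynamic weighted fusion: a sigmoid gate decides, token by token, how much
            # the static persona context contributes relative to the text context.
            w = torch.sigmoid(self.gate(torch.cat([text_ctx, persona_ctx], dim=-1)))
            h = self.ln1(x + w * persona_ctx + (1.0 - w) * text_ctx)
            return self.ln2(h + self.ffn(h))

Stacking such blocks on a GPT2-Chinese backbone and passing the fused states to the language-model head would yield tokens conditioned on both the dynamic and the static personalized features; the BERT-based consistency re-generation described above would then operate on the decoded candidates.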

Key words: personalization, text generation, pre-trained language model, Generative Pre-trained Transformer 2 (GPT2)-Chinese, social text

