Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (5): 1365-1371.DOI: 10.11772/j.issn.1001-9081.2022040626

• China Conference on Data Mining 2022 (CCDM 2022) •

J-SGPGN: paraphrase generation network based on joint learning of sequence and graph

Zhirong HOU1,2, Xiaodong FAN1, Hua ZHANG1, Xiaonan MA1

  1. ICBC Technology Company Limited, Beijing 100029, China
    2. School of Software and Microelectronics, Peking University, Beijing 102600, China
  • Received:2022-05-05 Revised:2022-05-13 Accepted:2022-06-02 Online:2023-05-08 Published:2023-05-10
  • Contact: Zhirong HOU
  • About author: HOU Zhirong, born in 1978, Ph. D. candidate, CCF member. His research interests include intelligent optimization algorithms and natural language processing. E-mail: hou.zhirong@pku.edu.cn
    FAN Xiaodong, born in 1992, M. S. Her research interests include question generation and natural language processing.
    ZHANG Hua, born in 1986, Ph. D. Her research interests include graph neural networks and recommendation systems.
    MA Xiaonan, born in 1997. His research interests include recommendation systems and deep learning.


Abstract:

Paraphrase generation is a text data augmentation method based on Natural Language Generation (NLG). To address the repetitive generation, semantic errors, and poor diversity of paraphrase generation methods based on the Sequence-to-Sequence (Seq2Seq) framework, a Paraphrase Generation Network based on Joint learning of Sequence and Graph (J-SGPGN) was proposed. In the encoder of J-SGPGN, graph encoding and sequence encoding were fused for feature enhancement; in the decoder, two decoding methods, sequence generation and graph generation, were designed for parallel decoding. The model was then trained with a joint learning method, combining syntactic supervision with semantic supervision to improve the accuracy and diversity of generation simultaneously. Experimental results on the Quora dataset show that the generation accuracy indicator METEOR (Metric for Evaluation of Translation with Explicit ORdering) of J-SGPGN is 3.44 percentage points higher than that of RNN+GCN, the baseline model with the best accuracy, and that the generation diversity indicator Self-BLEU (Self-BiLingual Evaluation Understudy) of J-SGPGN is 12.79 percentage points lower than that of the Back-Translation guided multi-round Paraphrase Generation (BTmPG) model, the baseline with the best diversity. These results verify that J-SGPGN can generate paraphrase text with more accurate semantics and more diverse expressions.
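The diversity indicator cited above, Self-BLEU, scores each generated sentence by BLEU against all the other generations and averages the results, so lower values indicate more varied output. A minimal sketch of the idea (a simplified BLEU using up to bigram modified precision and no brevity penalty; the function names are illustrative, not from the paper's code):

```python
from collections import Counter
import math

def bleu_ngram_precision(candidate, references, n_max=2):
    """Simplified BLEU: geometric mean of clipped (modified) n-gram
    precisions of `candidate` against `references`; no brevity penalty."""
    precisions = []
    for n in range(1, n_max + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        if not cand:
            return 0.0
        # For each n-gram, keep the maximum count seen in any reference.
        max_ref = Counter()
        for ref in references:
            ref_counts = Counter(tuple(ref[i:i + n])
                                 for i in range(len(ref) - n + 1))
            for gram, cnt in ref_counts.items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
        precisions.append(clipped / sum(cand.values()))
    if any(p == 0.0 for p in precisions):
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))

def self_bleu(sentences, n_max=2):
    """Average BLEU of each tokenized sentence against all the others.
    Lower Self-BLEU means the set of generations is more diverse."""
    scores = [bleu_ngram_precision(s, sentences[:i] + sentences[i + 1:], n_max)
              for i, s in enumerate(sentences)]
    return sum(scores) / len(scores)
```

For example, three identical generations give a Self-BLEU of 1.0, while three generations with no shared words give 0.0; a diverse paraphrase set sits closer to the latter.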

Key words: paraphrase generation, encoder-decoder, self-attention network, sequence generation, graph generation, joint learning

