Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (3): 699-705. DOI: 10.11772/j.issn.1001-9081.2020060837

Special Issue: Artificial Intelligence

• Artificial intelligence

Reinforced automatic summarization model based on advantage actor-critic algorithm

DU Xixi, CHENG Hua, FANG Yiquan   

  1. Institute of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2020-06-17 Revised: 2020-10-08 Online: 2021-03-10 Published: 2021-01-15
  • Supported by:
    This work is partially supported by the CERNET Innovation Project (NGII20170520).


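  • Corresponding author: CHENG Hua
  • About the authors: DU Xixi (born 1994), female, from Bozhou, Anhui, M.S. candidate; main research interest: natural language processing. CHENG Hua (born 1975), male, from Huangshan, Anhui, associate research fellow, Ph.D.; main research interests: intelligent signal processing and information security. FANG Yiquan (born 1975), female, from Shanghai, senior laboratory engineer, M.S.; main research interest: information security.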

Abstract: In long-text automatic summarization, extractive models tend to produce redundant summaries, while abstractive models often lose key information, summarize inaccurately, and generate repeated content. To address these problems, a Reinforced Automatic Summarization model based on the Advantage Actor-Critic algorithm (A2C-RLAS) was proposed for long text. First, an extractor built on a hybrid of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) selected the key sentences of the source text. Then, a rewriter based on the copy mechanism and the attention mechanism refined those key sentences. Finally, the whole network was trained with the Advantage Actor-Critic (A2C) reinforcement learning algorithm, using the semantic similarity between the rewritten summary and the reference summary, measured by BERTScore (Evaluating Text Generation with Bidirectional Encoder Representations from Transformers), as the reward that guides extraction and thereby improves the quality of the sentences the extractor selects. Experimental results on the CNN/Daily Mail dataset show that, compared with models such as the Reinforcement Learning-based Extractive Summarization (Refresh) model, the Recurrent Neural Network based sequence model for extractive summarization (SummaRuNNer), and the Distributional Semantics Reward (DSR) model, A2C-RLAS produces final summaries that are more accurate and more fluent, with redundant content effectively reduced, and it improves both the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BERTScore metrics. The ROUGE-L value of A2C-RLAS is 6.3% higher than that of Refresh and 10.2% higher than that of SummaRuNNer, and its F1 value is 30.5% higher than that of DSR.

Key words: automatic summary model, extractive summary model, abstractive summary model, encoder-decoder, reinforcement learning, Advantage Actor-Critic (A2C) algorithm
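
The training signal described in the abstract can be illustrated with a generic, textbook-style advantage actor-critic step. The sketch below is not the authors' implementation: the function names, the single-step setting, and the example numbers are all assumptions made for illustration, and the reward is a hypothetical stand-in for a BERTScore value. It only shows how a reward, a critic's value estimate, and the extractor's sentence scores might combine into actor and critic losses.

```python
import math


def softmax(scores):
    """Turn raw sentence scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_losses(sentence_scores, chosen_index, reward, value_estimate):
    """Return (actor_loss, critic_loss, advantage) for one extraction step.

    sentence_scores: hypothetical extractor scores, one per candidate sentence
    chosen_index:    index of the sentence the extractor picked (the action)
    reward:          similarity between rewritten and reference summary
                     (a made-up stand-in for a BERTScore value)
    value_estimate:  the critic's predicted reward for this state (baseline)
    """
    probs = softmax(sentence_scores)
    # Advantage: how much better the outcome was than the critic expected.
    # In a real training loop it is treated as a constant for the actor.
    advantage = reward - value_estimate
    # Actor (policy) loss: raise the log-probability of actions with
    # positive advantage, lower it for actions with negative advantage.
    actor_loss = -math.log(probs[chosen_index]) * advantage
    # Critic loss: squared error between predicted and observed reward.
    critic_loss = (reward - value_estimate) ** 2
    return actor_loss, critic_loss, advantage


# Illustrative numbers only; none of them come from the paper.
actor_loss, critic_loss, advantage = a2c_losses(
    sentence_scores=[2.0, 0.5, -1.0],
    chosen_index=0,
    reward=0.9,
    value_estimate=0.6,
)
print(actor_loss, critic_loss, advantage)
```

Subtracting the critic's estimate from the reward is what distinguishes the advantage form from plain policy gradient: a sentence choice is reinforced only to the extent that it beats the expected reward, which typically lowers the variance of the update.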
