Journal of Computer Applications, 2017, Vol. 37, Issue 7: 2100-2105. DOI: 10.11772/j.issn.1001-9081.2017.07.2100

Single document automatic summarization algorithm based on word-sentence co-ranking

ZHANG Lu, CAO Jie, PU Chaoyi, WU Zhi'ang   

  1. Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing Jiangsu 210023, China
  • Received:2017-02-20 Revised:2017-02-27 Online:2017-07-10 Published:2017-07-18
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71571093, 71372188), the National Center for International Joint Research on E-Business Information Processing (2013B01035), the Surface Projects of Natural Science Research in Jiangsu Provincial Colleges and Universities (15KJB520012), and the Pre-Research Project of Nanjing University of Finance and Economics (YYJ201415).

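Single-document automatic summarization algorithm based on word-sentence co-ranking (Chinese title: 基于词句协同排序的单文档自动摘要算法)

ZHANG Lu (张璐), CAO Jie (曹杰), PU Chaoyi (蒲朝仪), WU Zhi'ang (伍之昂)

  1. Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing 210023
  • Corresponding author: ZHANG Lu
  • About the authors: ZHANG Lu (born 1983), male, from Binhai, Jiangsu, lecturer, Ph.D., CCF member; research interests: natural language processing and data mining. CAO Jie (born 1969), male, from Jiangyan, Jiangsu, professor, Ph.D., CCF member; research interests: data mining and business intelligence. PU Chaoyi (born 1993), female, from Zunyi, Guizhou, master's student; research interest: natural language processing. WU Zhi'ang (born 1982), male, from Yixing, Jiangsu, associate professor, Ph.D., CCF member; research interests: data mining and social computing.
  • Funding:
    National Natural Science Foundation of China (71571093, 71372188); National Center for International Joint Research on E-Business Information Processing (2013B01035); Natural Science Foundation of Jiangsu Provincial Colleges and Universities (15KJB520012); Pre-Research Project of Nanjing University of Finance and Economics (YYJ201415).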

Abstract: Extractive summarization automatically produces a short summary of a document by concatenating several sentences taken verbatim from the original text. To address this task, a single-document automatic summarization algorithm based on word-sentence co-ranking, named WSRank, was proposed, which integrates the word-sentence relationship into the graph-based sentence ranking model. The co-ranking framework of WSRank was given and then converted to a concise matrix form, and its convergence was proved theoretically. Moreover, a redundancy elimination technique was presented as a supplement to WSRank to further improve summary quality. Experimental results on real datasets show that WSRank improves summarization performance by 13% to 30% on multiple ROUGE metrics compared with the classic TextRank algorithm, which demonstrates the effectiveness of the proposed method.

Key words: automatic summarization, extractive summary, single document, graph-based ranking, word-sentence collaboration

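Abstract (translated from the Chinese version): Extractive summarization must select a certain number of important sentences from a document to form a short text that covers the main idea of the original. To address this problem, a single-document automatic summarization algorithm based on word-sentence co-ranking is proposed, which integrates the word-sentence relationship into the graph-ranking-based computation of sentence weights. First, the framework for word-sentence collaborative computation in the algorithm is given; then it is converted into a concise matrix form and its convergence is proved theoretically; finally, a redundancy elimination method further improves the quality of the summary. Experiments on real datasets show that the proposed algorithm improves on the classic TextRank algorithm by 13% to 30% on ROUGE metrics and effectively raises the quality of the generated summaries.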

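Key words (translated from the Chinese version): automatic summarization, extractive summary, single document, graph ranking, word-sentence collaboration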
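
The abstract describes three ingredients: graph-based sentence ranking, mutual reinforcement between word and sentence scores, and redundancy elimination. The Python sketch below illustrates that general idea only. It is not the WSRank formulation from the paper, which this page does not reproduce: the Jaccard similarity, the update rule, the mixing weight `alpha`, the `damping` factor, the `redundancy_threshold`, and all function names are assumptions made for illustration.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def overlap(a, b):
    """Jaccard overlap between two word sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def normalize(vec):
    """Scale a score vector to sum to 1 (returned unchanged if all zero)."""
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else list(vec)


def co_rank(sentences, alpha=0.5, damping=0.85, tol=1e-8, max_iter=200):
    """Score sentences and words jointly; alpha and damping are assumed values."""
    n = len(sentences)
    if n == 0:
        return [], {}
    bags = [tokenize(s) for s in sentences]
    vocab = sorted(set().union(*bags))
    # Sentence graph: edge weights are word-overlap similarities.
    sim = [[overlap(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    s = [1.0 / n] * n
    word_score = {}
    for _ in range(max_iter):
        # Word step: a word scores highly if it occurs in high-scoring sentences.
        w = normalize([sum(s[i] for i in range(n) if t in bags[i]) for t in vocab])
        word_score = dict(zip(vocab, w))
        # Graph step: TextRank-style propagation over the sentence graph.
        graph = []
        for i in range(n):
            incoming = sum(sim[j][i] / out_weight[j] * s[j]
                           for j in range(n) if out_weight[j] > 0)
            graph.append((1 - damping) / n + damping * incoming)
        # Word-to-sentence step: average score of the words in each sentence.
        from_words = normalize([
            sum(word_score[t] for t in bags[i]) / max(len(bags[i]), 1)
            for i in range(n)
        ])
        new_s = normalize([alpha * g + (1 - alpha) * f
                           for g, f in zip(normalize(graph), from_words)])
        delta = sum(abs(a - b) for a, b in zip(new_s, s))
        s = new_s
        if delta < tol:
            break
    return s, word_score


def summarize(sentences, k=2, redundancy_threshold=0.5):
    """Greedily pick up to k top-scored sentences, skipping near-duplicates."""
    scores, _ = co_rank(sentences)
    bags = [tokenize(s) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        if all(overlap(bags[i], bags[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(list_of_sentences, k=3)` returns up to three sentences in document order. The fixed iteration cap stands in for the convergence proof mentioned in the abstract; this sketch makes no convergence claim of its own.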
