Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (2): 432-436. DOI: 10.11772/j.issn.1001-9081.2016.02.0432


Automatic short text summarization method based on multiple mapping

LU Ling, YANG Wu, CAO Qiong   

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400050, China
  • Received: 2015-09-15  Revised: 2015-09-22  Online: 2016-02-03  Published: 2016-02-10


卢玲, 杨武, 曹琼   

  1. 重庆理工大学 计算机科学与工程学院, 重庆 400054
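  • Corresponding author: LU Ling (born 1975), female, from Chongqing, associate professor, M.S., CCF member. Research interests: machine learning, data mining, parallel computing.
  • About the authors: YANG Wu (born 1965), male, from Chongqing, professor, CCF member. Research interests: information retrieval, machine learning. CAO Qiong (born 1979), female, from Chongqing, lecturer, M.S. Research interests: information retrieval, data mining.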

Abstract: Traditional automatic text summarization generally imposes no explicit word limit, whereas many social network platforms do. Under such a limit, traditional summarization techniques struggle to achieve balanced performance on short summaries. To address this problem, an automatic short text summarization method based on multiple mapping is proposed. First, the values of relationship mapping, length mapping, title mapping and position mapping are calculated, each forming a set of candidate sentences. Second, the candidate sentence sets are mapped to the summary sentence set by multiple mapping strategies under a series of mapping rules, and recall is increased by adding the central sentences of the text to the summary sentence set. Experimental results show that multiple mapping achieves stable performance in short text summarization: the ROUGE-1 and ROUGE-2 F-measures are 0.49 and 0.35 respectively, both above the average level of the NLP&CC2015 evaluation, which demonstrates the effectiveness of the method.
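
For readers unfamiliar with the metric, the ROUGE-N F-measure reported above can be illustrated with a minimal sketch. It assumes whitespace tokenization and a single reference summary; the helper names `ngrams` and `rouge_n` are hypothetical, and the tokenization and settings actually used in the NLP&CC2015 evaluation are not stated in this abstract and may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap between two token lists.

    Illustrative only: one reference, no stemming or stop-word handling.
    """
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is never credited more often than it occurs
    # in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Toy example (made-up sentences, not data from the paper).
reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()
p1, r1, f1 = rouge_n(candidate, reference, n=1)
p2, r2, f2 = rouge_n(candidate, reference, n=2)
```

In this toy example five of six unigrams and three of five bigrams match, which shows why ROUGE-2 scores are typically lower than ROUGE-1 scores for the same summary, as with the 0.35 and 0.49 reported here.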

Key words: automatic summarization, short text summarization, multiple mapping, mapping rule

