Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (06): 1673-1675.

• Software Process Technology and Chinese Information Processing •

Semantics-based single-document automatic summarization algorithm

Zhiqing Zhang

  1. Zhejiang University
  • Received: 2009-12-28 Revised: 2010-02-10 Online: 2010-06-01 Published: 2010-06-01
  • Corresponding author: Zhiqing Zhang

Single-document summarization based on semantics

Zhiqing Zhang   

  • Received: 2009-12-28 Revised: 2010-02-10 Online: 2010-06-01 Published: 2010-06-01
  • Contact: Zhiqing Zhang

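Abstract: Single-document automatic summarization aims to produce a concise yet comprehensive summary by extracting and distilling the main information of the original text. Mainstream approaches extract sentences directly from the text using statistical and machine learning techniques, but statistical methods are ineffective on a single document because its length is limited. To address this problem, a semantics-based single-document summarization method is proposed. The method first splits the document into sentences, then computes the semantic similarity of every pair of sentences, groups similar sentences with an improved K-Medoids clustering algorithm, selects the most representative sentence from each cluster, and finally assembles the selected sentences into the summary. Experimental results show that incorporating semantic information improves summary quality.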

Keywords: semantics, HowNet, improved K-Medoids, single document, automatic summarization

Abstract: Single-document summarization aims to create a compressed summary while retaining the theme of the original document. Many approaches use statistics and machine learning techniques to extract sentences from a document. Because a single document contains limited information, these mainstream approaches are ineffective. Therefore, a new single-document summarization framework based on semantics is proposed. First, the similarity between each pair of sentences is calculated. Then a modified K-Medoids clustering algorithm is used to cluster the sentences. Finally, the most informative sentence is chosen from each cluster to form the summary. The experimental results demonstrate that using semantic information improves summary quality.

Key words: Semantics, HowNet, modified K-Medoids clustering, single document, summarization
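
The pipeline outlined in the abstract (pairwise sentence similarity, K-Medoids clustering, one representative sentence per cluster) can be sketched roughly as follows. This is only an illustration under several assumptions, not the paper's implementation: all function names are hypothetical, `jaccard_similarity` is a simple word-overlap stand-in for the HowNet-based semantic similarity, `k_medoids` is a plain alternating K-Medoids with a deterministic farthest-first start rather than the modified variant the paper describes, and emitting the chosen sentences in their original order is a guess.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split on common Chinese and English sentence terminators."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def jaccard_similarity(a: str, b: str) -> float:
    """Stand-in similarity: overlap of lowercase word sets.

    Assumption: the paper uses HowNet-based semantic similarity here;
    this placeholder only needs space-separated words to be meaningful.
    """
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def k_medoids(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Plain K-Medoids on a distance matrix; returns sorted medoid indices."""
    n = len(dist)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)
    # Deterministic start: the most central point, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates,
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        # Assignment step: each medoid keeps itself, others join the nearest.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            owner = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[owner].append(i)
        # Update step: the new medoid minimises total distance in its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def summarize(text: str, k: int = 3,
              similarity: Callable[[str, str], float] = jaccard_similarity) -> str:
    """Cluster sentences and return one representative (the medoid) per cluster."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    n = len(sentences)
    # Convert similarity in [0, 1] to a distance; the diagonal is zero.
    dist = [[0.0 if i == j else 1.0 - similarity(sentences[i], sentences[j])
             for j in range(n)] for i in range(n)]
    return " ".join(sentences[i] for i in k_medoids(dist, k))
```

Passing a different `similarity` callable (for example, one backed by a HowNet lookup) is the intended extension point; the clustering and selection steps do not depend on how the similarity is computed.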