Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (12): 3275-3277.

• Database Technology •

Method for eliminating redundant sentences in automatic summarization

程传鹏1,杨要科2   

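  1. School of Computer Science, Zhongyuan Institute of Technology, Zhengzhou 450007
  2. School of Computer Science, Zhongyuan Institute of Technology, Zhengzhou 450007
  • Received: 2011-06-02 Revised: 2011-08-06 Online: 2011-12-12 Published: 2011-12-01
  • Corresponding author: 程传鹏
  • Funding:
    Natural Science Project of the Education Department of Henan Province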

Method for elimination of redundant sentences in automatic abstraction

CHEN Chuan-peng, YANG Yao-ke

  1. School of Computer Science, Zhongyuan Institute of Technology, Zhengzhou, Henan 450007, China
  • Received: 2011-06-02 Revised: 2011-08-06 Online: 2011-12-12 Published: 2011-12-01
  • Contact: CHEN Chuan-peng

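Abstract: To address information redundancy in automatic summarization, a method for eliminating redundant sentences is proposed. TongYiCi CiLin (《同义词词林》, a Chinese synonym thesaurus) is used to define a formula for the semantic distance between words. Based on word similarity, a one-to-one correspondence between topic words and topic sentences is established. Borrowing the notion of Hamming distance from coding theory, the similarity between topic sentences in the summary is then obtained, and a threshold is set to filter out topic sentences whose similarity is high, which reduces the set of topic sentences. Experimental results show that the method improves the precision of the summary.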

Keywords: automatic summarization, information redundancy, TongYiCi CiLin, semantic distance, Hamming distance

Abstract: To address information redundancy in automatic abstraction, this paper proposes a method for eliminating redundant sentences. First, word similarity is defined based on TongYiCi CiLin. Then, a correspondence between topic words and topic sentences is established from the word similarity, the similarity between topic sentences is obtained using the Hamming distance from coding theory, and highly similar sentences are filtered out with a threshold. The experimental results show that the method improves the accuracy of abstraction.

Key words: automatic text summarization, information redundancy, TongYiCi CiLin, semantic distance, Hamming distance
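
The pipeline outlined in the abstract (thesaurus-based word similarity, a topic-word encoding of each sentence, Hamming-distance similarity, threshold filtering) might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract gives neither the semantic-distance formula nor the thresholds, so the prefix-overlap similarity, the binary encoding, the toy dictionary `TOY_CODES`, and every function name and default value below are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical toy thesaurus: word -> hierarchical category code, loosely
# imitating the layered codes of TongYiCi CiLin. This is NOT real CiLin data.
TOY_CODES: Dict[str, str] = {
    "car": "Bo21A01",
    "automobile": "Bo21A01",
    "truck": "Bo21A05",
    "engine": "Bo25B02",
    "price": "Dj05C01",
    "cost": "Dj05C01",
}


def word_similarity(a: str, b: str) -> float:
    """Assumed formula: fraction of leading code characters the two words share."""
    if a == b:
        return 1.0
    code_a, code_b = TOY_CODES.get(a), TOY_CODES.get(b)
    if code_a is None or code_b is None:
        return 0.0
    common = 0
    for x, y in zip(code_a, code_b):
        if x != y:
            break
        common += 1
    return common / max(len(code_a), len(code_b))


def encode(sentence: Sequence[str], topic_words: Sequence[str],
           word_threshold: float = 0.8) -> List[int]:
    """Bit j is 1 if the sentence contains a word similar enough to topic word j."""
    return [
        1 if any(word_similarity(t, w) >= word_threshold for w in sentence) else 0
        for t in topic_words
    ]


def hamming_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(u) != len(v):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(u, v))


def sentence_similarity(u: Sequence[int], v: Sequence[int]) -> float:
    """Similarity in [0, 1]: 1 minus the normalized Hamming distance."""
    if not u:
        return 1.0
    return 1.0 - hamming_distance(u, v) / len(u)


def remove_redundant(sentences: Sequence[Sequence[str]],
                     topic_words: Sequence[str],
                     sim_threshold: float = 0.75) -> List[Sequence[str]]:
    """Keep a sentence only if it is not too similar to any sentence already kept."""
    kept: List[Sequence[str]] = []
    kept_codes: List[List[int]] = []
    for sentence in sentences:
        code = encode(sentence, topic_words)
        if all(sentence_similarity(code, k) < sim_threshold for k in kept_codes):
            kept.append(sentence)
            kept_codes.append(code)
    return kept
```

In this sketch, two tokenized sentences that mention the same topic words (directly or through close synonyms) receive identical codes, so the later one is dropped. The filter is greedy and order-dependent, which is a design choice of the sketch rather than something the abstract specifies.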
