Journal of Computer Applications, 2005, Vol. 25, Issue (09): 2022-2024. DOI: 10.3724/SP.J.1087.2005.02022

• Artificial Intelligence •

Study on topic partition based on sequential paragraphic similarity

FU Jian-lian, CHEN Qun-xiu

  1. State Key Lab of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2011-04-11 Published: 2005-09-01

Abstract: Topic partition is an important problem in the text structure analysis stage of an automatic abstracting system. This paper proposes an algorithm that builds a paragraph-level vector space model (VSM) of the whole article and then partitions the text into topics according to the similarity between consecutive paragraphs. The algorithm addresses the discourse structure analysis of articles and makes abstracts of multi-topic articles more comprehensive in content and more balanced in structure. Closed-test experiments show that the precision of topic partition reaches 92.4% for multi-topic articles and 99.1% for single-topic articles.

Key words: automatic abstracting, vector space model (VSM), paragraph similarity, topic partition
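
The abstract names the main ingredients (a paragraph-level vector space model and the similarity between consecutive paragraphs) but gives no details of term weighting, the similarity measure, or the decision rule. The following is therefore only a minimal sketch under stated assumptions, not the authors' algorithm: it assumes raw term-frequency vectors, cosine similarity, and a fixed boundary threshold. The names `paragraph_vector`, `cosine_similarity`, `partition_topics`, and the `threshold` parameter are hypothetical. The regex tokenizer suits space-delimited text; Chinese text would first need a word segmenter.

```python
import math
import re
from collections import Counter


def paragraph_vector(paragraph):
    """Term-frequency vector for one paragraph (assumed weighting, not from the paper)."""
    return Counter(re.findall(r"\w+", paragraph.lower()))


def cosine_similarity(u, v):
    """Cosine similarity of two sparse term vectors (assumed similarity measure)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def partition_topics(paragraphs, threshold=0.1):
    """Group paragraph indices into topics.

    A new topic starts whenever the similarity between a paragraph and its
    predecessor falls below `threshold` (a hypothetical, untuned value).
    """
    if not paragraphs:
        return []
    vectors = [paragraph_vector(p) for p in paragraphs]
    segments = [[0]]
    for i in range(1, len(paragraphs)):
        if cosine_similarity(vectors[i - 1], vectors[i]) < threshold:
            segments.append([i])      # low similarity: topic boundary
        else:
            segments[-1].append(i)    # similar enough: same topic continues
    return segments


if __name__ == "__main__":
    text = [
        "the cat sat on the mat",
        "a cat and a mat",
        "stock prices fell sharply",
        "prices of stock rose",
    ]
    print(partition_topics(text))  # [[0, 1], [2, 3]]
```

In this toy input the second and third paragraphs share no terms, so their similarity is 0 and a boundary is placed between them. A single-topic article would ideally yield one segment; how well that works in practice depends on the threshold and weighting, which would need tuning on real data.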
