Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (01): 186-188. DOI: 10.3724/SP.J.1087.2013.00186

• Artificial Intelligence •

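Knowledge retrieval based on text clustering and distributed Lucene

FENG Ruwei, XIE Qiang, DING Qiulin

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing Jiangsu 210016, China
  • Received: 2012-07-23 Revised: 2012-08-22 Online: 2013-01-01 Published: 2013-01-09
  • Contact: FENG Ruwei
  • About the authors: FENG Ruwei (1988-), male, born in Jiangyin, Jiangsu, master's student; research interest: distributed computing. XIE Qiang (1972-), male, born in Zigong, Sichuan, associate professor, Ph.D.; research interests: knowledge engineering, information systems, information security. DING Qiulin (1935-), male, born in Fuzhou, Jiangxi, professor, doctoral supervisor; research interests: aeronautical and astronautical manufacturing engineering, management and informatization.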

Abstract: To address the low performance and efficiency of traditional centralized indexes when processing large-scale unstructured knowledge, a retrieval algorithm based on text clustering is proposed. The algorithm uses text clustering to improve the existing index partitioning scheme, and narrows the search scope by inferring the query intent from the distance between the query and each cluster. Experimental results show that the proposed scheme effectively relieves the pressure of indexing and retrieval on large-scale data and greatly improves distributed retrieval performance, while maintaining relatively high accuracy and recall.

Key words: unstructured knowledge, distributed index, text clustering, full-text search, parallel retrieval
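The abstract does not specify the clustering algorithm, the distance measure, or how the Lucene partitions are queried, so the following is only a minimal sketch of the general idea under stated assumptions: documents are represented as TF-IDF vectors, grouped with cosine k-means, and each cluster becomes one index partition; a query is then routed only to the partition(s) whose centroid is closest (highest cosine similarity, i.e. lowest cosine distance). Plain Python lists stand in for per-node Lucene indexes, and all function names (`build_shards`, `route_query`, `search`) and the sample documents are hypothetical.

```python
import math
from collections import Counter

# Assumptions (not stated in the abstract): TF-IDF vectors, cosine similarity,
# k-means clustering, and in-memory lists standing in for per-node Lucene indexes.

def tokenize(text):
    # Naive lowercase whitespace tokenizer; a Lucene Analyzer would do this in practice.
    return text.lower().split()

def tf_idf_vectors(docs):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for tf in counts for term in tf)
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in tf.items()} for tf in counts], idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=10):
    """Cosine k-means; seeding with the first k vectors is a simplifying assumption."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

def build_shards(docs, k):
    """Partition the collection: each cluster becomes one index shard."""
    vectors, idf = tf_idf_vectors(docs)
    assignment, centroids = kmeans(vectors, k)
    shards = [[] for _ in range(k)]
    for doc_id, c in enumerate(assignment):
        shards[c].append(doc_id)
    return shards, centroids, idf

def route_query(query, centroids, idf, top_n=1):
    """Infer query intent: rank clusters by cosine similarity to the query."""
    tf = Counter(tokenize(query))
    qvec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine(qvec, centroids[c]), reverse=True)
    return ranked[:top_n]

def search(query, docs, shards, centroids, idf, top_n=1):
    """Search only the routed shards instead of the whole collection."""
    terms = set(tokenize(query))
    hits = []
    for c in route_query(query, centroids, idf, top_n):
        hits.extend(d for d in shards[c] if terms & set(tokenize(docs[d])))
    return sorted(hits)

docs = [
    "lucene index shard search",
    "distributed lucene index search",
    "wing rivet aircraft assembly",
    "aircraft wing manufacturing assembly",
]
shards, centroids, idf = build_shards(docs, k=2)
print(shards)                                                 # [[2, 3], [0, 1]]
print(search("lucene search", docs, shards, centroids, idf))  # [0, 1]
```

In this sketch, routing with `top_n=1` searches a single shard; raising `top_n` searches more partitions, which costs time but helps recall when a query falls between clusters.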
