Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (06): 1655-1657.

• Software Process Technology and Chinese Information Processing •

Research on a semantic retrieval system based on domain ontology and Lucene

Wang Huan1, Sun Ruizhi2

  1. China Agricultural University, Beijing
  2. China Agricultural University
  • Received: 2009-12-07 Revised: 2010-01-30 Online: 2010-06-01 Published: 2010-06-01
  • Corresponding author: Wang Huan

Research of semantic retrieval system based on domain-ontology and Lucene

  • Received:2009-12-07 Revised:2010-01-30 Online:2010-06-01 Published:2010-06-01

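Abstract: Semantic similarity is an important factor affecting the precision and recall of a semantic retrieval system. An improved semantic similarity model is designed to quantify the degree of association between concepts, and the scope of the expanded concept set during query expansion is adjusted by controlling a similarity threshold. A domain-ontology-based semantic retrieval system is designed on top of Lucene: the system performs query expansion on the submitted keyword group, passes the expanded keyword group to the text retrieval engine Lucene, and uses semantic similarity as a key factor in the ranking algorithm for retrieval results. Experimental results show that the similarity values produced by this model are closer to expert empirical values, and that the query precision of the system is considerably higher than that of a Lucene system without query expansion.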

Keywords: query expansion, ontology, Lucene, semantic similarity, semantic retrieval

Abstract: Semantic similarity is a crucial factor affecting the precision and recall of a semantic information retrieval system. This paper proposes an improved semantic similarity computation model that quantifies the association between concepts; the scope of the expanded concept set is then adjusted by a similarity threshold. A domain-ontology-based semantic information retrieval system is designed on top of Lucene, the open-source full-text search engine. The system expands the original query terms, passes the expanded terms to Lucene, and uses semantic similarity as the key factor in the algorithm that ranks the search results. The experimental results show that the similarity values produced by this model are closer to the empirical values given by experts, and that the precision of the system is greatly improved compared with the original Lucene system.

Key words: Query Expansion, Ontology, Lucene, Semantic Similarity, Semantic Search
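
The pipeline outlined in the abstract (threshold-controlled query expansion followed by similarity-aware ranking) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `SIMILARITY` table holds invented values rather than output of the paper's model, the function names `expand_query` and `rank` are hypothetical, and the scoring is a simple stand-in that does not use Lucene's API or its actual scoring formula.

```python
from typing import Dict, List, Tuple

# Hypothetical similarity values between a query concept and related ontology
# concepts. The paper's actual similarity model is not given in the abstract.
SIMILARITY: Dict[str, Dict[str, float]] = {
    "wheat": {"cereal": 0.8, "crop": 0.6, "plant": 0.3},
}


def expand_query(terms: List[str], threshold: float) -> Dict[str, float]:
    """Map each original term to weight 1.0 and add every related concept
    whose similarity is at or above the threshold, weighted by similarity."""
    expanded: Dict[str, float] = {}
    for term in terms:
        expanded[term] = 1.0
        for concept, sim in SIMILARITY.get(term, {}).items():
            if sim >= threshold:
                # Keep the highest weight seen for a concept.
                expanded[concept] = max(expanded.get(concept, 0.0), sim)
    return expanded


def rank(docs: Dict[str, str], weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each document by summing the weights of the expanded terms it
    contains, then return matching documents with the highest score first."""
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = sum(w for term, w in weights.items() if term in words)
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "wheat cereal yield",
        "d2": "crop rotation",
        "d3": "plant biology",
    }
    weights = expand_query(["wheat"], threshold=0.5)
    for doc_id, score in rank(docs, weights):
        print(doc_id, round(score, 2))
```

Raising the threshold shrinks the expanded concept set (at 0.7 only "cereal" is added to "wheat"), which is the lever the abstract describes for controlling the scope of the expansion.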