Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (8): 2099-2102. DOI: 10.11772/j.issn.1001-9081.2016.08.2099

• The 6th China Conference on Data Mining (CCDM 2016) •

Pseudo relevance feedback based on ranking of retrieval results

YAN Rong, GAO Guanglai

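  1. College of Computer Science, Inner Mongolia University, Hohhot 010021
  • Received: 2016-03-01 Revised: 2016-05-03 Online: 2016-08-10 Published: 2016-08-10
  • Corresponding author: YAN Rong
  • About the authors: YAN Rong (born 1979), female, from Ordos, Inner Mongolia, lecturer, Ph.D. candidate, CCF member; main research interests: information retrieval and natural language processing. GAO Guanglai (born 1964), male, from Jalaid, Inner Mongolia, professor, M.S., CCF member; main research interest: intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61263037); Natural Science Foundation of Inner Mongolia (2014BS0604, 2014MS0603).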

Pseudo relevance feedback based on sorted retrieval result

YAN Rong, GAO Guanglai   

  1. College of Computer Science, Inner Mongolia University, Hohhot, Nei Mongolia 010021, China
  • Received: 2016-03-01 Revised: 2016-05-03 Online: 2016-08-10 Published: 2016-08-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61263037) and the Natural Science Foundation of Inner Mongolia Autonomous Region (2014BS0604, 2014MS0603).

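Abstract: Traditional Pseudo Relevance Feedback (PRF) algorithms suffer from low-quality expansion sources, which degrades retrieval performance. To address this problem, a ranking model based on retrieval results (REM) is proposed. First, the model selects the top-ranked documents from the initial retrieval results as the pseudo-relevant document set. Then, the documents in this set are re-ranked under the principle of maximizing the relevance between the user's query intent and each document while minimizing the similarity among the documents. Finally, the top-ranked documents after re-ranking are used as the expansion source for a second round of feedback. Experimental results show that, compared with two traditional pseudo-feedback methods, the ranking model obtains feedback documents that are relevant to the user's query intent and effectively improves retrieval performance.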

Keywords: pseudo relevance feedback, Latent Dirichlet Allocation, topic model, query expansion

Abstract: To address the low quality of the expansion source in traditional Pseudo Relevance Feedback (PRF) algorithms, which leads to poor retrieval performance, a sorting model based on retrieval results, namely REM, was proposed. Firstly, the top-ranked documents of the first-pass retrieval result were taken as the pseudo-relevant set. Secondly, the documents in the pseudo-relevant set were re-ranked by maximizing the relevance between the user query intention and each document while minimizing the similarity between documents. Finally, the top-ranked documents after re-ranking were used as the expansion source for the second retrieval. The experimental results show that, compared with two classical PRF methods, the proposed model obtains feedback documents that are more relevant to the user query intention and improves retrieval performance.
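
The re-ranking step described in the abstract (maximize relevance to the query, minimize similarity among the selected documents) can be illustrated with a small greedy sketch. This is not the authors' REM model, whose scoring details are not given here; it is an MMR-style selection written under stated assumptions. Documents and the query are represented as plain bag-of-words count vectors compared by cosine similarity (the keywords suggest the paper may rely on LDA topic representations instead), and the names `cosine`, `rerank`, and the `trade_off` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query: str, docs: List[str], k: int, trade_off: float = 0.7) -> List[int]:
    """Greedily pick up to k document indices from the pseudo-relevant set.

    Each step picks the document with the best balance between relevance
    to the query and redundancy with the documents already picked.
    trade_off = 1.0 ranks by relevance only (assumed parameter, not from
    the paper).
    """
    q_vec = Counter(query.lower().split())
    d_vecs = [Counter(d.lower().split()) for d in docs]
    relevance = [cosine(q_vec, d) for d in d_vecs]

    selected: List[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy: highest similarity to any already selected document.
            redundancy = max(
                (cosine(d_vecs[i], d_vecs[j]) for j in selected), default=0.0
            )
            return trade_off * relevance[i] - (1.0 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a PRF pipeline, `docs` would be the top-ranked documents from the first-pass retrieval, and the indices returned by `rerank` would identify the documents used as the expansion source for the second retrieval. A lower `trade_off` pushes near-duplicate documents down the list; how the paper actually balances the two criteria is not stated in the abstract.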

Key words: Pseudo Relevance Feedback (PRF), Latent Dirichlet Allocation (LDA), topic model, query expansion

CLC number: