Pseudo relevance feedback based on sorted retrieval result

doi:10.11772/j.issn.1001-9081.2016.08.2099

Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (8): 2099-2102. DOI: 10.11772/j.issn.1001-9081.2016.08.2099

The 6th China Conference on Data Mining (CCDM 2016)

YAN Rong, GAO Guanglai

College of Computer Science, Inner Mongolia University, Hohhot, Nei Mongolia 010021, China

Received: 2016-03-01 Revised: 2016-05-03 Online: 2016-08-10 Published: 2016-08-10
Corresponding author: YAN Rong
About the authors: YAN Rong (born 1979), female, from Ordos, Inner Mongolia; lecturer, Ph.D. candidate, CCF member; main research interests: information retrieval and natural language processing. GAO Guanglai (born 1964), male, from Jalaid, Inner Mongolia; professor, M.S., CCF member; main research interest: intelligent information processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61263037) and the Natural Science Foundation of Inner Mongolia Autonomous Region (2014BS0604, 2014MS0603).

Abstract

Abstract: Traditional Pseudo Relevance Feedback (PRF) algorithms suffer from low-quality expansion sources, which degrades retrieval performance. To address this problem, a ranking model based on retrieval results, named REM, is proposed. First, the model selects the top-ranked documents from the first-pass retrieval result as the pseudo-relevant document set. Second, the documents in this set are re-ranked under the principle of maximizing the relevance between the user's query intent and each document while minimizing the similarity among the documents. Finally, the top-ranked documents after re-ranking are used as the expansion source for a second round of feedback. Experimental results show that, compared with two traditional PRF methods, the proposed model obtains feedback documents that are relevant to the user's query intent and effectively improves retrieval performance.

Key words: Pseudo Relevance Feedback (PRF), Latent Dirichlet Allocation (LDA), topic model, query expansion
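
The re-ranking principle in the abstract (high relevance to the query, low similarity among the selected documents) can be illustrated with a small greedy sketch. This is not the authors' REM formulation, which the abstract does not specify; it is a generic maximal-marginal-relevance style selection, and the cosine similarity, the weight `lam`, the function names, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_rerank(query_vec, doc_vecs, k, lam=0.7):
    """Greedily pick k document ids, trading relevance against redundancy.

    lam is a hypothetical weight: 1.0 ranks by relevance only, while
    smaller values penalize similarity to already selected documents.
    """
    remaining = list(doc_vecs)
    selected = []
    while remaining and len(selected) < k:

        def score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            # Redundancy is the highest similarity to any document chosen so far.
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy pseudo-relevant set; the vectors stand in for term or topic weights.
query = [1.0, 1.0]
docs = {
    "d1": [1.0, 0.3],
    "d2": [1.0, 0.25],  # near-duplicate of d1
    "d3": [0.2, 1.0],   # covers the other aspect of the query
}
feedback_docs = mmr_rerank(query, docs, k=2)
print(feedback_docs)
```

With these toy vectors, the near-duplicate `d2` is passed over in favor of `d3`, whereas setting `lam=1.0` falls back to a pure relevance ordering and keeps `d2`. In a PRF setting, the selected documents would then serve as the source of expansion terms for the second retrieval.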

CLC number: TP391.3

YAN Rong, GAO Guanglai. Pseudo relevance feedback based on sorted retrieval result[J]. Journal of Computer Applications, 2016, 36(8): 2099-2102.

