Journal of Computer Applications, 2016, Vol. 36, Issue 9: 2526-2530. DOI: 10.11772/j.issn.1001-9081.2016.09.2526


Query expansion with semantic vector representation

LI Yan1, ZHANG Bowen1, HAO Hongwei2   

  1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China;
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2016-03-18; Revised: 2016-04-23; Online: 2016-09-10; Published: 2016-09-08
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1135005).


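  • Corresponding author: LI Yan
  • About the authors: LI Yan (李岩, born 1987), male, from Mudanjiang, Heilongjiang, Ph.D. candidate, research interests: information retrieval and deep learning; ZHANG Bowen (张博文, born 1992), male, from Beijing, Ph.D. candidate, research interest: information retrieval; HAO Hongwei (郝红卫, born 1967), male, from Yongnian, Hebei, professor, Ph.D., research interests: pattern recognition and machine learning.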

Abstract: Traditional query expansion in specialized domains often yields expansion terms that lack semantic relations to the original query. To address this problem, a query expansion approach based on semantic vector representation is proposed. First, a semantic vector representation model is designed to learn vector representations of words from their contexts in a corpus. Then, the similarity between words is computed from these representations. Next, the words most similar to the query terms are selected from the corpus as expansion terms to enrich the query. Finally, a biomedical literature search system is built on this expansion approach and compared, with significance testing, against traditional query expansion approaches based on Wikipedia or WordNet and against the systems that participated in BioASQ 2014-2015. The experimental results indicate that the proposed approach outperforms the baseline expansion methods, with mean average precision increasing by at least one percentage point, and that the search system significantly outperforms the BioASQ participants.

Key words: query expansion, semantic representation learning, biomedical document, information retrieval, natural language processing
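The expansion step described in the abstract (compute similarities between word vectors, then add the most similar words to the query) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the similarity measure or the representation model, so cosine similarity is assumed here, and the names `cosine`, `expand_query`, `TOY_VECTORS`, and the vector values themselves are hypothetical stand-ins for vectors that would be learned from a corpus.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# In the approach described above they would be learned from word contexts.
TOY_VECTORS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.05],
    "cancer": [0.8, 0.2, 0.1],
    "protein": [0.1, 0.9, 0.2],
    "enzyme": [0.15, 0.85, 0.3],
    "river": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def expand_query(query_terms, vectors, top_k=2):
    """Append the top_k most similar vocabulary words for each query term."""
    expanded = list(query_terms)
    seen = set(query_terms)
    for term in query_terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms are kept but not expanded
        scored = [
            (cosine(vectors[term], vec), word)
            for word, vec in vectors.items()
            if word not in seen
        ]
        # Highest similarity first; ties are broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, word in scored[:top_k]:
            expanded.append(word)
            seen.add(word)
    return expanded


print(expand_query(["tumor"], TOY_VECTORS, top_k=2))
```

With these toy vectors, the query `["tumor"]` gains `neoplasm` and `cancer`, because their vectors point in nearly the same direction as that of `tumor`. A real system would also need to decide how to weight expansion terms relative to the original terms, which the abstract does not discuss.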

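Abstract (translated from the Chinese): To address the lack of semantic association between expansion terms and the original query when traditional query expansion methods are applied in specialized domains, a query expansion method based on semantic vector representation is proposed. First, a semantic vector representation model is built that learns the contextual semantics of words in a corpus to obtain their semantic vector representations. Second, the semantic similarity between words is computed from these representations. Then, the words semantically most similar to the terms in the query are selected as expansion terms to expand the original query. Finally, a biomedical document retrieval system is built on the proposed method, and comparison experiments with significance analysis are carried out against traditional query expansion methods based on Wikipedia or WordNet and against the systems that competed in BioASQ 2014-2015. The experimental results show that retrieval with the proposed expansion method outperforms the traditional query expansion methods, improving average precision by at least one percentage point, and that it achieves significant improvements over the competition systems.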
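The abstract reports its gains in mean average precision (MAP). For readers unfamiliar with the metric, the sketch below computes the standard textbook definition. It assumes binary relevance judgments and divides by the total number of relevant documents. The exact variant used in the BioASQ evaluation is not stated in this abstract and may differ. The function names and document IDs are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list (binary relevance)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # no relevant documents: define AP as 0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with made-up document IDs.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),              # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))
```

An improvement of "one percentage point" in this metric means, for example, a rise from 0.30 to 0.31 on the 0 to 1 scale used above.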

