Traditional query expansion methods ignore the semantic relations between words, so the terms they add to short, non-standard keyword queries often fail to achieve the intended effect. Linked Data technology uses the Resource Description Framework (RDF) graph model to form the Linked Open Data Cloud, which provides richer semantic information. To address the neglect of semantics in query expansion, a query expansion method based on a semantic property feature graph is proposed. The method combines the Semantic Web with graph techniques and expands queries over an RDF property graph whose vertices are DBpedia resources and in which the relevance of a vertex is determined by its relations. First, the weights of 15 semantic property features, which express how useful a related resource is for expansion, are learned by supervised training. Second, query keywords are mapped to matching DBpedia resources through label properties over the whole DBpedia graph. Third, the neighbors of each matched resource are found by breadth-first search along the property features and taken as expansion candidates. Finally, the candidates with the highest relevance scores are selected as the final expansion terms. Experiments show that, compared with the LOD Keyword Expansion method, the proposed method achieves a recall of 0.89 and improves Mean Reciprocal Rank (MRR) by 4 percentage points, so its results match user queries more closely.
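The expansion steps summarized in the abstract (label-based mapping, breadth-first search over weighted properties, and selection of the highest-scoring candidates) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the `ex:` resources and their edges are toy data rather than actual DBpedia content, the four property weights are invented stand-ins for the 15 learned weights, and scoring a candidate by the product of the weights along its best path is one plausible rule that the abstract does not specify.

```python
# Toy property graph: resource -> list of (property, neighbor resource).
# The ex: resources and edges are hypothetical, not real DBpedia triples.
GRAPH = {
    "ex:Jaguar": [
        ("owl:sameAs", "ex:Panthera_onca"),
        ("dct:subject", "ex:Big_cats"),
        ("rdfs:seeAlso", "ex:Leopard"),
    ],
    "ex:Big_cats": [("skos:broader", "ex:Felidae")],
    "ex:Leopard": [("dct:subject", "ex:Big_cats")],
}

# Hypothetical label index standing in for the label-property lookup.
LABELS = {"jaguar": "ex:Jaguar"}

# Invented weights; in the paper these would be learned by supervised training.
WEIGHTS = {
    "owl:sameAs": 0.9,
    "rdfs:seeAlso": 0.6,
    "dct:subject": 0.5,
    "skos:broader": 0.4,
}


def expand_query(keyword, graph=GRAPH, labels=LABELS, weights=WEIGHTS,
                 max_depth=2, top_k=3):
    """Return up to top_k (resource, score) pairs as expansion candidates."""
    # Step 1: map the keyword to a resource through its label.
    start = labels.get(keyword.lower())
    if start is None:
        return []

    scores = {}               # candidate resource -> relevance score
    frontier = {start: 1.0}   # resources at the current BFS level
    visited = {start}

    # Step 2: level-by-level breadth-first search up to max_depth hops.
    for _ in range(max_depth):
        next_frontier = {}
        for node, node_score in frontier.items():
            for prop, neighbor in graph.get(node, []):
                if neighbor in visited:
                    continue
                weight = weights.get(prop, 0.0)
                if weight <= 0.0:
                    continue  # skip properties without a positive weight
                # Assumed scoring rule: product of weights along the path,
                # keeping the best score among paths reaching this level.
                score = node_score * weight
                if score > next_frontier.get(neighbor, 0.0):
                    next_frontier[neighbor] = score
        visited.update(next_frontier)
        scores.update(next_frontier)
        frontier = next_frontier

    # Step 3: keep the highest-scoring candidates (ties broken by name).
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    for resource, score in expand_query("jaguar"):
        print(resource, round(score, 2))
```

With this toy data, the two-hop candidate `ex:Felidae` is discovered but falls below the three retained candidates, which shows how the property weights rather than graph distance alone decide the final expansion terms.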