Journal of Computer Applications, 2012, Vol. 32, Issue (04): 1090-1093. DOI: 10.3724/SP.J.1087.2012.01090

• Database technology •

XML keyword search algorithm based on smallest lowest entity sub-tree interrelated

YAO Quan-zhu, YU Xun-bin

  1. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, Shaanxi 710048, China
  • Received: 2011-09-14 Revised: 2011-11-16 Online: 2012-04-20 Published: 2012-04-01
  • Contact: YU Xun-bin

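XML keyword query algorithm based on the smallest interrelated entity sub-tree

YAO Quan-zhu, YU Xun-bin

  1. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048
  • Corresponding author: YU Xun-bin
  • About the authors: YAO Quan-zhu (born 1960), male, from Zhouzhi, Shaanxi; professor, Ph.D.; main research interests: databases, software engineering methodology, natural language processing, machine learning. YU Xun-bin (born 1987), male, from Jiujiang, Jiangxi; master's student; main research interest: databases.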

Abstract: Current XML keyword retrieval results contain many meaningless nodes. To address this problem, a semantically related query algorithm is proposed. Because XML documents are semi-structured and self-describing, the concept of the Smallest Lowest Entity Sub-Tree (SLEST), in which only physical connections exist between keywords, is put forward by making full use of the semantic correlation between nodes. To capture the IDREF relations between keywords, an algorithm based on the Smallest Interrelated Entity Sub-Tree (SIEST) is proposed, in which results are represented by SLESTs and SIESTs instead of the Smallest Lowest Common Ancestor (SLCA). Experimental results show that the proposed algorithm increases the precision of XML keyword retrieval.

Key words: Smallest Lowest Entity Sub-Tree (SLEST), Smallest Interrelated Entity Sub-Tree (SIEST), XML keyword query, XML database, semantic relativity

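Abstract: To address the problem that current XML keyword query results contain many meaningless nodes, a semantically related query algorithm is proposed. Because XML documents are semi-structured and self-describing, the concept of the Smallest Lowest Entity Sub-Tree (SLEST) is introduced by making full use of the semantic correlation between nodes; within this concept, only physical connections exist between keywords. To capture the IDREF reference relations between keywords, an algorithm based on the Smallest Interrelated Entity Sub-Tree (SIEST) is proposed, and SLESTs and SIESTs are used in place of the Smallest Lowest Common Ancestor (SLCA) as query results. Experimental results show that the proposed algorithm effectively improves the precision of XML keyword query results.

Key words: Smallest Lowest Entity Sub-Tree, Smallest Interrelated Entity Sub-Tree, XML keyword query, XML database, semantic relativity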
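
For readers unfamiliar with the SLCA baseline that the abstract contrasts against, the following is a minimal sketch of the standard SLCA notion: a node is an SLCA if its subtree contains every query keyword and none of its descendants does. The sketch assumes nodes are identified by Dewey labels (tuples of child positions), which is an assumption made here for illustration and is not stated in the abstract. The function name `slca` and the sample labels are hypothetical. The SLEST and SIEST definitions are not given on this page, so they are not sketched.

```python
from collections import defaultdict


def slca(matches):
    """Return the SLCA nodes for a keyword query.

    matches: dict mapping each keyword to a list of Dewey labels
             (tuples of ints) of the nodes that contain that keyword.
    This is a brute-force illustration, not the algorithm of the paper.
    """
    keywords = set(matches)
    covered = defaultdict(set)  # Dewey label -> keywords found in its subtree
    for keyword, labels in matches.items():
        for label in labels:
            # Every prefix of a Dewey label is an ancestor-or-self of the node.
            for depth in range(1, len(label) + 1):
                covered[label[:depth]].add(keyword)

    # Nodes whose subtree contains all keywords.
    candidates = {label for label, found in covered.items() if found == keywords}

    # Keep only candidates with no candidate among their proper descendants.
    return sorted(
        label
        for label in candidates
        if not any(
            len(other) > len(label) and other[: len(label)] == label
            for other in candidates
        )
    )


# Hypothetical document: root (0,) with two entries (0, 0) and (0, 1).
query = {
    "xml": [(0, 0, 0), (0, 1, 0)],  # both entries mention "xml"
    "search": [(0, 0, 1)],          # only the first entry mentions "search"
}
print(slca(query))  # [(0, 0)]
```

In this sample the root also covers both keywords, but it is discarded because its descendant `(0, 0)` already does. As the abstract notes, results of this kind capture only tree containment, which is the limitation the SLEST and SIEST results are meant to address.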
