计算机应用 ›› 2019, Vol. 39 ›› Issue (2): 360-369.DOI: 10.11772/j.issn.1001-9081.2018061262

• 人工智能 • 上一篇    下一篇

基于资源描述框架图切分与顶点选择性的高效子图匹配方法

关皓元, 朱斌, 李冠宇, 蔡永嘉   

  1. 大连海事大学 信息科学技术学院, 辽宁 大连 116026
  • 收稿日期:2018-06-19 修回日期:2018-09-11 出版日期:2019-02-10 发布日期:2019-02-15
  • 通讯作者: 朱斌
  • 作者简介:关皓元(1993-),男,辽宁沈阳人,硕士研究生,CCF会员,主要研究方向:智能信息处理;朱斌(1969-),男,辽宁大连人,副教授,硕士,CCF会员,主要研究方向:智能信息处理;李冠宇(1963-),男,辽宁丹东人,教授,博士,CCF会员,主要研究方向:智能信息处理;蔡永嘉(1995-),男,江西乐平人,硕士研究生,CCF会员,主要研究方向:智能信息处理。
  • 基金资助:
    国家自然科学基金资助项目(61371090)。

Efficient subgraph matching method based on resource description framework graph segmentation and vertex selectivity

GUAN Haoyuan, ZHU Bin, LI Guanyu, CAI Yongjia   

  1. Information Science and Technology College, Dalian Maritime University, Dalian Liaoning 116026, China
  • Received:2018-06-19 Revised:2018-09-11 Online:2019-02-10 Published:2019-02-15
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61371090).

摘要: 在SPARQL查询过程中,含有复杂结构的资源描述框架(RDF)图的查询效率低下。为此,通过分析几种RDF图的基本结构与RDF顶点的选择性,提出RDF三元组模式选择性(RTPS)——一种基于RDF顶点选择性的图结构切分规则,以提高面向RDF图的子图匹配效率。首先,根据谓词结构在数据图与查询图中的通性建立RDF相邻谓词路径(RAPP)索引,将数据图结构转化为传入-传出双向谓词路径结构以确定查询顶点的搜索空间,并加快顶点的过滤;接着,通过整数线性规划(ILP)问题计算建模将复杂RDF查询图结构分解为若干结构简单的查询子图,通过分析RDF顶点在查询图中的相邻子图结构与特征,确立查询顶点的选择性以确定最优切分方式;然后,通过RDF顶点选择性与相邻子图的结构特征来缩小查询顶点的搜索空间范围,并在数据图中找到符合条件的RDF顶点;最后,遍历数据图以找到与查询子图结构相匹配的子图结构,将得到的子图进行连接并将其作为查询结果输出。实验采用控制变量法,比较了RTPS、RDF子图匹配(RSM)、RDF-3X、GraSS与R3F的查询响应时间。实验结果充分表明,与其他4种方法相比,当查询图复杂度高于9时,RTPS的查询响应时间更短,具有更高的查询效率。

关键词: SPARQL查询处理, 资源描述框架, 子图匹配, 图结构切分, 顶点选择性

Abstract: As the graph-based query in SPARQL query processing becames more and more inefficient due to the increasing structure complexity of Resource Description Framework (RDF) in the graph, by analyzing the basic structure of RDF graphs and the selectivity of the RDF vertices, RDF Triple Patterns Selectivity (RTPS) was proposed to improve the efficienccy of subgraph matching for graph with RDF, which is a graph structure segmentation rule based on selectivity of RDF vertices. Firstly, according the commonality of the predicate structure in the data graph and the query graph, an RDF Adjacent Predicate Path (RAPP) index was built, and the data graph structure was transformed into incoming-outgoing predicate path structure to determine the search space of query vertices and speed up the filtering of RDF vertices. Secondly, the model of Integer Linear Programming (ILP) problem was built to divide a RDF query graph with complicated structure into several query subgraphs with simple structure. By analyzing the structure characteristics of the RDF vertices in the adjacent subgraphs, the selectivity of the query vertices was established and the optimal segmentation method was determined. Thirdly, with the searching space narrowed down by the RDF vertex selectivity and structure characteristics of adjacent subgraphs, the matchable RDF vertices in the data graph were found. Finally, the RDF data graph was traversed to find the subgraphs whose structure matched the structure of query subgraphs. Then, the result graph was output by joining the subgraphs together. The controlling variable method was used in the experiment to compare the query response time of RTPS, RDF Subgraph Matiching (RSM), RDF-3X, GraSS and R3F. The experimental results show that, compared with the other four methods, when the number of triple patterns in a query graph is more than 9, RTPS has shorter query response time and higher query efficiency.

Key words: SPARQL query processing, Resource Description Framework (RDF), subgraph matching, graph structure segmentation, vertex selectivity

中图分类号: