Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (8): 2464-2469. DOI: 10.11772/j.issn.1001-9081.2024081164

• The 21st CCF Conference on Web Information Systems and Applications (WISA 2024)

Diversity semantic query on resource description framework graphs based on multi-level neighborhood predicate label tree encoding index

Jiantao JIANG, Baoyan SONG, Xiaohuan SHAN

  1. Faculty of Information, Liaoning University, Shenyang, Liaoning 110036, China
  • Received: 2024-08-19 Revised: 2024-09-09 Accepted: 2024-09-11 Online: 2024-11-07 Published: 2025-08-10
  • Contact: Xiaohuan SHAN
  • About author: JIANG Jiantao, born in 1999, M.S. candidate. His research interests include graph data management.
    SONG Baoyan, born in 1965, Ph.D., professor. Her research interests include databases and graph data management.
  • Supported by:
    National Key Research and Development Program of China (2023YFC3304900); Applied Basic Research Program of Liaoning Province (2022JH2/101300250)

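Diversity semantic query on resource description framework graphs with a multi-level neighborhood predicate label tree encoding index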

JIANG Jiantao (蒋建涛), SONG Baoyan (宋宝燕), SHAN Xiaohuan (单晓欢)

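  1. Faculty of Information, Liaoning University, Shenyang 110036
  • Corresponding author: SHAN Xiaohuan (单晓欢)
  • About the authors: JIANG Jiantao (蒋建涛), born in 1999, male, a native of Dongying, Shandong, M.S. candidate. His research interests include graph data management.
    SONG Baoyan (宋宝燕), born in 1965, female, a native of Kaiyuan, Liaoning, professor, Ph.D., CCF senior member. Her research interests include databases and graph data management.
  • Supported by:
    National Key Research and Development Program of China (2023YFC3304900); Applied Basic Research Program of Liaoning Province (2022JH2/101300250)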

Abstract:

A knowledge graph is a semantic network that reveals the relationships between entities and is often expressed in the form of the Resource Description Framework (RDF). Faced with the explosive growth of information, existing semantic query algorithms on RDF graphs ignore diverse semantic query requirements. Therefore, considering the rich semantic information of RDF graphs, a distributed Diversity Semantic Query method on RDF graphs based on a multi-level Neighborhood Predicate Label Tree Encoding index (NPLTE), named DSQ-NPLTE, was proposed. Firstly, to avoid wasting storage space and to assist subsequent parallel queries, a frequency-based predicate encoding and mapping strategy was designed to map predicates represented by long strings to unique natural numbers. Secondly, after the RDF graph was partitioned, the resulting vertices were classified according to the properties of their adjacent edges, and corresponding storage modes were given. Thirdly, a multi-level NPLTE was proposed to filter out invalid vertices and edges by using predicate feature information. Finally, for diversity semantic queries with a known predicate, a known subject (object), and mixed known elements, corresponding matching strategies were given, and an optimized join based on common vertices was proposed to reduce the number of Cartesian products and thereby decrease the join cost. Experimental results show that, compared with the method without preprocessing, the proposed method improves query efficiency by 5 to 9 times by using the constructed index for pruning; compared with the FAST method on three LUBM standard synthetic datasets of different sizes, the proposed method improves query efficiency by 43% on average. The proposed index and query strategy can therefore handle diversity semantic queries on large-scale RDF graphs effectively.
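
As an illustration of the first step only, the following is a minimal Python sketch of a frequency-based predicate encoding. It is not the authors' implementation: the abstract states only that predicates are mapped to unique natural numbers according to frequency, so the choice to give more frequent predicates smaller numbers, the alphabetical tie-break, the start value of 1, the function names `build_predicate_codes` and `encode_triples`, and the sample triples (written with LUBM-style predicate names) are all assumptions made for this example.

```python
from collections import Counter


def build_predicate_codes(triples):
    """Map each predicate string to a unique natural number.

    Assumption: more frequent predicates get smaller numbers, so a
    variable-length integer encoding would spend fewer bytes on them.
    Ties are broken alphabetically to keep the mapping deterministic.
    """
    freq = Counter(p for _, p, _ in triples)
    ordered = sorted(freq, key=lambda p: (-freq[p], p))
    return {p: code for code, p in enumerate(ordered, start=1)}


def encode_triples(triples, codes):
    """Replace the predicate string of every triple with its numeric code."""
    return [(s, codes[p], o) for s, p, o in triples]


# Hypothetical sample data, not taken from the paper.
triples = [
    ("ub:Student1", "ub:takesCourse", "ub:Course1"),
    ("ub:Student1", "ub:takesCourse", "ub:Course2"),
    ("ub:Student2", "ub:takesCourse", "ub:Course1"),
    ("ub:Student1", "ub:advisor", "ub:Professor1"),
    ("ub:Student2", "ub:advisor", "ub:Professor1"),
    ("ub:Professor1", "ub:worksFor", "ub:Department1"),
]

codes = build_predicate_codes(triples)
encoded = encode_triples(triples, codes)
```

In this sketch the most frequent predicate, `ub:takesCourse`, receives code 1, so edges can be compared by small integers instead of long strings when an index filters candidates.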

Key words: Resource Description Framework (RDF) graph, variable-length encoding, predicate label tree, diversity semantic query

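Abstract:

A knowledge graph is a semantic network that reveals the relationships between entities and is often represented in the form of the Resource Description Framework (RDF). Faced with the explosive growth of massive information, existing semantic query algorithms on RDF graphs overlook diversified semantic query requirements. Therefore, taking full account of the rich semantic information of RDF graphs, a distributed diversity semantic query method on RDF graphs based on a multi-level Neighborhood Predicate Label Tree Encoding index (NPLTE), named DSQ-NPLTE, is proposed. First, to avoid wasting storage space and to assist subsequent parallel queries, a frequency-based predicate encoding and mapping strategy is designed, which maps predicates represented by long strings to unique natural numbers. Second, after the RDF graph is partitioned, the resulting vertices are classified by the properties of their adjacent edges, and corresponding storage modes are given. Third, a multi-level NPLTE is built, which uses predicate feature information to filter out invalid vertices and edges. Finally, for diversity semantic queries with a known predicate, a known subject (object), and mixed known elements, corresponding matching strategies are given, and an optimized join based on common vertices is proposed to reduce the number of Cartesian products and thus lower the join cost. Experimental results show that, compared with the approach without preprocessing, the proposed method improves query efficiency by 5 to 9 times by using the constructed index for pruning; on three LUBM standard synthetic datasets of different sizes, compared with the FAST method, which has good query performance, the proposed method improves query efficiency by 43% on average. The constructed index and query strategy can therefore effectively handle diversified semantic queries on large-scale RDF graphs.

Key words: resource description framework graph, variable-length encoding, predicate label tree, diversity semantic query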

CLC Number: