Journal of Computer Applications (《计算机应用》), official website


Diversified semantic query on RDF graphs using a multilevel neighborhood predicate label tree encoding index

SHAN Xiaohuan (单晓欢)1, JIANG Jiantao (蒋建涛)1, SONG Baoyan (宋宝燕)2

  1. Liaoning University
  2. School of Information, Liaoning University, Shenyang 110036
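  • Received: 2024-08-19 Revised: 2024-09-09 Online: 2024-11-07 Published: 2024-11-07
  • Corresponding author: SONG Baoyan
  • Funding:
    Applied Basic Research Program of Liaoning Province; National Key Research and Development Program of China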

Diversity semantic query on RDF graphs based on multilevel neighborhood predicate label tree encoding index

  • Received: 2024-08-19 Revised: 2024-09-09 Online: 2024-11-07 Published: 2024-11-07
  • Contact: SONG Baoyan

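Abstract: A knowledge graph is a semantic network that reveals the relationships between entities and is commonly represented in the Resource Description Framework (RDF). As the volume of information grows explosively, existing semantic query algorithms on RDF graphs overlook the need for diversified semantic queries. To address this, a distributed method for diversified semantic queries on RDF graphs is proposed, built on a multilevel neighborhood predicate label tree encoding index that fully exploits the rich semantic information in RDF graphs. First, to avoid wasting storage space and to support later parallel querying, a frequency-based predicate encoding mapping strategy is designed that maps predicates represented by long strings to unique natural numbers. Second, the RDF graph is partitioned, the resulting vertices are classified by the characteristics of their adjacent edges, and a corresponding storage scheme is given. Then, a multilevel neighborhood predicate label tree encoding index is built, which uses predicate feature information to filter out invalid vertices and edges. For diversified semantic queries in which the predicate is known, the subject (or object) is known, or a mixture of these is known, corresponding matching strategies are given, and an optimized join based on common vertices is proposed to reduce the number of Cartesian products and thereby lower the join cost. Experimental results show that pruning with the constructed index improves query time by a factor of 5 to 9 compared with no preprocessing, and that query efficiency improves by 43% on average compared with FAST, an algorithm with relatively good query performance. The constructed index and query strategies can therefore effectively handle diversified semantic queries on large-scale RDF graphs.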

Keywords: RDF graph, unequal-length encoding, predicate label tree, diversified semantic query
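The frequency-based predicate encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mapping rule, so the choices here are assumptions: more frequent predicates receive smaller natural numbers, ties are broken alphabetically, and the names `build_predicate_codes` and `encode_triples` are hypothetical.

```python
from collections import Counter


def build_predicate_codes(triples):
    """Map each predicate string to a unique natural number.

    Assumption (not stated in the abstract): more frequent predicates get
    smaller codes, and ties are broken alphabetically so the mapping is
    deterministic.
    """
    freq = Counter(p for _, p, _ in triples)
    ordered = sorted(freq, key=lambda p: (-freq[p], p))
    return {p: code for code, p in enumerate(ordered, start=1)}


def encode_triples(triples, codes):
    """Replace the long predicate string in each triple with its code."""
    return [(s, codes[p], o) for s, p, o in triples]


# Toy data: (subject, predicate, object) triples.
triples = [
    ("alice", "http://xmlns.com/foaf/0.1/knows", "bob"),
    ("bob", "http://xmlns.com/foaf/0.1/knows", "carol"),
    ("alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
]
codes = build_predicate_codes(triples)
encoded = encode_triples(triples, codes)
```

Giving small numbers to frequent predicates means that, under a variable-length integer encoding, the predicates that occur most often would take the fewest bytes. Whether the paper's encoding works exactly this way is not confirmed by the abstract.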

Abstract: Knowledge graphs are semantic networks that reveal the relationships between entities and are often expressed in the Resource Description Framework (RDF). Faced with the explosive growth of information, existing semantic query approaches on RDF graphs ignore diversified semantic query requirements. Therefore, considering the rich semantic information of RDF graphs, a distributed diversity semantic query method on RDF graphs based on a multilevel neighborhood predicate label tree encoding index is proposed. First, to avoid wasting storage space and to assist subsequent parallel queries, a frequency-based predicate encoding mapping strategy is designed to map predicates represented by long strings to unique natural numbers. Then the RDF graph is partitioned, the vertices are classified according to the properties of their adjacent edges, and a corresponding storage mode is designed. After that, a multilevel neighborhood predicate label tree encoding index is constructed to filter out invalid vertices and edges. For diversity semantic queries with a known predicate, a known subject or object, or a mixture of known elements, corresponding matching strategies are given, and an optimized join based on common vertices is proposed to reduce the number of Cartesian products and the join cost. Experimental results show that, compared with no preprocessing, pruning with the constructed index improves query time by 5 to 9 times, and that, compared with the FAST algorithm, query efficiency is improved by 43% on average. The proposed index and query strategies can therefore effectively handle diversity semantic queries on large-scale RDF graphs.

Key words: RDF graph, variable length encoding, predicate label tree, diversity semantic query
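The idea of joining partial matches on common vertices, rather than forming a full Cartesian product and filtering afterwards, can be sketched as a hash join over variable bindings. This is only an illustration under stated assumptions, not the paper's algorithm: partial matches are modeled as dictionaries from query variables to vertices, every binding in a list is assumed to cover the same variables, and `join_on_common_vertices` is a hypothetical name.

```python
from collections import defaultdict


def join_on_common_vertices(left, right):
    """Hash-join two lists of variable bindings on their shared variables.

    Assumption: all bindings within one list bind the same set of variables.
    """
    if not left or not right:
        return []
    common = sorted(set(left[0]) & set(right[0]))
    if not common:
        # No shared variable: a Cartesian product cannot be avoided.
        return [{**l, **r} for l in left for r in right]
    # Bucket the right-hand bindings by the values of the shared variables.
    buckets = defaultdict(list)
    for r in right:
        buckets[tuple(r[v] for v in common)].append(r)
    # Probe with each left-hand binding; only compatible pairs are combined.
    joined = []
    for l in left:
        for r in buckets.get(tuple(l[v] for v in common), []):
            joined.append({**l, **r})
    return joined


# Toy partial matches for two query edges that share the vertex ?y.
left = [{"?x": "alice", "?y": "bob"}, {"?x": "bob", "?y": "carol"}]
right = [{"?y": "bob", "?z": "dave"}, {"?y": "erin", "?z": "frank"}]
result = join_on_common_vertices(left, right)
```

With the shared variable `?y` used as the join key, only the pair that agrees on `?y` is combined, so the four candidate pairs of a Cartesian product are never materialized.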

CLC number: