计算机应用 ›› 2019, Vol. 39 ›› Issue (1): 46-50.DOI: 10.11772/j.issn.1001-9081.2018071594

• 2018年全国开放式分布与并行计算学术年会(DPCS 2018)论文 • 上一篇    下一篇

SQM:基于Spark的大规模单图上的子图匹配算法

李龙洋1, 董一鸿1, 施炜杰1, 潘剑飞1,2   

  1. 1. 宁波大学 信息科学与工程学院, 浙江 宁波 315211;
    2. 北京百度在线科技有限公司, 北京 100084
  • 收稿日期:2018-07-19 修回日期:2018-08-09 出版日期:2019-01-10 发布日期:2019-01-21
  • 通讯作者: 董一鸿
  • 作者简介:李龙洋(1993-),男,江苏南京人,硕士研究生,主要研究方向:数据挖掘、机器学习;董一鸿(1969-),男,浙江宁波人,教授,博士,主要研究方向:大数据、数据挖掘、人工智能;施炜杰(1993-),男,浙江宁波人,硕士研究生,主要研究方向:数据挖掘、机器学习;潘剑飞(1991-),男,安徽黄山人,工程师,硕士,主要研究方向:大数据、数据挖掘。
  • 基金资助:
    国家自然科学基金资助项目(61572266);浙江省自然科学基金资助项目(LY16F020003);宁波市自然科学基金资助项目(2017A610114)。

SQM: subgraph matching algorithm for single large-scale graphs under Spark

LI Longyang1, DONG Yihong1, SHI Weijie1, PAN Jianfei1,2   

  1. 1. College of Information Science and Engineering, Ningbo University, Ningbo Zhejiang 315211, China;
    2. Baidu Online Technology Company Limited, Beijing 100084, China
  • Received:2018-07-19 Revised:2018-08-09 Online:2019-01-10 Published:2019-01-21
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61572266), the Natural Science Foundation of Zhejiang Province (LY16F020003), the Natural Science Foundation of Ningbo City (2017A610114).

摘要: 针对大规模数据图下基于回溯法的子图查询算法的准确率低、开销大等问题,为提高查询准确率,降低大图下的查询开销,提出一种基于Spark的子图匹配(SQM)算法。首先根据结构信息过滤数据图,再将查询图分割成基本查询单元;然后对每一个基本查询单元分别匹配后进行Join操作;最后运用并行化提高了算法的运行效率,减小了搜索空间。实验结果表明,与Stwig、TurboISO算法相比,SQM算法在保证查询结果不变的情况下,速度提高了50%。

关键词: 子图匹配, 图分割, 大规模单图, 并行化, Spark

Abstract: Focusing on low accuracy and high costs of backtracking-based subgraph query algorithm applied to large-scale graphs, a Spark-based Subgraph Query Matching (SQM) algorithm was proposed to improve query accuracy and reduce query overhead for large graphs. The data graph was firstly filtered according to structure information, and the query graph was divided into basic query units. Then each basic query unit was matched and joined together. Finally, the algorithm's efficiency was improved and search space was reduced by parallelization. The experimental results show that compared with Stwig (Sub twig) algorithm and TurboISO algorithm, SQM algorithm can increase the speed by 50% while ensuring the same query results.

Key words: subgraph matching, graph segmentation, single large-scale graph, parallelization, Spark

中图分类号: