计算机应用 ›› 2017, Vol. 37 ›› Issue (12): 3397-3400.DOI: 10.11772/j.issn.1001-9081.2017.12.3397

• 先进计算 • 上一篇    下一篇

三角形的并行枚举算法

王卓, 索勃, 潘巍   

  1. 西北工业大学 计算机学院, 西安 710129
  • 收稿日期:2017-07-04 修回日期:2017-09-05 出版日期:2017-12-10 发布日期:2017-12-18
  • 通讯作者: 潘巍
  • 作者简介:王卓(1984-),男,河南濮阳人,博士研究生,CCF会员,主要研究方向:图数据管理、分布式计算平台;索勃(1987-),男(满族),辽宁锦州人,博士研究生,主要研究方向:图数据管理;潘巍(1978-),男,陕西西安人,副教授,博士,CCF会员,主要研究方向:内存计算。
  • 基金资助:
    中国科技部国家重点研发计划项目(2016YFB1000703);国家863重大项目(2015AA015307);国家自然科学基金重点项目(61332006);国家自然科学基金面上项目(61672432,61472321);国家自然科学基金青年项目(61502390)。

Parallel algorithm for triangle enumeration

WANG Zhuo, SUO Bo, PAN Wei   

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
  • Received:2017-07-04 Revised:2017-09-05 Online:2017-12-10 Published:2017-12-18
  • Supported by:
    This work is partially supported by the Ministry of Science and Technology of China, National Key Research and Development Program (2016YFB1000703), the National High Technology Research and Development Program of China (863 Program) (2015AA015307), the Key Program of National Natural Science Foundation of China (61332006), the General Program of National Natural Science Foundation of China (61672432, 61472321), the National Natural Science Foundation of China for Young Scholars (61502390).

摘要: 经典GT算法是三角形并行枚举算法的MapReduce实现,然而该算法只能枚举全图的三角形结构,对部分顶点构成的三角形结构无法直接进行枚举。针对此问题,提出一种直接枚举部分顶点构成三角形结构的并行算法。首先,通过分析被选点的分布,给出被选点构成三角形的所有组合集合;然后,通过对该集合的筛选,实现对部分点构成三角形结构的直接枚举;最后,将该算法在Spark系统实现,以实现该算法的高效性和广泛性。在人工生成数据集和真实数据集上与GT算法进行对比实验,实验结果表明,所提改进算法的运行时间只有GT算法运行时间的1/3,在Spark上的运行时间仅是Hadoop上运行时间的1/7。该算法可用于更高效地直接生成图中任意点所构成的三角形数据集。

关键词: 三角形枚举, 大规模图数据, MapReduce, 部分点枚举, Spark

Abstract: The classical Graph Twiddling (GT) algorithm is the MapReduce implementation of triangle parallel enumeration algorithm. However, the GT algorithm can only enumerate the triangle structure of whole graph and can not enumerate the triangle structure of candidate vertexes directly. To solve the problem, a parallel algorithm was proposed for directly enumerating the triangle structure of candidate vertexes. Firstly, the set of all the combinations of candidate vertexes for forming triangle was given by analyzing the distribution of candidate vertexes. Then, through the screening of the set, the triangle structure of candidate vertexes was directly enumerated. Finally, the proposed algorithm was implemented on Spark to achieve high efficiency and popularity. The contrast experiment was completed on artificial datasets and real datasets. The experimental results show that, compared with the GT algorithm, the running time of the proposed algorithm is only 1/3 of the running time of GT algorithm, and the running time on Spark is only 1/7 of the running time on Hadoop. The proposed algorithm can be used to generate the triangle dataset of any candidate vertex directly and efficiently.

Key words: triangle enumeration, large-scale graph data, MapReduce, candidate vertex enumeration, Spark

中图分类号: