Journal of Computer Applications, 2020, Vol. 40, Issue (2): 398-403. DOI: 10.11772/j.issn.1001-9081.2019081387

• Paper from the 2019 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2019) •

Optimization and parallelization of Graphlet Degree Vector method

Xiangshuai SONG1, Fuzhang YANG1, Jiang XIE1, Wu ZHANG1,2

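  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
    2. Shanghai Institute of Applied Mathematics and Mechanics (Shanghai University), Shanghai 200444, China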
  • Received: 2019-07-31 Revised: 2019-08-13 Accepted: 2019-09-17 Online: 2019-09-29 Published: 2020-02-10
  • Corresponding author: Jiang XIE
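  • About the authors: SONG Xiangshuai, born in 1995, from Liaocheng, Shandong, M. S. candidate, CCF member. His research interests include bioinformatics and machine learning.
    YANG Fuzhang, born in 1994, from Ningde, Fujian, M. S. candidate, CCF member. His research interests include bioinformatics.
    ZHANG Wu, born in 1957, from Wuning, Jiangxi, Ph. D., professor, CCF Distinguished Member. His research interests include high performance computing, bioinformatics and computational fluid mechanics.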
  • Supported by:
    General Program of the National Natural Science Foundation of China (61873156)

Abstract:

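Graphlet Degree Vector (GDV) is an important method for studying biological networks. It reveals the correlation between each node in a biological network and its local network structure. However, as the number of automorphic orbits to be mined and the scale of biological networks grow, the time complexity of the GDV method increases exponentially. To address this problem, a parallel GDV method based on the Message Passing Interface (MPI) was implemented on top of the existing serial GDV method. In addition, the GDV method itself was improved, and the improved method was also parallelized and optimized. When searching for the automorphic orbits of different nodes, the improved method reorganizes the computation to eliminate repeated calculations, and it uses a load-balancing strategy to distribute tasks evenly across processes. Experimental results on simulated network data and real biological network data show that both the parallel GDV method and the improved parallel GDV method achieve good parallel performance. Both apply well to networks of different types and scales, scale well, and effectively keep the search for automorphic orbits in a network efficient.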

Key words: Graphlet Degree Vector (GDV) method, biological network, automorphic orbit, subgraph enumeration, parallelization, Message Passing Interface (MPI)
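The abstract describes the GDV computation and its parallelization only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It computes the GDV restricted to the four orbits of 2- and 3-node graphlets (orbits 0-3 in the usual numbering: edge, end of a 3-node path, middle of a 3-node path, triangle node). The full method also counts orbits of larger graphlets. The sketch then splits the nodes into balanced groups, as independent MPI processes might each handle one group. The graph representation, the helper names (`gdv_row`, `balanced_partition`, `gdv_partitioned`, and so on), the deg² cost proxy and the greedy assignment are all hypothetical choices made for illustration. They are not the load-balancing strategy used in the paper.

```python
from itertools import combinations

# Hypothetical helper names; a sketch, not the authors' GDV implementation.
# Graph representation (assumption): dict mapping node -> set of neighbours,
# undirected, no self-loops.


def triangles_at(adj, v):
    """Count triangles through v, i.e. edges between pairs of v's neighbours."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])


def gdv_row(adj, v):
    """GDV entries of v for orbits 0-3 (graphlets with 2 and 3 nodes only).

    orbit 0: degree (endpoint of an edge)
    orbit 1: endpoint of an induced 3-node path
    orbit 2: middle node of an induced 3-node path
    orbit 3: node of a triangle
    """
    deg = len(adj[v])
    tri = triangles_at(adj, v)
    # Neighbour pairs that are not adjacent form induced paths centred at v.
    o2 = deg * (deg - 1) // 2 - tri
    # Walks v-u-w with w != v, minus the 2 ordered walks per triangle whose
    # far end w is also a neighbour of v (those paths are not induced).
    o1 = sum(len(adj[u]) - 1 for u in adj[v]) - 2 * tri
    return [deg, o1, o2, tri]


def gdv_serial(adj):
    """Serial baseline: one GDV row per node."""
    return {v: gdv_row(adj, v) for v in adj}


def balanced_partition(adj, n_workers):
    """Greedy longest-processing-time split of nodes into n_workers groups.

    Cost proxy (assumption): deg(v)**2, matching the pairwise neighbour scan
    in triangles_at. Heaviest nodes go first, each onto the least-loaded group.
    """
    loads = [0] * n_workers
    parts = [[] for _ in range(n_workers)]
    for v in sorted(adj, key=lambda x: len(adj[x]) ** 2, reverse=True):
        i = loads.index(min(loads))
        parts[i].append(v)
        loads[i] += len(adj[v]) ** 2
    return parts


def gdv_partitioned(adj, n_workers):
    """Compute each group independently and merge the rows.

    In an MPI program each group would go to one process and the partial
    results would be gathered on one rank; here the groups run in sequence.
    """
    result = {}
    for part in balanced_partition(adj, n_workers):
        result.update({v: gdv_row(adj, v) for v in part})
    return result


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(gdv_serial(adj))
    print(balanced_partition(adj, 2))
```

On the example graph, node 2 gets the row [3, 0, 2, 1]. It has degree 3, is the endpoint of no induced path, is the centre of two induced paths (0-2-3 and 1-2-3), and lies in one triangle. With two workers, the partition is [[2], [0, 1, 3]], which gives estimated loads of 9 and 9. Each row depends only on the shared adjacency structure, so the groups can be computed without communication until the final merge. This per-node loop also finds each triangle once from each of its three nodes. The abstract does not say how the improved method avoids such repeated work, so the sketch does not model it.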
