计算机应用 ›› 2019, Vol. 39 ›› Issue (1): 72-77.DOI: 10.11772/j.issn.1001-9081.2018071655

• 2018年全国开放式分布与并行计算学术年会(DPCS 2018)论文 • 上一篇    下一篇

生物复杂网络motif发现的并行算法

杨伏长, 朱嘉富, 孙佳敏, 谢江   

  1. 上海大学 计算机工程与科学学院, 上海 200444
  • 收稿日期:2018-07-19 修回日期:2018-08-10 出版日期:2019-01-10 发布日期:2019-01-21
  • 通讯作者: 谢江
  • 作者简介:杨伏长(1994-),男,福建宁德人,硕士研究生,CCF会员,主要研究方向:生物信息学;朱嘉富(1997-),男,湖南衡阳人,主要研究方向:生物信息学;孙佳敏(1995-),女,江苏兴化人,硕士研究生,CCF会员,主要研究方向:生物信息学;谢江(1971-),女,湖北恩施人,副教授,博士,CCF高级会员,主要研究方向:生物信息学、高性能计算。
  • 基金资助:
    国家重点研发计划重点专项(2017YFB0701501);上海市自然科学基金资助项目(17ZR1409900)。

Parallel algorithm of biological complex network motifs discovery

YANG Fuzhang, ZHU Jiafu, SUN Jiamin, XIE Jiang   

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  • Received:2018-07-19 Revised:2018-08-10 Online:2019-01-10 Published:2019-01-21
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFB0701501), the Natural Science Foundation of Shanghai (17ZR1409900).

摘要: 生物复杂网络motif发现是一种研究生物网络的重要方法,它基于复杂网络的理论研究,以新的视角来研究生命现象和生命机制,但是在处理较大的网络规模或者需挖掘较大的motif时计算效率低。针对这个问题,在现有串行网络motif发现算法ESU的基础上,提出一种基于消息传递接口(MPI)的并行化ESU算法。该方法在ESU计算过程中优化了节点值以解决节点值依赖问题,并以ESU算法的子图发现策略统计各节点子图数,利用动态规划策略寻找最佳节点分配策略以解决负载不均衡问题。模拟网络数据和真实生物网络数据的实验结果表明,并行化ESU算法优化了节点值依赖问题,实现了基于动态规划的负载均衡策略,其运行时间比串行算法缩短了90%,并且该并行算法对不同类型不同规模的网络都具有较强的适用性,有效地提高了网络motif发现问题的计算效率。

关键词: 网络motif发现, 子图枚举, 同构比较, 并行化, 消息传递接口

Abstract: Biological complex network motifs discovery, based on theoretical research of complex networks, is an important method for studying biological networks, which provides a new perspective on life phenomena and life mechanisms. However, it computes inefficiently when dealing with large networks or mining big motifs. On the basis of existing serial ESU (Enumerate SUbgraph) algorithm of network motifs discovery, a parallelized ESU algorithm based on Message Passing Interface (MPI) was proposed. The node values in ESU algorithm were optimized to solve the problem of node value dependency, the number of subgraphs was counted by using subgraph discovery strategy of ESU algorithm, and a dynamic programming method was used to determine optimal node allocation strategy to satisfy load balancing. The experiments on simulated and biological networks show that the parallelized ESU algorithm addresses node value dependency and realizes a load balancing strategy, which saves more than 90% running time compared to serial algorithm. Furthermore, the parallel algorithm is suitable for different types and different scales of networks, and effectively improves computation efficiency of network motifs discovery.

Key words: network motifs discovery, Enumerate SUbgraph (ESU), homogeneous comparison, parallelization, Message Passing Interface (MPI)

中图分类号: