Journal of Computer Applications, 2013, Vol. 33, Issue (01): 215-218. DOI: 10.3724/SP.J.1087.2013.00215

• Advanced Computing

Multiple samples alignment for GC-MS data in parallel on Sector/Sphere

YANG Huihua1, REN Hongjun1, LI Lingqiao1, DUAN Lixin2, GUO Tuo1, DU Lingling1, QI Xiaoquan2

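  1. School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China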
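  • Received: 2012-07-05; Revised: 2012-08-12; Online: 2013-01-01; Published: 2013-01-09
  • Corresponding author: YANG Huihua
  • About the authors: YANG Huihua (1972-), male, from Changde, Hunan, professor; research interests: machine learning and data mining, cloud computing and the Internet of Things. REN Hongjun (1988-), male, from Weihai, Shandong, M.S.; research interests: high-performance data mining and algorithms. LI Lingqiao (1986-), male, from Dazhou, Sichuan, M.S.; research interests: machine learning, data mining. DUAN Lixin (1980-), male, from Wuhan, Hubei, Ph.D.; research interests: plant metabolomics.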
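  • Foundation items: National Natural Science Foundation of China (30860381, 31200227); Natural Science Foundation of Guangxi (2012GXNSFAA053230); National High Technology Research and Development Program of China (863 Program) (2012AA10A304); Program for Excellent Talents in Guangxi Higher Education Institutions (Guijiaoren [2011] No. 40); Open Fund of Guangxi Key Laboratory of Trusted Software (kx201121)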

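Abstract: Processing gas chromatography-mass spectrometry (GC-MS) data is complex and computationally expensive, and the resulting long processing times seriously delay experimental progress. Taking multi-sample retention-time alignment as an example, a parallel framework for GC-MS data processing was designed on the distributed platform Sector/Sphere, and a parallel multi-sample alignment algorithm was implemented. First, the similarity matrix of all samples is computed in a distributed manner. Then, following the principle of hierarchical clustering, the original sample set is partitioned into small sample sets, and the samples inside each small set are aligned in a distributed manner. Finally, the alignment results of the small sets are merged, using the average sample of each set as the alignment reference. The experimental results show that the error rate of the parallel multi-sample alignment algorithm is 2.9%, and that a cluster of 4 PCs reaches a maximum speedup of 3.29 when processing a large number of samples. The algorithm therefore increases computation speed while maintaining high accuracy, which resolves the problem of excessive processing time.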
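The three stages summarized in the abstract (distributed similarity matrix, hierarchical-clustering partition with per-set alignment, merge via each set's average sample) can be illustrated with a small Python sketch. It is not the authors' implementation and makes several assumptions of its own: each sample is reduced to a 1-D intensity profile on a shared retention-time grid; aligning a sample means applying one integer shift; similarity is the best cosine score over a small shift window; clustering uses average linkage with a hypothetical cutoff `threshold`; each small set is aligned to its medoid; and the largest set's average serves as the merge anchor. The helper names (`shift`, `best_shift`, `cluster`, `align_group`, `parallel_align`, `spikes`) are hypothetical, and a `ThreadPoolExecutor` merely stands in for Sphere processing nodes.

```python
# Hypothetical sketch of the abstract's three stages; not the authors' Sector/Sphere code.
# Assumption: each sample is a 1-D intensity profile on a shared retention-time grid,
# and aligning a sample means applying one integer shift to it.
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def shift(p, k):
    """Move profile p by k grid points (k > 0 moves peaks later), zero-padding the ends."""
    n = len(p)
    return [p[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]


def best_shift(ref, p, max_shift=4):
    """Return (k, cosine) for the shift in [-max_shift, max_shift] that best matches p to ref."""
    norm_ref = math.sqrt(sum(x * x for x in ref))
    best = (0, -1.0)
    for k in range(-max_shift, max_shift + 1):
        q = shift(p, k)
        denom = norm_ref * math.sqrt(sum(x * x for x in q))
        cos = sum(a * b for a, b in zip(ref, q)) / denom if denom else 0.0
        if cos > best[1]:
            best = (k, cos)
    return best


def cluster(sim, threshold):
    """Average-linkage agglomerative clustering; stop when no two sets reach threshold."""
    sets = [[i] for i in range(len(sim))]

    def link(pair):
        a, b = sets[pair[0]], sets[pair[1]]
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(sets) > 1:
        i, j = max(combinations(range(len(sets)), 2), key=link)
        if link((i, j)) < threshold:
            break
        sets[i].extend(sets.pop(j))  # i < j, so popping j leaves index i valid
    return sets


def align_group(samples, members, sim):
    """Align each member of one small set to the set's medoid; return {index: shift}."""
    ref = max(members, key=lambda i: sum(sim[i][j] for j in members if j != i))
    return {i: best_shift(samples[ref], samples[i])[0] for i in members}


def parallel_align(samples, threshold=0.999, workers=4):
    n = len(samples)
    sim = [[1.0] * n for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: similarity matrix, one independent task per sample pair.
        scores = pool.map(lambda ij: best_shift(samples[ij[0]], samples[ij[1]])[1], pairs)
        for (i, j), s in zip(pairs, scores):
            sim[i][j] = sim[j][i] = s
        # Stage 2: split into small sets, then align inside each set independently.
        groups = cluster(sim, threshold)
        local = list(pool.map(lambda g: align_group(samples, g, sim), groups))
    # Stage 3: align each set's average sample to the largest set's average,
    # and carry that offset over to every member of the set.
    averages = []
    for g, shifts in zip(groups, local):
        aligned = [shift(samples[i], shifts[i]) for i in g]
        averages.append([sum(col) / len(g) for col in zip(*aligned)])
    anchor = averages[max(range(len(groups)), key=lambda c: len(groups[c]))]
    total = [0] * n
    for g, shifts, avg in zip(groups, local, averages):
        offset = best_shift(anchor, avg)[0]
        for i in g:
            total[i] = shifts[i] + offset
    return [shift(p, k) for p, k in zip(samples, total)], total, groups


def spikes(n, peaks):
    """Toy profile of length n with single-point peaks {position: height}."""
    p = [0.0] * n
    for pos, height in peaks.items():
        p[pos] = height
    return p


base_a = spikes(30, {10: 10.0, 15: 5.0, 20: 1.0})
base_b = spikes(30, {5: 1.0, 10: 10.0, 15: 5.0})
samples = [shift(base_a, 0), shift(base_a, 2), shift(base_b, -1), shift(base_b, 1)]
aligned, total, groups = parallel_align(samples)
print(groups, total)  # [[0, 1], [2, 3]] [0, -2, 1, -1]
```

In the toy run, the two differently shaped sample families fall into separate small sets, each set is aligned internally, and the merge step then moves the second set by one grid point so that the tallest peak of every sample ends up at index 10. Partitioning first keeps most shift estimates local to similar samples, so the merge only needs one offset per small set. Under CPython the threads illustrate the task decomposition rather than deliver real speedup.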

Key words: Sector/Sphere platform, distributed computing, parallel framework, multiple sample alignment

CLC number: