Multiple samples alignment for GC-MS data in parallel on Sector/Sphere

Journal of Computer Applications, 2013, Vol. 33, Issue 01: 215-218. DOI: 10.3724/SP.J.1087.2013.00215

YANG Huihua¹, REN Hongjun¹, LI Lingqiao¹, DUAN Lixin², GUO Tuo¹, DU Lingling¹, QI Xiaoquan²

1. School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
2. Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China

Received: 2012-07-05 Revised: 2012-08-12 Online: 2013-01-01 Published: 2013-01-09
Corresponding author: YANG Huihua
About the authors: YANG Huihua (born 1972), male, from Changde, Hunan, professor; research interests: machine learning and data mining, cloud computing and the Internet of Things. REN Hongjun (born 1988), male, from Weihai, Shandong, master's degree; research interests: high-performance data mining and algorithms. LI Lingqiao (born 1986), male, from Dazhou, Sichuan, master's degree; research interests: machine learning, data mining. DUAN Lixin (born 1980), male, from Wuhan, Hubei, Ph.D.; research interests: plant metabolomics.
Supported by:
National Natural Science Foundation of China (30860381, 31200227); Natural Science Foundation of Guangxi (2012GXNSFAA053230); National High Technology Research and Development Program of China (863 Program) (2012AA10A304); Program for Excellent Talents in Guangxi Higher Education Institutions (Gui Jiao Ren [2011] No. 40); Open Fund of the Guangxi Key Laboratory of Trusted Software (kx201121)

Abstract

Abstract: Gas chromatography-mass spectrometry (GC-MS) data processing is complex and computationally intensive, and the resulting long processing time seriously delays experimental progress. Taking retention-time alignment of multiple samples as an example, a parallel framework for processing GC-MS data was designed on the distributed platform Sector/Sphere, and a parallel multi-sample alignment algorithm was implemented. First, the similarity matrix of all samples is computed in a distributed manner. Then, the original sample set is divided into small sample sets according to hierarchical clustering, and the samples within each small set are aligned in a distributed manner. Finally, the alignment results of the sets are merged, using the average sample of each small set as the basis for alignment. The experimental results show that the error rate of the parallel alignment algorithm is 2.9% and that, on a cluster of four PCs processing a large number of samples, the maximum speedup ratio reaches 3.29. The algorithm therefore increases computing speed while maintaining high accuracy, addressing the problem of excessive processing time.

Key words: Sector/Sphere platform, distributed computing, parallel framework, multiple-sample alignment
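The three stages described in the abstract can be illustrated with a minimal, single-machine sketch. It is not the authors' implementation and does not use the Sector/Sphere API: a thread pool stands in for the cluster nodes, each sample is assumed to be an equal-length list of peak retention times, `similarity` is a hypothetical negative mean absolute difference, and `shift_to` is a deliberately simple constant-offset stand-in for a real alignment step. All function names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def similarity(a, b):
    # Hypothetical measure: negative mean absolute retention-time difference.
    return -sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity_matrix(samples, workers=4):
    # Stage 1: pairwise similarities, with the pairs spread over workers.
    n = len(samples)
    pairs = list(combinations(range(n), 2))
    sim = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = pool.map(
            lambda p: similarity(samples[p[0]], samples[p[1]]), pairs)
        for (i, j), s in zip(pairs, values):
            sim[i][j] = sim[j][i] = s
    return sim


def cluster(sim, k):
    # Stage 2a: agglomerative (average-linkage) clustering down to k groups.
    clusters = [[i] for i in range(len(sim))]

    def linkage(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a is stable
        del clusters[b]
    return clusters


def mean_sample(samples):
    # Element-wise average of equal-length samples.
    return [sum(col) / len(col) for col in zip(*samples)]


def shift_to(sample, reference):
    # Stand-in alignment: remove the mean offset relative to the reference.
    offset = sum(s - r for s, r in zip(sample, reference)) / len(sample)
    return [s - offset for s in sample]


def align_parallel(samples, k=2, workers=4):
    sim = similarity_matrix(samples, workers)
    groups = cluster(sim, k)

    def align_group(idx):
        # Stage 2b: align the members of one small sample set.
        ref = mean_sample([samples[i] for i in idx])
        return [shift_to(samples[i], ref) for i in idx]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        aligned_groups = list(pool.map(align_group, groups))

    # Stage 3: merge the sets, using each set's average sample as its proxy.
    averages = [mean_sample(g) for g in aligned_groups]
    global_ref = mean_sample(averages)
    result = [None] * len(samples)
    for idx, group, avg in zip(groups, aligned_groups, averages):
        offset = sum(a - r for a, r in zip(avg, global_ref)) / len(avg)
        for i, s in zip(idx, group):
            result[i] = [x - offset for x in s]
    return groups, result


if __name__ == "__main__":
    demo = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.5, 2.5, 3.5], [1.6, 2.6, 3.6]]
    groups, aligned = align_parallel(demo, k=2)
    print(groups)
    print(aligned)
```

In this sketch only the average sample of each small set takes part in the merge, so the cross-set step scales with the number of sets rather than the number of samples, which is the property the abstract relies on for its speedup.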

CLC number: TP399

Cite this article: YANG Huihua, REN Hongjun, LI Lingqiao, DUAN Lixin, GUO Tuo, DU Lingling, QI Xiaoquan. Multiple samples alignment for GC-MS data in parallel on Sector/Sphere[J]. Journal of Computer Applications, 2013, 33(01): 215-218.


