De novo peptide sequencing by tandem mass spectrometry based on graph convolutional neural network

doi:10.11772/j.issn.1001-9081.2020111875

Journal of Computer Applications, 2021, Vol. 41, Issue 9: 2773-2779.

Topic: Frontier and comprehensive applications

MOU Changning, WANG Haipeng, ZHOU Piyu, HOU Xinhang

School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China

Received: 2020-12-02 Revised: 2021-01-08 Online: 2021-05-12 Published: 2021-09-10
Corresponding author: WANG Haipeng
About the authors: MOU Changning (born 1990), male, from Zibo, Shandong, M.S. candidate, CCF student member; research interests: deep learning and bioinformatics. WANG Haipeng (born 1980), male, from Zibo, Shandong, associate professor, Ph.D.; research interests: machine learning and bioinformatics. ZHOU Piyu (born 1995), male, from Zibo, Shandong, M.S. candidate; research interests: deep learning and bioinformatics. HOU Xinhang (born 1995), male, from Jining, Shandong, M.S. candidate; research interests: deep learning and bioinformatics.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (31500669), the Shandong Provincial Natural Science Foundation (ZR2014FQ024), and the Support Program for Outstanding Youth Innovation Teams in Colleges and Universities of Shandong Province (2019KJN048).

Abstract

Abstract: In proteomics, de novo sequencing is one of the most important methods for peptide sequencing by tandem mass spectrometry. Because it does not depend on a protein database, it plays a key role in determining the protein sequences of unknown species, in monoclonal antibody sequencing, and in related fields. However, owing to the complexity of the task, the accuracy of de novo sequencing is much lower than that of database search methods, which limits its wide application. To address this low accuracy, denovo-GCN, a de novo sequencing method based on a Graph Convolutional neural Network (GCN), was proposed. In this method, the relationships between peaks in a mass spectrum are represented as a graph, and peak features are extracted from each corresponding peptide cleavage site. The GCN then predicts the amino acid type at the current cleavage site, and the complete peptide sequence is assembled step by step. Three important parameters affecting the model were determined experimentally: the number of GCN layers, the combination of ion types, and the number of spectral peaks used for sequencing. Datasets from multiple species were used for experimental comparison. Experimental results show that the peptide-level recall of denovo-GCN is 4.0 to 21.1 percentage points higher than those of the graph theory-based methods Novor and pNovo, and 2.1 to 10.7 percentage points higher than that of DeepNovo, which is based on a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network.

Key words: graph convolutional neural network (GCN), de novo sequencing, proteomics, tandem mass spectrometry
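The abstract describes representing the relationships between spectral peaks as a graph and passing peak features through a GCN. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a hypothetical edge rule (two peaks are linked when their m/z gap matches an amino acid residue mass within a tolerance), a small illustrative subset of residue masses, and a single layer using the widely used Kipf and Welling propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). The function names, the tolerance, and the one-dimensional intensity feature are all assumptions made for illustration; the paper defines its own graph and features.

```python
import math

# Monoisotopic residue masses in Da (illustrative subset only).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
}


def build_adjacency(peaks_mz, tol=0.02):
    """Link two peaks when their m/z gap matches a residue mass.

    This edge rule is an assumption of the sketch, not taken from the paper.
    """
    n = len(peaks_mz)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(peaks_mz[j] - peaks_mz[i])
            if any(abs(gap - m) <= tol for m in RESIDUE_MASS.values()):
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(norm_adj, features, weights):
    """One graph convolution: ReLU(norm_adj @ features @ weights)."""
    out = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Toy spectrum: the first three peaks are spaced by G and A; the last is isolated.
peaks = [100.0, 157.02146, 228.05857, 500.0]
intensities = [[1.0], [0.5], [0.8], [0.2]]   # one feature per peak (assumed)
weights = [[1.0, -1.0]]                      # fixed weights, for illustration only

adjacency = build_adjacency(peaks)
hidden = gcn_layer(normalize_adjacency(adjacency), intensities, weights)
```

In a trained model the weight matrix would be learned and several such layers stacked; the abstract reports that the number of layers was one of the parameters chosen experimentally.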

CLC number: TP391
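The results in the abstract are reported as peptide-level recall and as differences in percentage points. As a rough illustration of how such figures can be computed, the sketch below counts a spectrum as a hit only when the whole predicted peptide matches the reference. It assumes plain string comparison with isoleucine and leucine treated as identical because they have the same mass; the matching criterion actually used in the paper is not stated in the abstract and may differ. The names `peptide_recall` and `percentage_point_gain` are hypothetical.

```python
def normalize(peptide):
    """Upper-case the sequence and map I to L (same mass; an assumption here)."""
    return peptide.upper().replace("I", "L")


def peptide_recall(predicted, reference):
    """Fraction of reference peptides that are reproduced in full.

    `predicted` and `reference` are parallel lists; a `None` entry in
    `predicted` marks a spectrum for which no peptide was reported.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(
        1
        for pred, ref in zip(predicted, reference)
        if pred is not None and normalize(pred) == normalize(ref)
    )
    return hits / len(reference)


def percentage_point_gain(recall_a, recall_b):
    """Difference between two recall values, expressed in percentage points."""
    return (recall_a - recall_b) * 100.0


# Toy data: two of the four reference peptides are recovered in full.
predicted = ["PEPTIDE", "AGLK", None, "SVK"]
reference = ["PEPTLDE", "AGIK", "GGR", "SAK"]
recall = peptide_recall(predicted, reference)
```

A gain of, say, 2.1 percentage points therefore means an absolute difference of 0.021 in recall, not a 2.1% relative improvement.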

MOU Changning, WANG Haipeng, ZHOU Piyu, HOU Xinhang. De novo peptide sequencing by tandem mass spectrometry based on graph convolutional neural network[J]. Journal of Computer Applications, 2021, 41(9): 2773-2779.

