Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (1): 297-304. DOI: 10.11772/j.issn.1001-9081.2025010060
• Frontier and comprehensive applications •
Lijin YAO1, Di ZHANG1, Piyu ZHOU2, Zhijian QU1, Haipeng WANG1
Received:2025-01-15
Revised:2025-03-25
Accepted:2025-03-26
Online:2026-01-10
Published:2026-01-10
Contact: Haipeng WANG
About author: YAO Lijin, born in 1999 in Liaocheng, Shandong, M. S. candidate, CCF student member. His research interests include deep learning and bioinformatics.
Lijin YAO, Di ZHANG, Piyu ZHOU, Zhijian QU, Haipeng WANG. Transformer and gated recurrent unit-based de novo sequencing algorithm for phosphopeptides[J]. Journal of Computer Applications, 2026, 46(1): 297-304.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025010060
| No. | TGNovo network layer | Output dimensions |
|---|---|---|
| 1 | mass embedding | (400, 256) |
| 2 | intensity embedding | (400, 256) |
| 3 | Transformer Encoder×2 | (400, 256) |
| 4 | Embedding | (Len, 256) |
| 5 | position embedding | (Len, 256) |
| 6 | Transformer Decoder×1 | (Len, 256) |
| 7 | Conv1d×3 | (Len, 400, 256) |
| 8 | Batch Norm1d×3 | (Len, 400, 256) |
| 9 | GRU | (Len, 400, 256) |
| 10 | Max Pooling | (Len, 256) |
| 11 | Sum | (Len, 256) |
| 12 | Linear×3 | (Len, 28) |
Tab. 1 Output dimensions of TGNovo network layers
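| Parameter | Default value |
|---|---|
| Number of spectral peaks | 400 |
| Maximum mass-to-charge ratio (m/z) | 6 000 |
| Maximum peptide length | 100 |
| Maximum beam search width | 10 |
| Number of Transformer encoder layers | 2 |
| Number of Transformer decoder layers | 1 |
| Number of Transformer attention heads | 4 |
| Hidden layer dimension | 256 |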
Tab. 2 Adjustable parameters of TGNovo
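The defaults in Tab. 2 can be gathered into a single configuration object. The following is a minimal sketch: the class name `TGNovoConfig` and every field name are hypothetical and are not taken from the TGNovo code; only the values come from the table. The divisibility check reflects the standard multi-head attention formulation and is an assumption about how the hidden dimension is split across heads.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TGNovoConfig:
    """Hypothetical container for the Tab. 2 defaults (field names are illustrative)."""

    num_peaks: int = 400            # number of spectral peaks
    max_mz: float = 6000.0          # maximum mass-to-charge ratio
    max_peptide_length: int = 100   # maximum peptide length
    max_beam_width: int = 10        # maximum beam search width
    encoder_layers: int = 2         # Transformer encoder layers
    decoder_layers: int = 1         # Transformer decoder layers
    attention_heads: int = 4        # Transformer attention heads
    hidden_dim: int = 256           # hidden layer dimension

    def __post_init__(self):
        # Assumption: standard multi-head attention, which splits the
        # hidden dimension evenly across the heads.
        if self.hidden_dim % self.attention_heads != 0:
            raise ValueError("hidden_dim must be divisible by attention_heads")


default_config = TGNovoConfig()
print(asdict(default_config))
```

Overriding one field, for example `TGNovoConfig(attention_heads=8)`, gives a variant setting while the other values stay at their defaults.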
| Dataset PID | Species | Year of publication | Number of spectra | Number of phosphorylated spectra | Database |
|---|---|---|---|---|---|
| PXD008211 | Escherichia coli | 2018 | 8 722 | 1 471 | Escherichia coli |
| PXD004452 | Human | 2017 | 70 454 | 33 775 | Human |
| PXD019697 | Mouse | 2020 | 12 492 | 9 757 | Mouse |
| PXD011284 | Arabidopsis thaliana | 2019 | 373 874 | 4 197 | Arabidopsis thaliana |
| PXD023361 | Yeast | 2021 | 20 044 | 146 | Yeast |
| PXD000138 | Synthetic | 2013 | 46 885 | 46 885 | Synthetic peptides |
| PXD007058 | Synthetic | 2019 | 947 | 947 | Synthetic peptides |
| PXD013210 | Synthetic | 2019 | 12 118 | 12 118 | Synthetic peptides |
| PXD004894 | Melanoma | 2017 | 4 101 232 | Unknown | Human |
Tab. 3 Experimental datasets
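| Parameter setting | Peptide-level recall/% | Amino acid-level recall/% | Time per spectrum/ms |
|---|---|---|---|
| Default | 83.2 | 94.3 | 133.3 |
| Number of spectral peaks = 300 | 81.9 | 92.7 | 111.1 |
| Number of spectral peaks = 500 | 83.4 | 94.7 | 142.8 |
| Encoder layers = 4 | 83.6 | 94.9 | 138.8 |
| Decoder layers = 2 | 84.1 | 95.3 | 142.8 |
| Attention heads = 1 | 81.1 | 90.1 | 128.2 |
| Attention heads = 8 | 82.9 | 92.4 | 133.3 |
| Beam search width = 5 | 80.6 | 86.5 | 103.0 |
| Beam search width = 20 | 83.1 | 94.2 | 166.6 |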
Tab. 4 Performance comparison of models with different parameters
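To read Tab. 4 as a speed versus accuracy trade-off, the per-spectrum time can be converted to throughput and each setting compared with the default row. The sketch below only re-expresses the table's numbers; the function names and the short English setting labels are illustrative assumptions.

```python
# Rows copied from Tab. 4:
# (setting, peptide-level recall %, amino acid-level recall %, ms per spectrum)
TAB4 = [
    ("default", 83.2, 94.3, 133.3),
    ("peaks=300", 81.9, 92.7, 111.1),
    ("peaks=500", 83.4, 94.7, 142.8),
    ("encoder layers=4", 83.6, 94.9, 138.8),
    ("decoder layers=2", 84.1, 95.3, 142.8),
    ("attention heads=1", 81.1, 90.1, 128.2),
    ("attention heads=8", 82.9, 92.4, 133.3),
    ("beam width=5", 80.6, 86.5, 103.0),
    ("beam width=20", 83.1, 94.2, 166.6),
]


def spectra_per_second(ms_per_spectrum: float) -> float:
    """Convert a per-spectrum time in milliseconds to spectra per second."""
    return 1000.0 / ms_per_spectrum


def compare_to_default(rows):
    """Return (setting, peptide recall change in points, spectra/s); row 0 is the baseline."""
    _, base_recall, _, _ = rows[0]
    return [
        (name, round(pep - base_recall, 1), round(spectra_per_second(ms), 2))
        for name, pep, _, ms in rows
    ]


for name, d_recall, rate in compare_to_default(TAB4):
    print(f"{name:<18} recall change {d_recall:+.1f} pts, {rate:.2f} spectra/s")
```

Read this way, the second decoder layer adds 0.9 points of peptide-level recall while throughput falls from about 7.5 to about 7.0 spectra per second.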
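| Experimental setting | Peptide-level recall/% | Amino acid-level recall/% |
|---|---|---|
| TGNovo (full model) | 83.2 | 94.3 |
| Without GRU module | 66.1 | 86.4 |
| Without Transformer module | 74.6 | 92.5 |
| Without spectral peak connection graph | 77.1 | 92.4 |
| Without phosphorylation modification feature processing | 72.5 | 91.2 |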
Tab. 5 Ablation experimental results of single module
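The contribution of each module in Tab. 5 is easier to compare as a recall drop relative to the full model. The sketch below computes and ranks those drops from the table's values; the helper name `recall_drops` and the English setting labels are illustrative assumptions.

```python
# Rows copied from Tab. 5: (setting, peptide-level recall, amino acid-level recall)
FULL_MODEL = ("TGNovo (full model)", 83.2, 94.3)
ABLATIONS = [
    ("without GRU module", 66.1, 86.4),
    ("without Transformer module", 74.6, 92.5),
    ("without spectral peak connection graph", 77.1, 92.4),
    ("without phosphorylation modification feature processing", 72.5, 91.2),
]


def recall_drops(full, ablations):
    """Return (setting, peptide drop, amino acid drop), largest peptide drop first."""
    _, full_pep, full_aa = full
    drops = [
        (name, round(full_pep - pep, 1), round(full_aa - aa, 1))
        for name, pep, aa in ablations
    ]
    return sorted(drops, key=lambda row: row[1], reverse=True)


for name, pep_drop, aa_drop in recall_drops(FULL_MODEL, ABLATIONS):
    print(f"{name}: peptide -{pep_drop}, amino acid -{aa_drop}")
```

By this measure, removing the GRU module costs the most at the peptide level (17.1 points), followed by removing the phosphorylation modification feature processing (10.7 points).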