Journal of Computer Applications

Theoretical tandem mass spectrometry prediction method for peptide sequences based on transformer and gated recurrent unit

  

  • Received: 2024-01-04  Revised: 2024-03-25  Online: 2024-04-15  Published: 2024-04-15

何长久1,杨婧涵2,周丕宇2,边昕烨1,吕明明1,董迪1,付岩2,王海鹏3   

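  1. Shandong University of Technology
  2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences
  3. School of Computer Science and Technology, Shandong University of Technology
  • Corresponding author: 何长久
  • Supported by:
    Research on Intelligent Algorithms for De Novo Sequencing and Statistical Standards for Quality Control; National Natural Science Foundation of China; Support Program for Outstanding Young Innovation Teams in Higher Education Institutions of Shandong Province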

Abstract: Existing theoretical tandem mass spectrometry prediction methods are restricted to b and y backbone fragment ions, and a single model struggles to capture the complex relationships within peptide sequences. To address these problems, a theoretical tandem mass spectrometry prediction method for peptide sequences based on Transformer and Gated Recurrent Unit (GRU), named DeepCollider, was proposed. Firstly, a deep learning architecture combining Transformer and GRU was adopted, in which the self-attention mechanism and the modeling of long-distance dependencies strengthened the modeling of the relationship between peptide sequences and their fragment ion intensities. Secondly, unlike existing methods that encode the peptide sequence to predict all b and y backbone ions at once, a fragmentation flag was used to mark fragmentation sites within the peptide sequence, so that fragment ions could be encoded and predicted for a specific fragmentation site. Finally, Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE) were used as evaluation metrics to measure the similarity between predicted and experimental spectra. Experimental results show that, compared with existing methods limited to b and y backbone ions, such as pDeep and Prosit, DeepCollider has advantages on both metrics: the PCC value increases by approximately 0.15 and the MAE value decreases by approximately 0.005. DeepCollider not only predicts b, y, and a backbone ions and their corresponding water-loss and ammonia-loss neutral loss ions, but also further improves the peak coverage and similarity of theoretical spectrum prediction.

Key words: theoretical mass spectrometry prediction, peptide sequence, fragment ion intensity, proteomics, deep learning
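
The abstract names two evaluation metrics (PCC and MAE) and a fragmentation flag that marks one cleavage site at a time. The following is a minimal sketch of these ideas and is not the DeepCollider implementation. The function names, the one-hot layout of the flag vector, and the intensity values are all assumptions made for illustration; the abstract does not specify how the flag is actually encoded.

```python
import math


def fragmentation_flags(peptide: str, site: int) -> list[int]:
    """Hypothetical flag vector for one fragmentation site.

    Assumption: `site` counts peptide bonds from the N-terminus (1-based),
    and the flag is 1 on the residue just before the cleaved bond, 0 elsewhere.
    """
    if not 1 <= site < len(peptide):
        raise ValueError("site must lie between 1 and len(peptide) - 1")
    return [1 if i == site - 1 else 0 for i in range(len(peptide))]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two intensity vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(x: list[float], y: list[float]) -> float:
    """Mean absolute error between two equal-length intensity vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Made-up normalized intensities for four fragment ions (not from the paper).
    predicted = [0.0, 0.5, 1.0, 0.25]
    experimental = [0.1, 0.4, 0.9, 0.35]
    print("flags:", fragmentation_flags("PEPTIDE", 3))
    print("PCC:", pearson(predicted, experimental))
    print("MAE:", mean_absolute_error(predicted, experimental))
```

A higher PCC and a lower MAE both indicate that the predicted spectrum is closer to the experimental one, which is the direction of the improvements reported in the abstract.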
