Search Result

De novo peptide sequencing by tandem mass spectrometry based on graph convolutional neural network

MOU Changning, WANG Haipeng, ZHOU Piyu, HOU Xinhang

Journal of Computer Applications 2021, 41 (9): 2773-2779. DOI: 10.11772/j.issn.1001-9081.2020111875


In proteomics, de novo sequencing is one of the most important methods for peptide sequencing by tandem mass spectrometry. It is independent of any protein database and plays a key role in determining the protein sequences of unknown species, in monoclonal antibody sequencing, and in other fields. However, because of its complexity, the accuracy of de novo sequencing is much lower than that of database search methods, which limits its wide application. To address this low accuracy, denovo-GCN, a de novo sequencing method based on the Graph Convolutional Neural Network (GCN), was proposed. In this method, the relationships between peaks in a mass spectrum are expressed as a graph structure, and peak features are extracted for each corresponding peptide cleavage site. The GCN then predicts the amino acid type at the current cleavage site, and the complete sequence is built up step by step. Three significant parameters affecting the model were determined experimentally: the number of GCN layers, the combination of ion types, and the number of spectral peaks used for sequencing. Datasets from a wide variety of species were used for experimental comparison. Experimental results show that the peptide-level recall of denovo-GCN is 4.0 to 21.1 percentage points higher than those of the graph theory-based methods Novor and pNovo, and 2.1 to 10.7 percentage points higher than that of DeepNovo, which adopts a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network.
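The abstract does not say how the peak graph is built or which GCN variant is used. As a minimal sketch under stated assumptions, the code below links two peaks when their m/z gap matches the mass of a single amino acid residue (the classic spectrum-graph rule, assumed here) and applies one standard graph-convolution step with symmetric normalization. The function names, the tolerance, and the reduced residue table are illustrative and do not come from denovo-GCN.

```python
import math

# Monoisotopic residue masses (Da) for four amino acids only; a real tool would cover all 20.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841}


def build_peak_graph(mz, tol=0.02):
    """Return an adjacency matrix linking peaks whose m/z gap matches a residue mass.

    Assumption: this edge rule is the classic spectrum-graph construction,
    not necessarily the one used by denovo-GCN.
    """
    n = len(mz)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(mz[j] - mz[i])
            if any(abs(gap - mass) <= tol for mass in RESIDUE_MASS.values()):
                adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so every peak keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbor features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][h] for f in range(n_in))) for h in range(n_out)]
            for i in range(n)]


# Hypothetical toy spectrum: peaks 0-1 differ by G, peaks 1-2 differ by A, peak 3 is isolated.
peaks = [100.0, 157.02146, 228.05857, 400.0]
adjacency = build_peak_graph(peaks)
hidden = gcn_layer(adjacency, [[1.0]] * len(peaks), [[1.0]])
```

In a full model, the per-peak outputs of several such layers would feed a classifier over amino acid types at the current cleavage site; that part is omitted here because the abstract gives no details of it.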


Peptide spectrum match scoring algorithm based on multi-head attention mechanism and residual neural network

MIN Xin, WANG Haipeng, MOU Changning

Journal of Computer Applications 2020, 40 (6): 1830-1836. DOI: 10.11772/j.issn.1001-9081.2019101880


The peptide spectrum match scoring algorithm plays a key role in peptide sequence identification, but traditional scoring algorithms cannot make full use of peptide fragmentation patterns when scoring. To solve this problem, deepScore-α, a multi-classification probability sum scoring algorithm combined with a representation of peptide sequence information, was proposed. The algorithm does not perform a second scoring pass that considers global information, and it places no restriction on the method used to compute the similarity between the theoretical and experimental mass spectra. A one-dimensional residual network extracts the underlying information of the sequence, and a multi-head attention mechanism then integrates the effects of the other peptide bonds on the fragmentation of the current peptide bond to generate the final probability matrix of the fragment ion relative intensity distribution. The final peptide spectrum match score is calculated by combining this matrix with the actual relative intensities of the fragment ions of the peptide sequence. The algorithm was compared with Comet and MSGF+, two common open-source identification tools. The results show that, at a False Discovery Rate (FDR) of 0.01 on a human proteome dataset, the number of peptide sequences retained by deepScore-α increases by about 14%, and the Top1 hit ratio (the proportion of spectra whose highest-scoring candidate is the correct peptide sequence) increases by about 5 percentage points. A generalization test of the model trained on the human ProteomeTools2 dataset shows that the number of peptide sequences retained by deepScore-α at an FDR of 0.01 increases by about 7%, the Top1 hit ratio increases by about 5 percentage points, and the number of Top1 identifications that come from the decoy library decreases by about 60%. Experimental results show that the algorithm can retain more peptide sequences at a lower FDR, improve the Top1 hit ratio, and generalize well.
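The two evaluation measures named in the abstract can be illustrated with a short sketch. The abstract does not give the FDR formula, so the code assumes the common target-decoy estimate (decoy count divided by target count among matches ranked by score); the function names and input layout are hypothetical and are not taken from deepScore-α.

```python
def targets_at_fdr(psms, max_fdr=0.01):
    """Count target matches retained at a given estimated FDR.

    psms: list of (score, is_decoy) pairs, one per peptide spectrum match.
    Assumption: FDR is estimated as decoys / targets over the score-ranked prefix.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest prefix that still satisfies the FDR bound.
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best


def top1_hit_ratio(candidates, truth):
    """Fraction of spectra whose highest-scoring candidate is the correct peptide.

    candidates: dict mapping spectrum id -> list of (peptide, score).
    truth: dict mapping spectrum id -> correct peptide sequence.
    """
    hits = 0
    for spectrum_id, scored in candidates.items():
        best_peptide = max(scored, key=lambda c: c[1])[0]
        if best_peptide == truth[spectrum_id]:
            hits += 1
    return hits / len(candidates)
```

With these definitions, a better scoring function moves correct matches above decoys, so more targets survive the 0.01 threshold and more spectra have the correct peptide ranked first, which is the kind of improvement the abstract reports.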
