%0 Journal Article
%A MIN Xin
%A MOU Changning
%A WANG Haipeng
%T Peptide spectrum match scoring algorithm based on multi-head attention mechanism and residual neural network
%D 2020
%R 10.11772/j.issn.1001-9081.2019101880
%J Journal of Computer Applications
%P 1830-1836
%V 40
%N 6
%X The peptide spectrum match scoring algorithm plays a key role in peptide sequence identification, but traditional scoring algorithms cannot make full use of the peptide fragmentation pattern when scoring. To solve this problem, a multi-classification probability sum scoring algorithm that incorporates a representation of peptide sequence information, called deepScore-α, was proposed. The algorithm does not perform a second round of scoring that takes global information into account, and it places no restriction on the method used to calculate the similarity between the theoretical and experimental mass spectra. In the algorithm, a one-dimensional residual network was used to extract the underlying information of the sequence, and the effects of different peptide bonds on the cleavage of the current peptide bond were then integrated through the multi-head attention mechanism to generate the final probability matrix of the relative intensity distribution of fragment ions. The final peptide spectrum match score was then calculated by combining this matrix with the actual relative intensities of the fragment ions of the peptide sequence. The algorithm was compared with Comet and MSGF+, two common open source identification tools. The results show that at a False Discovery Rate (FDR) of 0.01 on a human body proteome dataset, the number of peptide sequences retained by deepScore-α increased by about 14%, and the Top1 hit ratio (the proportion of spectra whose highest-scoring match is the correct peptide sequence) increased by about 5 percentage points. A generalization performance test of the model trained on the human ProteomeTools2 dataset shows that the number of peptide sequences retained by deepScore-α at an FDR of 0.01 increased by about 7%, the Top1 hit ratio increased by about 5 percentage points, and the number of Top1 identification results from the decoy library decreased by about 60%.
Experimental results show that the algorithm retains more peptide sequences at a lower FDR value, improves the Top1 hit ratio, and has good generalization performance.
%U https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2019101880