Journal of Computer Applications, 2021, Vol. 41, Issue (12): 3702-3706. DOI: 10.11772/j.issn.1001-9081.2021010017

• Frontier and Comprehensive Applications •

B-cell epitope prediction model based on L-Metric overlapping subgraph discovery

Chuang GAO1, Mian TANG1, Liang ZHAO1,2

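  1. School of Computing and Electronic Information, Guangxi University, Nanning 530004
  2. Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei 442000
  • Received: 2021-01-07 Revised: 2021-03-07 Accepted: 2021-03-23 Online: 2021-04-15 Published: 2021-12-10
  • Corresponding author: Liang ZHAO
  • About the authors: GAO Chuang (born 1991), male, from Changchun, Jilin, M.S. candidate; main research interest: bioinformatics.
    TANG Mian (born 1997), male, from Futian, Shenzhen, M.S. candidate; main research interest: bioinformatics.
  • Funding:
    Regional Science Fund Project of the National Natural Science Foundation of China (32060150)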

B-cell epitope prediction model with overlapping subgraph mining based on L-Metric

Chuang GAO1, Mian TANG1, Liang ZHAO1,2

  1. School of Computing and Electronic Information, Guangxi University, Nanning, Guangxi 530004, China
  2. Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei 442000, China
  • Received: 2021-01-07 Revised: 2021-03-07 Accepted: 2021-03-23 Online: 2021-04-15 Published: 2021-12-10
  • Contact: Liang ZHAO
  • About author: GAO Chuang, born in 1991, M.S. candidate. His research interests include bioinformatics.
    TANG Mian, born in 1997, M.S. candidate. His research interests include bioinformatics.
  • Supported by:
    the Fund for Regional Science of National Natural Science Foundation of China (32060150)

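Abstract (Chinese edition):

To address the poor ability of existing epitope prediction methods to predict overlapping epitopes in an antigen, a model that applies an overlapping subgraph discovery algorithm based on the Local Metric (L-Metric) to epitope prediction is proposed. First, an atom graph is built from the surface atoms of the antigen and upgraded to an amino acid residue graph. Then, an information-flow-based graph partitioning algorithm divides the residue graph into mutually non-overlapping seed subgraphs, and the L-Metric-based overlapping subgraph discovery algorithm expands these seed subgraphs into overlapping subgraphs. Finally, a classification model built from a Graph Convolutional Network (GCN) and a Fully Connected Network (FCN) classifies the expanded subgraphs as epitopes or non-epitopes. Experimental results show that, on the same dataset, the F1-score of the proposed model is 267.3%, 57.0%, 65.4% and 3.5% higher than those of the existing epitope prediction models DiscoTope 2, ElliPro, EpiPred and Glep, respectively. In addition, ablation results show that the proposed overlapping subgraph discovery algorithm effectively improves prediction: the model that uses it achieves an F1-score 19.2% higher than the model that does not.

Key words: epitope prediction, overlapping epitope discovery, local metric, graph convolutional network, focal loss function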

Abstract:

Existing epitope prediction methods perform poorly when predicting overlapping epitopes of an antigen. To solve this problem, a novel epitope prediction model that uses an overlapping subgraph mining algorithm based on Local Metric (L-Metric) was proposed. Firstly, an atom graph was constructed from the surface atoms of the antigen and then upgraded to an amino acid residue graph. Then, the amino acid residue graph was divided into non-overlapping seed subgraphs by an information-flow-based graph partitioning algorithm, and these seed subgraphs were expanded into overlapping subgraphs by the L-Metric-based overlapping subgraph mining algorithm. Finally, the expanded subgraphs were classified as epitopes or non-epitopes by a classification model built from a Graph Convolutional Network (GCN) and a Fully Connected Network (FCN). Experimental results show that, on the same dataset, the F1-score of the proposed model is 267.3%, 57.0%, 65.4% and 3.5% higher than those of the existing epitope prediction models Discontinuous epiTope prediction 2 (DiscoTope 2), Ellipsoid and Protrusion (ElliPro), Epitope Prediction server (EpiPred) and overlapping Graph cLustering-based B-cell epitope predictor (Glep), respectively. The ablation results also show that the proposed overlapping subgraph mining algorithm effectively improves prediction performance: the model with the algorithm achieves an F1-score 19.2% higher than the model without it.
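
The seed expansion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the L-Metric or the stopping rule, so the code assumes one commonly used local-community form, L = L_in / L_ex, where L_in is the average number of neighbours a member has inside the subgraph and L_ex is the average number of outside neighbours over boundary members. The greedy acceptance rule, the function names (`build_graph`, `l_metric`, `expand_seed`) and the toy graph are all hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def l_metric(adj, members):
    """Assumed form L = L_in / L_ex for a non-empty node set.

    L_in: mean count of inside neighbours over all members.
    L_ex: mean count of outside neighbours over boundary members
          (members with at least one outside neighbour).
    """
    members = set(members)
    l_in = sum(len(adj[n] & members) for n in members) / len(members)
    boundary = [n for n in members if adj[n] - members]
    if not boundary:  # nothing outside is reachable from the set
        return float("inf")
    l_ex = sum(len(adj[n] - members) for n in boundary) / len(boundary)
    return l_in / l_ex


def expand_seed(adj, seed):
    """Greedy expansion (an assumption): repeatedly add the neighbouring
    node that raises L the most, and stop when no neighbour raises it."""
    members = set(seed)
    while True:
        shell = set().union(*(adj[n] for n in members)) - members
        best_node, best_score = None, l_metric(adj, members)
        for cand in sorted(shell):  # sorted for deterministic tie-breaking
            score = l_metric(adj, members | {cand})
            if score > best_score:
                best_node, best_score = cand, score
        if best_node is None:
            return members
        members.add(best_node)


# Hypothetical toy "residue graph": two triangles, a hub X touching every
# triangle node, and a node Y tied to two nodes of each triangle.
TOY_EDGES = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("d", "e"), ("d", "f"), ("e", "f"),
    ("X", "a"), ("X", "b"), ("X", "c"), ("X", "d"), ("X", "e"), ("X", "f"),
    ("Y", "a"), ("Y", "b"), ("Y", "d"), ("Y", "e"),
]

graph = build_graph(TOY_EDGES)
patch_1 = expand_seed(graph, {"a", "b", "c"})
patch_2 = expand_seed(graph, {"d", "e", "f"})
print(sorted(patch_1), sorted(patch_2), sorted(patch_1 & patch_2))
```

On this toy graph each seed absorbs Y, so the two expanded subgraphs overlap in Y, while the hub X is rejected because adding it raises the average external degree more than the internal one. In the pipeline the abstract describes, expanded subgraphs of this kind would then be passed to the GCN and FCN classifier.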

Key words: epitope prediction, overlapping epitope mining, Local Metric (L-Metric), Graph Convolutional Network (GCN), Focal Loss function (FL)

CLC number: