Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (11): 3146-3150. DOI: 10.11772/j.issn.1001-9081.2019050865

• The 2019 China Conference on Granular Computing and Knowledge Discovery (CGCKD2019)

Protein-ATP binding site prediction based on 1D-convolutional neural network

ZHANG Yu, YU Dongjun   

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing Jiangsu 210094, China
  • Received: 2019-05-06 Revised: 2019-06-24 Online: 2019-11-10 Published: 2019-09-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61772273, 61373062).


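ZHANG Yu (张寓), YU Dongjun (於东军)

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Corresponding author: YU Dongjun
  • About the authors: ZHANG Yu (born 1995), male, from Yangzhou, Jiangsu, master's student; main research interests: bioinformatics computing, pattern recognition. YU Dongjun (born 1975), male, from Zhenjiang, Jiangsu, professor, Ph.D.; main research interests: bioinformatics computing, machine learning, pattern recognition, intelligent systems.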

Abstract: To improve the accuracy of predicting protein-ATP (Adenosine TriPhosphate) binding sites, a method based on a One-Dimensional Convolutional Neural Network (1D-CNN) was proposed. Firstly, starting from the protein sequence information, position-specific scoring matrix information, secondary structure information and water solubility information were fused, and random under-sampling was used to eliminate the impact of data imbalance. Then, the missing features were completed by re-encoding to obtain the training features. A 1D-CNN was trained on these features to predict protein-ATP binding sites, the network structure was optimized, and experiments were carried out to compare the proposed method with other machine learning methods. Experimental results show that the proposed method is effective and achieves a higher AUC (Area Under Curve) than the traditional Support Vector Machine (SVM).

Key words: protein-ATP (Adenosine TriPhosphate), Convolutional Neural Network (CNN), data imbalance problem, classification
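
The preprocessing steps named in the abstract (feature fusion per residue, completion of missing features, random under-sampling) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `window_features` and `random_under_sample`, the 24-dimensional fused feature vector, the window of 17 residues, zero-padding as the way missing terminal features are completed, and the 1:1 class ratio after sampling.

```python
import numpy as np

def window_features(per_residue, half_width):
    """Stack each residue's features with those of its neighbours.

    Residues near the sequence ends lack neighbours on one side; this
    sketch fills those missing rows with zeros (an assumed scheme).
    """
    n = per_residue.shape[0]
    padded = np.pad(per_residue, ((half_width, half_width), (0, 0)))
    width = 2 * half_width + 1
    # Entry i covers residues i - half_width .. i + half_width.
    return np.stack([padded[i:i + width] for i in range(n)])

def random_under_sample(X, y, rng):
    """Keep all positives and an equally sized random subset of negatives."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
    idx = np.sort(np.concatenate([pos, keep_neg]))
    return X[idx], y[idx]

rng = np.random.default_rng(0)
# Hypothetical protein: 50 residues, 24 fused features per residue.
features = rng.normal(size=(50, 24))
labels = np.zeros(50, dtype=int)
labels[[3, 17, 30, 41]] = 1  # hypothetical binding residues

X = window_features(features, half_width=8)        # shape (50, 17, 24)
X_bal, y_bal = random_under_sample(X, labels, rng)  # 4 positives + 4 negatives
print(X.shape, X_bal.shape, int(y_bal.sum()))
```

Each sample in `X_bal` is a (window length × feature dimension) array, which is the kind of input a 1D convolution along the sequence axis would consume.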

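Abstract (translated from Chinese): To improve the accuracy of predicting adenosine triphosphate (ATP) binding sites, a method based on a one-dimensional convolutional neural network (1D-CNN) is proposed. First, taking protein sequence information as the basis, position-specific scoring matrix information, secondary structure information, and water solubility information are fused; random under-sampling is used to eliminate the impact of data imbalance, and missing features are then completed by re-encoding to obtain the training features. A 1D-CNN is trained to predict protein-ATP binding sites, the network structure is optimized, and experiments compare the proposed method with other machine learning methods. The experimental results demonstrate the effectiveness of the proposed method, which achieves a partial improvement in AUC over the traditional support vector machine (SVM).

Key words (translated from Chinese): protein-ATP, convolutional neural network, data imbalance problem, classification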

CLC Number: