Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3146-3150. DOI: 10.11772/j.issn.1001-9081.2019050865

• Paper from the 2019 China Conference on Granular Computing and Knowledge Discovery (CGCKD2019) •

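Protein-ATP binding site prediction based on a one-dimensional convolutional neural network

ZHANG Yu, YU Dongjun

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
  • Received: 2019-05-06 Revised: 2019-06-24 Published: 2019-11-10 Released online: 2019-09-11
  • Corresponding author: YU Dongjun
  • About the authors: ZHANG Yu (born 1995), male, from Yangzhou, Jiangsu, master's student; main research interests: bioinformatics computing and pattern recognition. YU Dongjun (born 1975), male, from Zhenjiang, Jiangsu, professor, Ph.D.; main research interests: bioinformatics computing, machine learning, pattern recognition, and intelligent systems.
  • Funding:
    National Natural Science Foundation of China (61772273, 61373062).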

Protein-ATP binding site prediction based on 1D-convolutional neural network

ZHANG Yu, YU Dongjun   

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China
  • Received: 2019-05-06 Revised: 2019-06-24 Published: 2019-11-10 Online: 2019-09-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61772273, 61373062).

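Abstract: To improve the accuracy of predicting adenosine triphosphate (ATP) binding sites, a method based on a one-dimensional convolutional neural network (1D-CNN) is proposed. First, starting from protein sequence information, position-specific scoring matrix information, secondary structure information, and water solubility information are fused; random under-sampling is used to eliminate the effect of data imbalance, and missing features are re-encoded and filled in to obtain the training features. A 1D-CNN is then trained to predict protein-ATP binding sites, the network structure is optimized, and experiments compare the proposed method with other machine learning methods. The experimental results demonstrate the effectiveness of the proposed method, which achieves a partial improvement in AUC over the traditional support vector machine (SVM).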

Keywords: protein-ATP, convolutional neural network, data imbalance problem, classification

Abstract: To improve the accuracy of predicting protein-ATP (Adenosine TriPhosphate) binding sites, a method based on a One-Dimensional Convolutional Neural Network (1D-CNN) was proposed. Firstly, starting from protein sequence information, position-specific scoring matrix information, secondary structure information and water solubility information were combined, and random under-sampling was used to eliminate the impact of data imbalance. Then, missing features were completed by re-encoding to obtain the training features. Finally, a 1D-CNN was trained to predict protein-ATP binding sites, its network structure was optimized, and experiments were carried out to compare the proposed method with other machine learning methods. Experimental results show that the proposed method is effective and achieves a better AUC (Area Under Curve) than the traditional Support Vector Machine (SVM).

Key words: protein-ATP (Adenosine TriPhosphate), Convolutional Neural Network (CNN), data imbalance problem, classification
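
The data preparation steps named in the abstract (fusing per-residue features, completing missing features, and random under-sampling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `window_features` and `random_undersample`, the window half-width of 8, the 24 feature columns, and the 1:1 class ratio are all hypothetical choices made for illustration, and zero-padding at the sequence termini is only an assumed stand-in for the re-encoding step, whose details the abstract does not give.

```python
import numpy as np

def window_features(per_residue, half_width):
    """Build one (2*half_width+1, d) window per residue.

    Positions beyond the sequence termini have no features; here they are
    zero-padded (an assumption, not the paper's stated re-encoding scheme).
    """
    length, _ = per_residue.shape
    padded = np.pad(per_residue, ((half_width, half_width), (0, 0)))
    size = 2 * half_width + 1
    return np.stack([padded[i:i + size] for i in range(length)])

def random_undersample(X, y, ratio=1.0, seed=0):
    """Keep every positive sample and a random subset of negatives.

    ratio is the number of negatives kept per positive (hypothetical default 1.0).
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept_neg = rng.choice(neg, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([pos, kept_neg]))
    return X[idx], y[idx]

# Synthetic stand-in for one protein: 200 residues, 24 fused feature columns
# (column count is illustrative only), with 10 residues labelled as binding.
rng = np.random.default_rng(42)
feats = rng.random((200, 24))
labels = np.zeros(200, dtype=int)
labels[:10] = 1

X = window_features(feats, half_width=8)      # one window per residue
X_bal, y_bal = random_undersample(X, labels)  # balanced training subset
print(X.shape, X_bal.shape, int(y_bal.sum()))
```

Each window has the shape a 1D convolution expects (positions along one axis, feature channels along the other), so the balanced array could be passed to a 1D-CNN in any deep learning framework.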

CLC number: