Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (5): 1510-1514. DOI: 10.11772/j.issn.1001-9081.2019111992

• Frontier & interdisciplinary applications •

Protein complex identification algorithm based on XGBoost and topological structural information

XU Zhoubo1, YANG Jian1, LIU Huadong1,2, HUANG Wenwen1   

  1. Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
  2. School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2019-11-25 Revised: 2020-01-19 Online: 2020-05-10 Published: 2020-05-15
  • Contact: LIU Huadong, born in 1978, M. S., lecturer. His research interests include representation of graph data.
  • About the authors: XU Zhoubo, born in 1976 in Fenghua, Zhejiang, Ph. D., professor, CCF senior member. Her research interests include symbolic calculation, intelligent planning and constraint solving. YANG Jian, born in 1994 in Suqian, Jiangsu, M. S. candidate. His research interests include graph data mining. LIU Huadong, born in 1978 in Ruijin, Jiangxi, M. S., lecturer. His research interests include representation of graph data. HUANG Wenwen, born in 1994 in Wuhu, Anhui, M. S. candidate. His research interests include pattern mining.
  • Supported by:

    This work is partially supported by the National Natural Science Foundation of China (61762027) and the Natural Science Foundation of Guangxi (2017GXNSFAA198172).





The large amount of uncertainty in Protein-Protein Interaction (PPI) networks and the incompleteness of known protein complex data reduce the accuracy of methods that either search using only topological structural information or apply supervised learning only to the known complex data. To solve this problem, a search method called XGBoost model for Predicting protein complex (XGBP) was proposed. Firstly, features were extracted based on the topological structural information of complexes. Then, an XGBoost model was trained on the extracted features. Finally, a mapping between features and protein complexes was constructed by combining topological structural information with supervised learning, in order to improve the accuracy of protein complex prediction. The method was compared with eight popular unsupervised algorithms: Markov CLustering (MCL), Clustering based on Maximal Clique (CMC), Core-Attachment based method (COACH), Fast Hierarchical clustering algorithm for functional modules discovery in Protein Interaction (HC-PIN), Cluster with Overlapping Neighborhood Expansion (ClusterONE), Molecular COmplex DEtection (MCODE), Detecting Complex based on Uncertain graph model (DCU) and Weighted COACH (WCOACH); and with three supervised methods: Bayesian Network (BN), Support Vector Machine (SVM) and Regression Model (RM). The results show that the proposed algorithm performs well in terms of precision, sensitivity and F-measure.
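
As an illustration of the feature extraction step only, the following minimal sketch computes a few topological features for a candidate protein set in a PPI network. The abstract does not list the features used by XGBP, so the chosen features (size, density, degree statistics, mean clustering coefficient), the function names `build_adjacency` and `topological_features`, and the toy network are all assumptions made for this example.

```python
from itertools import combinations
from statistics import mean


def build_adjacency(edges):
    """Build an undirected adjacency map {protein: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def topological_features(adj, nodes):
    """Return illustrative topological features of the subgraph induced by `nodes`.

    The feature set is an assumption for this sketch, not the paper's definition.
    """
    nodes = set(nodes)
    n = len(nodes)
    # Degree of each protein counted only inside the candidate subgraph.
    inner_deg = {v: len(adj.get(v, set()) & nodes) for v in nodes}
    edge_count = sum(inner_deg.values()) // 2
    density = 2 * edge_count / (n * (n - 1)) if n > 1 else 0.0

    def local_clustering(v):
        # Fraction of neighbour pairs of v (inside the subgraph) that interact.
        nbrs = adj.get(v, set()) & nodes
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj.get(a, set()))
        return 2 * links / (k * (k - 1))

    return {
        "size": n,
        "density": density,
        "mean_degree": mean(inner_deg.values()) if n else 0.0,
        "max_degree": max(inner_deg.values(), default=0),
        "mean_clustering": mean([local_clustering(v) for v in nodes]) if n else 0.0,
    }


# Toy PPI network (hypothetical): a triangle A-B-C plus protein D attached to C.
ppi = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
features = topological_features(ppi, {"A", "B", "C", "D"})
```

In a supervised setting, feature vectors like these, computed for known complexes and for non-complex subgraphs, could serve as training data for a gradient-boosted tree classifier such as `XGBClassifier` from the `xgboost` package; that training step is not shown here.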

Key words: protein complex, XGBoost model, Protein-Protein Interaction (PPI) network, graph data mining, machine learning



