Protein complex identification algorithm based on XGBoost and topological structural information
XU Zhoubo1, YANG Jian1, LIU Huadong1,2, HUANG Wenwen1
1. Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
2. School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
The large amount of uncertainty in Protein-Protein Interaction (PPI) networks and the incompleteness of known protein complex data reduce the accuracy of methods that either search using only topological structural information or apply supervised learning only to the known complex data. To address this problem, a search method called XGBoost model for Predicting protein complex (XGBP) was proposed. Firstly, features were extracted based on the topological structural information of complexes. Then, an XGBoost model was trained on the extracted features. Finally, a mapping relationship between features and protein complexes was constructed by combining topological structural information with supervised learning, in order to improve the accuracy of protein complex prediction. The proposed algorithm was compared with eight popular unsupervised algorithms: Markov CLustering (MCL), Clustering based on Maximal Clique (CMC), Core-Attachment based method (COACH), Fast Hierarchical clustering algorithm for functional modules discovery in Protein Interaction (HC-PIN), Cluster with Overlapping Neighborhood Expansion (ClusterONE), Molecular COmplex DEtection (MCODE), Detecting Complex based on Uncertain graph model (DCU), and Weighted COACH (WCOACH); and with three supervised methods: Bayesian Network (BN), Support Vector Machine (SVM), and Regression Model (RM). The results show that the proposed algorithm performs well in terms of precision, sensitivity and F-measure.
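The feature extraction step can be illustrated with a minimal sketch. The abstract does not list the features used by XGBP, so the feature set below (size, density, degree statistics, mean clustering coefficient) and the function name `topological_features` are assumptions chosen for illustration, not the authors' implementation. A vector of this kind, computed for known complexes and for non-complex subgraphs, is the sort of input a supervised model such as XGBoost could be trained on; the training step itself is not shown.

```python
from itertools import combinations
from statistics import mean, pvariance


def topological_features(nodes, edges):
    """Illustrative topological features of the subgraph induced by `nodes`.

    `edges` is an iterable of (u, v) pairs from an undirected PPI network.
    The feature set is hypothetical; the paper's own features may differ.
    """
    nodes = set(nodes)
    # Adjacency restricted to the candidate subgraph; self-loops are ignored.
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)

    n = len(nodes)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is counted twice
    degrees = [len(adj[v]) for v in nodes]

    # Density: fraction of possible node pairs that actually interact.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def local_clustering(v):
        # Fraction of neighbour pairs of v that are themselves connected.
        k = len(adj[v])
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        return 2 * links / (k * (k - 1))

    return {
        "size": n,
        "density": density,
        "mean_degree": mean(degrees) if degrees else 0.0,
        "degree_variance": pvariance(degrees) if degrees else 0.0,
        "max_degree": max(degrees, default=0),
        "mean_clustering": mean(local_clustering(v) for v in nodes) if nodes else 0.0,
    }


# Toy network: a triangle A-B-C with a tail C-D-E; candidate complex {A, B, C, D}.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
features = topological_features({"A", "B", "C", "D"}, toy_edges)
```

For the toy candidate, the induced subgraph has 4 nodes and 4 edges, so its density is 4 of 6 possible pairs. Dense, highly clustered subgraphs are the usual topological signature that complex detection methods look for, which is why such quantities are a natural starting point for a feature vector.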
XU Zhoubo, YANG Jian, LIU Huadong, HUANG Wenwen. Protein complex identification algorithm based on XGBoost and topological structural information[J]. Journal of Computer Applications, 2020, 40(5): 1510-1514.