Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (4): 1032-1040. DOI: 10.11772/j.issn.1001-9081.2018091880

Functional module mining in uncertain protein-protein interaction network based on fuzzy spectral clustering

MAO Yimin1, LIU Yinping1, LIANG Tian2, MAO Dinghui3   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou Jiangxi, 341000, China;
    2. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou Jiangxi, 341000, China;
    3. No. 211 Brigade Co., Ltd., Sino Shaanxi Nuclear Industry Group, Xi'an 710000, China
  • Received:2018-09-10 Revised:2018-11-04 Online:2019-04-10 Published:2019-04-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (41562019) and the Science and Technology Project of Education Department of Jiangxi Province (GJJ161566).

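Functional module mining in uncertain protein-protein interaction network based on fuzzy spectral clustering

MAO Yimin1, LIU Yinping1, LIANG Tian2, MAO Dinghui3

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou Jiangxi, 341000, China;
    2. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou Jiangxi, 341000, China;
    3. No. 211 Brigade Co., Ltd., Sino Shaanxi Nuclear Industry Group, Xi'an 710000, China
  • Corresponding author: LIU Yinping
  • About the authors: MAO Yimin (born 1970), female, from Ganzhou, Jiangxi, professor, Ph.D., main research interests: data mining, geographic information systems; LIU Yinping (born 1993), female, from Shuozhou, Shanxi, master's candidate, main research interests: data mining, geographic information systems; LIANG Tian (born 1983), female, from Ganzhou, Jiangxi, lecturer, Ph.D., main research interest: data mining; MAO Dinghui (born 1993), female, from Yining, Xinjiang, engineer, main research interest: data mining.
  • Funding:
    National Natural Science Foundation of China (41562019); Science and Technology Project of Education Department of Jiangxi Province (GJJ161566).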

Abstract: To address the low accuracy, low running efficiency, and susceptibility to false positives of Protein-Protein Interaction (PPI) network functional module mining methods based on spectral clustering and Fuzzy C-Means (FCM) clustering, a method for Functional Module mining in uncertain PPI networks based on Fuzzy Spectral Clustering (FSC-FM) was proposed. Firstly, to overcome the effect of false positives, an uncertain PPI network was constructed in which every protein-protein interaction was assigned an existence probability measure using the edge aggregation coefficient. Secondly, based on the edge aggregation coefficient and flow distance, the similarity calculation of spectral clustering was modified using the Flow distance of Edge Clustering coefficient (FEC) strategy to overcome the sensitivity of spectral clustering to the scaling parameter. The spectral clustering algorithm was then used to preprocess the uncertain PPI network data, reducing the dimension of the data and improving the accuracy of clustering. Thirdly, a Density-based Probability Center Selection (DPCS) strategy was designed to address the sensitivity of the FCM algorithm to the initial cluster centers and the number of clusters, and the preprocessed PPI data were clustered with the FCM algorithm to improve the running efficiency and sensitivity of the clustering. Finally, the mined functional modules were filtered by the Edge-Expected Density (EED) strategy. Experiments on the yeast DIP dataset show that, compared with the Detecting protein Complexes based on Uncertain graph model (DCU) algorithm, FSC-FM increases the F-measure by 27.92% and the running efficiency by 27.92%; compared with the uncertain model-based approach for identifying Dynamic protein Complexes in Uncertain protein-protein interaction Networks (CDUN), the Evolutionary Algorithm (EA), and the Medical Gene or Protein Prediction Algorithm (MGPPA), FSC-FM also has a higher F-measure and running efficiency. The experimental results show that FSC-FM is suitable for functional module mining in uncertain PPI networks.
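The first and last steps of the abstract (assigning each interaction an existence probability, then filtering modules by expected density) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption: the edge aggregation coefficient is taken to be a common edge clustering coefficient variant (triangles through an edge divided by the maximum possible number), and the expected density is taken to be the sum of edge probabilities inside a module divided by the number of possible node pairs. The function names are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations


def edge_clustering_coefficient(adj, u, v):
    """Assumed variant: triangles through edge (u, v) over the maximum possible."""
    common = len(adj[u] & adj[v])                   # triangles containing (u, v)
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)   # maximum possible triangles
    return common / denom if denom > 0 else 0.0


def build_uncertain_network(edges):
    """Map every undirected edge to an assumed existence probability in [0, 1]."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {frozenset((u, v)): edge_clustering_coefficient(adj, u, v)
            for u, v in edges}


def expected_density(prob, module):
    """Assumed form: sum of edge probabilities inside the module / possible pairs."""
    nodes = list(module)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(prob.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)


# Toy network: a triangle A-B-C with a pendant protein D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
prob = build_uncertain_network(edges)
dense = expected_density(prob, {"A", "B", "C"})
sparse = expected_density(prob, {"C", "D"})
```

Under these assumptions, an interaction supported by shared neighbors receives a high probability, while an isolated interaction such as C-D receives a low one, so a threshold on the expected density would keep the triangle and discard the pendant pair.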

Key words: uncertain data, Protein-Protein Interaction (PPI), spectral clustering algorithm, Fuzzy C-Means (FCM), functional module, expected density

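Abstract: To address the low accuracy, low execution efficiency, and susceptibility to false positives of methods that mine functional modules in Protein-Protein Interaction (PPI) networks by combining spectral clustering with Fuzzy C-Means (FCM) clustering, a method for Functional Module mining in uncertain PPI networks based on Fuzzy Spectral Clustering (FSC-FM) is proposed. First, an uncertain PPI network model is constructed in which the edge aggregation coefficient assigns an existence probability measure to every protein-protein interaction, overcoming the influence of false positives on the experimental results. Second, the similarity calculation in spectral clustering is improved with a strategy based on the Flow distance of Edge Clustering coefficient (FEC), which resolves the sensitivity of the spectral clustering algorithm to the scaling parameter; the spectral clustering algorithm is then used to preprocess the uncertain PPI network data, reducing the dimension of the data and improving clustering accuracy. Third, a Density-based Probability Center Selection (DPCS) strategy is designed to resolve the sensitivity of the FCM algorithm to the initial cluster centers and the number of clusters, and the preprocessed PPI data are clustered with FCM, improving the execution efficiency and sensitivity of the clustering. Finally, the mined protein functional modules are filtered using an improved Edge-Expected Density (EED). Running each algorithm on the yeast DIP dataset shows that, compared with the Detecting protein Complexes based on Uncertain graph model (DCU) algorithm, FSC-FM raises the F-measure by 27.92% and the execution efficiency by 27.92%; it also achieves a higher F-measure and execution efficiency than the method for identifying complexes in dynamic protein interaction networks (CDUN), the Evolutionary Algorithm (EA), and the Medical Gene or Protein Prediction Algorithm (MGPPA). The experimental results show that FSC-FM is suitable for mining functional modules in uncertain PPI networks.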
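The FCM step named in the abstract can be sketched with the textbook Fuzzy C-Means updates. This is a minimal illustration under stated assumptions, not the FSC-FM procedure: the initial centers are passed in by hand as a stand-in for the DPCS strategy (which the abstract does not specify), the points stand in for the low-dimensional output of spectral clustering, and the function names and the fuzzifier default `m=2.0` are hypothetical choices.

```python
import math


def memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership: u_ij proportional to d_ij^(-2/(m-1)), rows sum to 1."""
    rows = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if min(d) < eps:
            # Point coincides with one or more centers: share full membership among them.
            hits = [1.0 if dj < eps else 0.0 for dj in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            rows.append([v / sum(inv) for v in inv])
    return rows


def fcm(points, init_centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        for j in range(len(centers)):
            w = [u[i][j] ** m for i in range(len(points))]
            total = sum(w)
            if total > 0:
                # New center: membership-weighted mean of all points.
                centers[j] = [sum(wi * x[k] for wi, x in zip(w, points)) / total
                              for k in range(dim)]
    return memberships(points, centers, m), centers


# Toy data: two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
u, centers = fcm(points, init_centers=[(0.0, 0.0), (5.0, 5.0)])
```

Each row of `u` holds one point's degree of membership in every cluster, which is what allows a protein to belong to more than one functional module. Poorly chosen initial centers or a wrong cluster count can change the result, which is the sensitivity the abstract says DPCS is designed to address.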

Key words: uncertain data, Protein-Protein Interaction (PPI), spectral clustering algorithm, Fuzzy C-Means (FCM), functional module, expected density
