Drug repositioning method based on meta-path and principal component analysis

doi:10.11772/j.issn.1001-9081.2025070819

Abstract

Most drug repositioning studies rely on the hypothesis that "similar drugs treat similar diseases" and therefore require similarity data for drugs and diseases. Such data is difficult to acquire, varies considerably depending on how it is computed, and, when missing, prevents the study from being carried out at all. To address these issues, this study proposes an Empirical Meta-Path and Principal Component Analysis-based Drug Repositioning Method (EMP-PCA) that predicts drug-disease associations without similarity data. The method first introduces five meta-paths, each corresponding to a different interaction dataset, and uses them to generate exchange matrices that capture multi-source association information. It then applies principal component analysis to find the directions of maximum variance and projects the data onto them, reducing dimensionality and simplifying computation while retaining the core information. Finally, a gradient-boosting tree base classifier is built for each meta-path, and the base classifiers are combined into an ensemble classifier that integrates the multi-source data. In the experiments, EMP-PCA is compared with similarity-based drug repositioning methods such as DRHGCN, ANMF, and LAGCN. The results show that EMP-PCA fuses multi-source interaction data among drugs, proteins, and diseases without any similarity data, and that it outperforms the compared methods on key evaluation metrics including AUC, precision, and recall. The method thus avoids the data dependency and missing-data problems of similarity-based approaches while offering strong association prediction performance and practical value.
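The first two stages described above can be illustrated with a minimal sketch. It assumes that the "exchange matrix" of a meta-path is the product of the adjacency matrices along that path (often called a commuting matrix in the meta-path literature), and that PCA is applied to the rows of that matrix. The abstract does not enumerate the five meta-paths, so the two paths below, the toy data, and the function names `commuting_matrix` and `pca_project` are all hypothetical. The gradient-boosting and ensemble stage is omitted.

```python
import numpy as np


def commuting_matrix(*adjacency):
    """Multiply adjacency matrices along a meta-path.

    Entry (i, j) of the result counts the path instances from node i to node j.
    """
    result = adjacency[0]
    for matrix in adjacency[1:]:
        result = result @ matrix
    return result


def pca_project(features, n_components):
    """Project rows onto the top principal components (maximum-variance directions)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical toy interaction data: 4 drugs, 3 proteins, 2 diseases.
drug_protein = np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 0],
                         [0, 0, 1]])
protein_disease = np.array([[1, 0],
                            [0, 1],
                            [1, 1]])

# Illustrative meta-path drug -> protein -> disease (assumed, not from the paper).
drug_disease_paths = commuting_matrix(drug_protein, protein_disease)

# Illustrative meta-path drug -> protein -> drug (assumed, not from the paper).
drug_drug_paths = commuting_matrix(drug_protein, drug_protein.T)

# Reduce each drug's 4-dimensional path-count profile to 2 dimensions.
reduced = pca_project(drug_drug_paths.astype(float), n_components=2)

print(drug_disease_paths)
print(reduced.shape)
```

In a full pipeline, the reduced features of each meta-path would feed one base classifier, and the base classifiers' outputs would then be combined. How the paper constructs pair-level features and combines the classifiers is not stated in the abstract.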

Keywords: drug repositioning, meta-path, exchange matrix, principal component analysis, gradient boosting tree, interaction data, ensemble classifier


王思秀陈新周李敏赵晓敏. Drug repositioning method based on meta-path and principal component analysis [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025070819.

