Journal of Computer Applications
王思秀1,陈新周2,李敏3,赵晓敏3
Abstract: Most drug repositioning studies rely on the hypothesis that "similar drugs treat similar diseases" and therefore require similarity data for drugs and diseases. Such data are difficult to obtain, differ considerably depending on how they are computed, and, when missing, make the research impossible to carry out. To address these issues, this study proposes EMP-PCA, an Empirical Meta-Path and Principal Component Analysis-based drug repositioning method that predicts drug-disease associations without any similarity data. The method first introduces five meta-paths, each corresponding to a different interaction dataset, and generates an exchange matrix for each to mine multi-source association information. It then applies principal component analysis to find the directions of maximum variance and projects the data onto them, reducing dimensionality and simplifying computation while retaining the core information. Finally, gradient-boosting trees are used to build a base classifier for each meta-path, and the base classifiers are combined into an ensemble classifier that integrates the multi-source data. In experiments, EMP-PCA is compared with similarity-based drug repositioning methods such as DRHGCN, ANMF, and LAGCN. The results show that EMP-PCA fuses multi-source interaction data among drugs, proteins, and diseases without introducing any similarity data, and that it outperforms the compared methods on key evaluation metrics including AUC, precision, and recall. The method thus addresses the data dependency and missing-data problems of similarity-based approaches, and it shows strong association prediction performance and practical application value.
Key words: drug repurposing, meta-path, exchange matrix, principal component analysis, gradient boosting tree, interaction data, ensemble classifier
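The three stages named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the "exchange matrix" is assumed to be the product of adjacency matrices along a meta-path (often called a commuting matrix), only two hypothetical meta-paths are used instead of the paper's five, the pair features (drug row plus disease column) and the probability averaging are guesses at one reasonable design, and all data are random placeholders, so the scores carry no predictive meaning.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n_drugs, n_proteins, n_diseases = 30, 20, 25

# Random placeholder interaction matrices (NOT real data).
drug_protein = (rng.random((n_drugs, n_proteins)) < 0.2).astype(float)
protein_disease = (rng.random((n_proteins, n_diseases)) < 0.2).astype(float)
p = rng.random((n_proteins, n_proteins)) < 0.1
protein_protein = np.maximum(p, p.T).astype(float)  # symmetric by construction
drug_disease = (rng.random((n_drugs, n_diseases)) < 0.15).astype(int)  # toy labels


def commuting_matrix(*adjacency):
    """Multiply adjacency matrices along a meta-path.

    Entry (i, j) counts the path instances linking drug i to disease j.
    """
    result = adjacency[0]
    for matrix in adjacency[1:]:
        result = result @ matrix
    return result


# Two hypothetical meta-paths; the paper's five are not listed in the abstract.
meta_paths = {
    "drug-protein-disease": commuting_matrix(drug_protein, protein_disease),
    "drug-protein-protein-disease": commuting_matrix(
        drug_protein, protein_protein, protein_disease
    ),
}


def pair_features(c, pairs):
    """Assumed featurization: drug row of c concatenated with disease column of c."""
    return np.hstack([c[pairs[:, 0], :], c[:, pairs[:, 1]].T])


# Every (drug, disease) pair is a sample; split into train and test indices.
pairs = np.array([(i, j) for i in range(n_drugs) for j in range(n_diseases)])
labels = drug_disease[pairs[:, 0], pairs[:, 1]]
order = rng.permutation(len(pairs))
train, test = order[:600], order[600:]


def fit_base(c):
    """Fit PCA, then a gradient-boosting base classifier, for one meta-path."""
    x_train = pair_features(c, pairs[train])
    pca = PCA(n_components=8).fit(x_train)
    clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
    clf.fit(pca.transform(x_train), labels[train])
    return pca, clf


models = {name: fit_base(c) for name, c in meta_paths.items()}


def ensemble_scores(query_pairs):
    """Average the positive-class probabilities of all base classifiers."""
    probs = [
        clf.predict_proba(
            pca.transform(pair_features(meta_paths[name], query_pairs))
        )[:, 1]
        for name, (pca, clf) in models.items()
    ]
    return np.mean(probs, axis=0)


scores = ensemble_scores(pairs[test])
print(scores[:5])
```

Averaging probabilities is only one way to combine base classifiers; weighted voting or stacking would fit the same structure, and the abstract does not say which combination rule EMP-PCA uses.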
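王思秀, 陈新周, 李敏, 赵晓敏. Drug repositioning method based on meta-path and principal component analysis [J]. Journal of Computer Applications (official website), DOI: 10.11772/j.issn.1001-9081.2025070819.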
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025070819