Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1979-1986. DOI: 10.11772/j.issn.1001-9081.2022050727
Topic: Frontier and comprehensive applications
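Received: 2022-05-23
Revised: 2022-09-01
Accepted: 2022-09-05
Online: 2022-09-23
Published: 2023-06-10
Corresponding author: Zhenmei WANG
About author: ZHANG Yi, born in 1977, from Jiujiang, Jiangxi, Ph. D., professor. Her research interests include bioinformatics, machine learning and service computing.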
Yi ZHANG (1, 2), Zhenmei WANG (1)
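Abstract:
Most existing computational models for predicting associations between circular RNAs (circRNAs) and diseases combine biological knowledge, such as circRNA- and disease-related data, with known circRNA-disease association pairs to mine potential associations. However, these models suffer from inherent problems such as the sparsity of the network formed by known associations and the scarcity of negative samples, which leads to poor prediction performance. Therefore, inductive matrix completion and a self-attention mechanism were introduced on top of a graph auto-encoder and fused in two stages to predict circRNA-disease associations; the resulting model is named GIS-CDA (Graph auto-encoder combining Inductive matrix completion and Self-attention mechanism for predicting CircRNA-Disease Association). Firstly, the integrated similarities of circRNAs and of diseases were calculated, and a graph auto-encoder was used to learn the latent features of circRNAs and diseases and obtain low-dimensional representations. Secondly, the learned features were fed into inductive matrix completion to improve the similarity and dependence between nodes. Thirdly, the circRNA feature matrix and the disease feature matrix were integrated into a circRNA-disease feature matrix to enhance the stability and accuracy of prediction. Finally, a self-attention mechanism was introduced to extract important features from the feature matrix and reduce the dependence on other biological information. Under five-fold and ten-fold cross-validation, GIS-CDA achieved average Area Under the Receiver Operating Characteristic curve (AUROC) values of 0.9303 and 0.9393 respectively; the five-fold AUROC is 13.19, 35.73, 13.28 and 5.01 percentage points higher than those of the KATZ-measure-based human circRNA-disease association prediction model (KATZHCDA), the deep-matrix-factorization-based circRNA-disease association prediction model (DMFCDA), RWR (Random Walk with Restart) and the speedup-inductive-matrix-completion-based circRNA-disease association prediction model (SIMCCDA), respectively. The Area Under the Precision-Recall curve (AUPR) values of GIS-CDA were 0.2271 and 0.2340 respectively, and the five-fold AUPR is 21.72, 22.43, 21.96 and 13.86 percentage points higher than those of the above comparison models, respectively. In addition, ablation experiments and case studies on the circRNADisease, circ2Disease and circR2Disease datasets further verify that GIS-CDA performs well in predicting potential circRNA-disease associations.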
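The pipeline described in the abstract can be illustrated with an untrained forward pass. This is a minimal NumPy sketch under stated assumptions, not the authors' GIS-CDA implementation: the single GCN layer, the use of each node's row of the heterogeneous matrix as its input feature, the rank-k projections standing in for inductive matrix completion, the residual single-head self-attention and the inner-product sigmoid decoder are all illustrative choices, and `predict_scores` with its parameters is a hypothetical name. The default `hidden=256` only mirrors the best d in Table 1. Training (loss function, balance coefficient α, learning rate) is omitted.

```python
import numpy as np


def normalize_adjacency(h):
    """Symmetric GCN normalization D^-1/2 (H + I) D^-1/2."""
    h_hat = h + np.eye(h.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(h_hat.sum(axis=1))
    return h_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(f, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of f."""
    q, k, v = f @ wq, f @ wk, f @ wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)
    return weights @ v


def predict_scores(sc, sd, a, hidden=256, rank=64, seed=0):
    """Hypothetical forward pass: GCN encoder -> IMC-style projection -> self-attention -> scores."""
    rng = np.random.default_rng(seed)  # random, untrained weights (assumption)
    n_c, n_d = a.shape
    # Heterogeneous graph: circRNA similarity, disease similarity, known associations.
    h = np.block([[sc, a], [a.T, sd]])
    a_norm = normalize_adjacency(h)
    x = h  # node features: each node's row of the heterogeneous matrix (assumption)
    w_enc = rng.normal(scale=0.1, size=(x.shape[1], hidden))
    z = np.maximum(a_norm @ x @ w_enc, 0.0)  # one GCN layer with ReLU
    z_c, z_d = z[:n_c], z[n_c:]
    # Separate projections of circRNA and disease embeddings into a shared rank-k space.
    w_c = rng.normal(scale=0.1, size=(hidden, rank))
    w_d = rng.normal(scale=0.1, size=(hidden, rank))
    f = np.vstack([z_c @ w_c, z_d @ w_d])  # combined circRNA-disease feature matrix
    wq, wk, wv = (rng.normal(scale=0.1, size=(rank, rank)) for _ in range(3))
    f = f + self_attention(f, wq, wk, wv)  # residual attention refinement
    f_c, f_d = f[:n_c], f[n_c:]
    return 1.0 / (1.0 + np.exp(-(f_c @ f_d.T)))  # association score per circRNA-disease pair


# Toy usage with random similarities and associations.
rng = np.random.default_rng(1)
n_c, n_d = 5, 3
sc = rng.random((n_c, n_c))
sc = (sc + sc.T) / 2  # symmetric circRNA similarity
sd = rng.random((n_d, n_d))
sd = (sd + sd.T) / 2  # symmetric disease similarity
a = (rng.random((n_c, n_d)) > 0.7).astype(float)  # sparse known associations
scores = predict_scores(sc, sd, a, hidden=16, rank=4)
print(scores.shape)  # (5, 3)
```

The output is one score in (0, 1) for every circRNA-disease pair; in a trained model, unknown pairs with the highest scores would be the candidate associations ranked in case studies such as Tables 7 and 8.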
Yi ZHANG, Zhenmei WANG. circRNA-disease association prediction by two-stage fusion on graph auto-encoder[J]. Journal of Computer Applications, 2023, 43(6): 1979-1986.
d | AUROC | AUPR |
---|---|---|
32 | 0.9025 | 0.1374 |
64 | 0.9169 | 0.1759 |
128 | 0.9246 | 0.2164 |
256 | 0.9303 | 0.2271 |
512 | 0.9116 | 0.1848 |
Tab. 1 AUROC and AUPR values with different hidden layer dimensions d
α | AUROC | AUPR |
---|---|---|
0.1 | 0.9162 | 0.1616 |
0.3 | 0.9292 | 0.2026 |
0.5 | 0.9303 | 0.2271 |
0.7 | 0.9243 | 0.2134 |
0.9 | 0.9216 | 0.2022 |
Tab. 2 AUROC and AUPR values with different balance coefficients α
l | AUROC | AUPR |
---|---|---|
0.001 | 0.8855 | 0.1360 |
0.005 | 0.9213 | 0.1841 |
0.01 | 0.9303 | 0.2271 |
0.05 | 0.8820 | 0.1052 |
0.1 | 0.7364 | 0.0435 |
Tab. 3 AUROC and AUPR values with different learning rates l
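Tables 1 to 3 report AUROC and AUPR. The following minimal NumPy sketch shows one way to compute both metrics from binary labels and predicted scores; it is not the authors' evaluation code. AUROC is computed as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair (ties count one half), and AUPR is estimated by step-wise average precision, which is one common estimator; the source does not state which estimator it uses. The function names `auroc` and `average_precision` are illustrative.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC via pairwise comparison (Mann-Whitney form); O(P*N) memory, fine for a sketch."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (ties broken by input order)."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    tp = np.cumsum(hits)  # true positives among the top-k predictions
    precision = tp / np.arange(1, len(hits) + 1)
    return precision[hits].sum() / hits.sum()  # mean precision at each positive


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(auroc(labels, scores))              # 5/6, about 0.8333
print(average_precision(labels, scores))  # 5/6, about 0.8333
```

Because a random ranker's expected AUPR is roughly the fraction of positive pairs, AUPR values far below AUROC are expected when known associations are rare; for example, if every unknown pair in circR2Disease (Table 5) were treated as negative, positives would make up 650 of 585 × 88 = 51480 pairs, about 1.3%.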
Group | Self-attention mechanism | Reconstruction loss function |
---|---|---|
Group 1 | Included | Not included |
Group 2 | Not included | Included |
Group 3 | Not included | Not included |
Tab. 4 Settings of the ablation experiments
Dataset | Diseases | circRNAs | Associations |
---|---|---|---|
circRNADisease | 34 | 223 | 241 |
circ2Disease | 46 | 215 | 240 |
circR2Disease | 88 | 585 | 650 |
Tab. 5 Statistics of the different datasets
Model | Running time/s |
---|---|
GIS-CDA | 55 |
KATZHCDA | 8010 |
DMFCDA | 116 |
RWR | 7826 |
SIMCCDA | 5 |
Tab. 6 Comparison of AUROC values, AUPR values and running time of the proposed model and existing models
Rank | circRNA name | PMID |
---|---|---|
1 | circPVT1/hsa_circ_0001821 | 31865777 |
2 | hsa_circRNA_100782/circHIPK3/hsa_circ_0000284 | 32833501 |
3 | hsa_circ_0001313/circCCDC66 | 34252882 |
4 | circ-Foxo3/hsa_circ_0006404 | 33833780 |
5 | Cir-ITCH/hsa_circ_0001141/hsa_circ_001763 | 29887952 |
6 | hsa_circRNA_103110/hsa_circ_103110/hsa_circ_0004771 | No evidence yet |
7 | CDR1as/ciRS-7/hsa_circ_0001946 | 32894144 |
8 | circPRKCI/hsa_circ_0067934 | 31409777 |
9 | hsa_circRNA_102049 | No evidence yet |
10 | hsa_circ_0002495 | No evidence yet |
Tab. 7 Top 10 circRNAs associated with glioma
Rank | circRNA name | PMID |
---|---|---|
1 | Cir-ITCH/hsa_circ_0001141/hsa_circ_001763 | 33060778 |
2 | hsa_circ_0001313/circCCDC66 | 32253030 |
3 | hsa_circ_0007534 | 32419229 |
4 | hsa_circRNA_103110/hsa_circ_103110/hsa_circ_0004771 | 29098316 |
5 | circPRKCI/hsa_circ_0067934 | 35113408 |
6 | circ-Foxo3/hsa_circ_0006404 | 33833780 |
7 | circZFR/hsa_circRNA_103809/hsa_circ_0072088 | 32572921 |
8 | hsa_circRNA_102049 | No evidence yet |
9 | circSMARCA5/hsa_circ_0001445 | 30956729 |
10 | circRNA_102913/hsa_circ_0058058 | No evidence yet |
Tab. 8 Top 10 circRNAs associated with gastric cancer