Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3449-3458.DOI: 10.11772/j.issn.1001-9081.2024030398
• Data science and technology •
Zhijie LI¹, Xuhong LIAO¹, Yuanxiang LI², Qinglan LI³
Received: 2024-04-08
Revised: 2024-06-28
Accepted: 2024-07-12
Online: 2024-08-05
Published: 2024-11-10
Contact: Xuhong LIAO
About author: LI Zhijie, born in 1964, native of Yongxing, Hunan, Ph. D., associate professor, CCF member. His research interests include computational biology and data mining.
Zhijie LI, Xuhong LIAO, Yuanxiang LI, Qinglan LI. Disease sample classification algorithm by Bayesian network with gene association analysis[J]. Journal of Computer Applications, 2024, 44(11): 3449-3458.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024030398
Sample | g1 | g2 | g3 | g4 | g5 | g6 |
---|---|---|---|---|---|---|
s1(-) | 0.155 | 0.076 | 0.284 | 0.097 | 0.013 | 0.023 |
s2(-) | 0.217 | 0.084 | 0.409 | 0.138 | -0.159 | 0.129 |
s3(-) | 0.375 | 0.115 | -0.201 | 0.254 | -0.094 | -0.181 |
s4(-) | 0.238 | 0.000 | 0.150 | 0.165 | -0.191 | 0.132 |
s5(-) | -0.073 | -0.146 | 0.442 | -0.077 | -0.341 | 0.063 |
s6(-) | 0.394 | 0.909 | 0.443 | 0.818 | 1.070 | 0.227 |
s7(+) | 0.385 | 0.822 | 0.426 | 0.768 | 1.013 | 0.226 |
s8(+) | 0.329 | 0.690 | 0.244 | 0.550 | 0.790 | 0.327 |
s9(+) | 0.384 | 0.730 | 0.066 | 0.529 | 0.852 | 0.313 |
s10(+) | -0.316 | -0.191 | 0.202 | -0.140 | 0.043 | 0.076 |
Tab. 1 Examples of disease gene expression data
Sample | Gene expression values in descending order | Gene column subscript sequence |
---|---|---|
s1(-) | 0.284(g3)→0.155(g1)→0.097(g4)→0.076(g2)→0.023(g6)→0.013(g5) | 3→1→4→2→6→5 |
s2(-) | 0.409(g3)→0.217(g1)→0.138(g4)→0.129(g6)→0.084(g2)→-0.159(g5) | 3→1→4→6→2→5 |
s3(-) | 0.375(g1)→0.254(g4)→0.115(g2)→-0.094(g5)→-0.181(g6)→-0.201(g3) | 1→4→2→5→6→3 |
s4(-) | 0.238(g1)→0.165(g4)→0.150(g3)→0.132(g6)→0.000(g2)→-0.191(g5) | 1→4→3→6→2→5 |
s5(-) | 0.442(g3)→0.063(g6)→-0.073(g1)→-0.077(g4)→-0.146(g2)→-0.341(g5) | 3→6→1→4→2→5 |
s6(-) | 1.070(g5)→0.909(g2)→0.818(g4)→0.443(g3)→0.394(g1)→0.227(g6) | 5→2→4→3→1→6 |
s7(+) | 1.013(g5)→0.822(g2)→0.768(g4)→0.426(g3)→0.385(g1)→0.226(g6) | 5→2→4→3→1→6 |
s8(+) | 0.790(g5)→0.690(g2)→0.550(g4)→0.329(g1)→0.327(g6)→0.244(g3) | 5→2→4→1→6→3 |
s9(+) | 0.852(g5)→0.730(g2)→0.529(g4)→0.384(g1)→0.313(g6)→0.066(g3) | 5→2→4→1→6→3 |
s10(+) | 0.202(g3)→0.076(g6)→0.043(g5)→-0.140(g4)→-0.191(g2)→-0.316(g1) | 3→6→5→4→2→1 |
Tab. 2 Descending ordering of gene expression values and corresponding subscript sequences
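The mapping from a row of Tab. 1 to its subscript sequence in Tab. 2 can be sketched as a plain descending sort. This is a minimal illustration, not the authors' implementation; the function name `subscript_sequence` and the choice of samples s1 and s6 are assumptions made for the example.

```python
# Expression values for two samples copied from Tab. 1 (columns g1..g6).
expression = {
    "s1": [0.155, 0.076, 0.284, 0.097, 0.013, 0.023],
    "s6": [0.394, 0.909, 0.443, 0.818, 1.070, 0.227],
}


def subscript_sequence(values):
    """Return 1-based gene indices ordered by descending expression value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [i + 1 for i in order]


# Build the subscript sequence of every sample in the example.
sequences = {name: subscript_sequence(vals) for name, vals in expression.items()}

for name, seq in sequences.items():
    print(name, "→".join(str(g) for g in seq))
```

For s1 the largest value is 0.284 in column g3, so the sequence starts with 3, as in the table.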
Atomic sequence | Count | Atomic sequence | Count | Atomic sequence | Count |
---|---|---|---|---|---|
6→5 | 2 | 3→6 | 3 | 4→1 | 2 |
4→3 | 3 | 1→6 | 4 | 3→1 | 4 |
2→4 | 4 | 6→2 | 2 | 4→2 | 4 |
1→4 | 5 | 6→3 | 3 | | |
2→5 | 4 | 5→2 | 4 | | |
Tab. 3 Frequent gene atomic sequence counts
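The counts in Tab. 3 can be reproduced by counting adjacent index pairs across the descending-order sequences of all ten samples in Tab. 1. The sketch below is an illustration under two assumptions that this page does not state: an atomic sequence is taken to be a pair of adjacent genes in a sample's ordering, and a pair counts as frequent when it occurs at least twice (the hypothetical `MIN_COUNT`, inferred from the smallest count in the table).

```python
from collections import Counter

# Expression values copied from Tab. 1: rows s1..s10, columns g1..g6.
table1 = [
    [0.155, 0.076, 0.284, 0.097, 0.013, 0.023],
    [0.217, 0.084, 0.409, 0.138, -0.159, 0.129],
    [0.375, 0.115, -0.201, 0.254, -0.094, -0.181],
    [0.238, 0.000, 0.150, 0.165, -0.191, 0.132],
    [-0.073, -0.146, 0.442, -0.077, -0.341, 0.063],
    [0.394, 0.909, 0.443, 0.818, 1.070, 0.227],
    [0.385, 0.822, 0.426, 0.768, 1.013, 0.226],
    [0.329, 0.690, 0.244, 0.550, 0.790, 0.327],
    [0.384, 0.730, 0.066, 0.529, 0.852, 0.313],
    [-0.316, -0.191, 0.202, -0.140, 0.043, 0.076],
]

MIN_COUNT = 2  # assumed support threshold, not stated on this page


def subscript_sequence(values):
    """Return 1-based gene indices ordered by descending expression value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [i + 1 for i in order]


def count_atomic_sequences(rows, min_count=MIN_COUNT):
    """Count adjacent gene pairs over all samples and keep the frequent ones."""
    counts = Counter()
    for values in rows:
        seq = subscript_sequence(values)
        counts.update(zip(seq, seq[1:]))  # pairs (a, b) meaning a→b
    return {pair: n for pair, n in counts.items() if n >= min_count}


frequent = count_atomic_sequences(table1)
for (a, b), n in sorted(frequent.items(), key=lambda item: -item[1]):
    print(f"{a}→{b}: {n}")
```

Pairs that occur in only one sample, such as 5→4 in s10, fall below the assumed threshold and are dropped.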
No. | Atomic sequence | Association entropy | No. | Atomic sequence | Association entropy |
---|---|---|---|---|---|
1 | 5→2 | 0.805 | 8 | 4→2 | 1.257 |
2 | 4→3 | 0.852 | 9 | 2→4 | 1.318 |
3 | 2→5 | 0.856 | 10 | 3→6 | 1.386 |
4 | 6→5 | 0.946 | 11 | 6→2 | 1.453 |
5 | 6→3 | 1.007 | 12 | 4→1 | 1.568 |
6 | 3→1 | 1.109 | 13 | 1→6 | 2.047 |
7 | 1→4 | 1.159 | | | |
Tab. 4 Frequent gene atomic sequence association entropies
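Group | Dataset | Genes | Samples | Classes |
---|---|---|---|---|
Tumor datasets | Leukemia | 7 129 | 72 | 2 |
Tumor datasets | Colon | 2 000 | 62 | 2 |
Tumor datasets | SRBCT | 2 308 | 83 | 4 |
Tumor datasets | Brain | 5 920 | 90 | 5 |
Tumor datasets | Breast | 10 | 683 | 2 |
Tumor datasets | Duke_bc | 7 129 | 44 | 2 |
Non-tumor datasets | Heart | 13 | 270 | 2 |
Non-tumor datasets | Mushrooms | 112 | 8 124 | 2 |
Non-tumor datasets | Protein | 357 | 17 766 | 3 |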
Tab. 5 Details of disease gene expression datasets
Type | Predicted positive | Predicted negative |
---|---|---|
Actual positive | TP | FN |
Actual negative | FP | TN |
Tab. 6 Confusion matrix
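The four cells of the confusion matrix are enough to compute accuracy and F1-Score, two of the metrics reported in the later tables. The sketch below uses the standard textbook definitions, which are assumed here rather than taken from the paper's own formulas; the counts passed in are hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The guards against zero denominators matter on small datasets, where a fold may contain no predicted positives.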
Dataset | BN_NB | BN_SA | CFWNB | WATAN | BN_CI | BN_ICS | BCGA | |
---|---|---|---|---|---|---|---|---|
Tumor datasets | Leukemia | 0.483 | 0.505 | 0.411 | 0.428 | 0.406 | 0.370 | |
Colon | 0.403 | 0.372 | 0.295 | 0.332 | 0.324 | 0.341 | ||
SRBCT | 0.069 | 0.027 | 0.018 | 0.016 | 0.012 | 0.033 | ||
Brain | 0.302 | 0.276 | 0.242 | 0.271 | 0.230 | 0.250 | ||
Breast | 0.280 | 0.359 | 0.320 | 0.320 | 0.320 | 0.249 | ||
Duke_bc | 0.492 | 0.453 | 0.315 | 0.308 | 0.325 | 0.330 | ||
Average RMSE | 0.335 | 0.317 | 0.275 | 0.280 | 0.277 | 0.252 | ||
Average rank | 6.000 | 5.000 | 3.833 | 4.167 | 3.000 | 3.667 | 2.333 | |
Non-tumor datasets | Heart | 0.601 | 0.479 | 0.342 | 0.345 | 0.344 | 0.329 | |
Mushrooms | 0.350 | 0.232 | 0.433 | 0.402 | 0.399 | 0.398 | ||
Protein | 0.467 | 0.289 | 0.393 | 0.350 | 0.352 | 0.349 | ||
Average RMSE | 0.473 | 0.388 | 0.365 | 0.365 | 0.364 | 0.328 | ||
Average rank | 5.667 | 2.667 | 5.000 | 4.333 | 5.000 | 3.667 | 1.667 | |
Overall RMSE | 0.381 | 0.322 | 0.313 | 0.308 | 0.306 | 0.277 | ||
Overall Rank | 5.889 | 4.222 | 4.222 | 4.222 | 3.667 | 3.667 | 2.111 |
Tab. 7 RMSE experimental results of different BN variant classifiers
Dataset | LR | SVM | KNN | OB | LB | RF | BCGA | |
---|---|---|---|---|---|---|---|---|
Tumor datasets | Leukemia | 0.767 | 0.744 | 0.802 | 0.754 | 0.807 | 0.813 | |
Colon | 0.757 | 0.790 | 0.827 | 0.780 | 0.780 | 0.748 | ||
SRBCT | 0.983 | 0.999 | 0.998 | 0.998 | 0.998 | |||
Brain | 0.818 | 0.811 | 0.831 | 0.858 | 0.845 | 0.858 | ||
Breast | 0.898 | 0.750 | 0.726 | 0.717 | 0.717 | 0.749 | ||
Duke_bc | 0.758 | 0.795 | 0.850 | 0.850 | 0.844 | 0.864 | ||
Average accuracy | 0.830 | 0.834 | 0.825 | 0.835 | 0.828 | 0.848 | ||
Average rank | 5.333 | 4.333 | 3.333 | 4.583 | 3.917 | 4.500 | 2.000 | |
Non-tumor datasets | Heart | 0.640 | 0.769 | 0.827 | 0.840 | 0.840 | 0.844 | |
Mushrooms | 0.988 | 0.690 | 0.774 | 0.734 | 0.737 | 0.771 | ||
Protein | 0.673 | 0.666 | 0.630 | 0.725 | 0.721 | 0.735 | ||
Average accuracy | 0.730 | 0.808 | 0.720 | 0.775 | 0.765 | 0.768 | ||
Average rank | 4.667 | 4.333 | 5.333 | 3.667 | 4.500 | 3.500 | 2.000 | |
Overall Accuracy | 0.797 | 0.825 | 0.804 | 0.809 | 0.812 | 0.808 | 0.826 | |
Overall Rank | 5.111 | 4.333 | 4.000 | 4.278 | 4.111 | 4.167 | 2.000 |
Tab. 8 Accuracy experimental results of different BN variant classifiers
Dataset | LR | SVM | KNN | OB | LB | RF | BCGA |
---|---|---|---|---|---|---|---|
W/D/L | 5/0/1 | 4/1/1 | 4/1/1 | 5/1/0 | 4/2/0 | 5/1/0 | — |
Leukemia | 0.767- | 0.737- | 0.993+ | 0.731- | 0.793- | 0.801- | 0.927 |
Colon | 0.535- | 0.807- | 0.780- | 0.808- | 0.789- | 0.791- | 0.904 |
Breast | 0.810- | 0.931= | 0.855- | 0.871- | 0.856- | 0.889- | 0.979 |
Duke_bc | 0.685- | 0.795- | 0.853- | 0.806- | 0.806- | 0.801- | 0.907 |
Heart | 0.448- | 0.755- | 0.829= | 0.804- | 0.827= | 0.819- | 0.864 |
Mushrooms | 0.871+ | 0.988+ | 0.675- | 0.733= | 0.683= | 0.687= | 0.719 |
Tab. 9 F1-Score results of different algorithms on 6 datasets
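The W/D/L row of Tab. 9 can be recovered by tallying the marker that follows each baseline score. The sketch below assumes a reading that this page does not spell out: `-` marks a baseline that is worse than BCGA (a win for BCGA), `=` a draw, and `+` a baseline that is better (a loss for BCGA). The helper name `win_draw_loss` is hypothetical.

```python
# F1-Score cells copied from Tab. 9, one list per baseline, in dataset order:
# Leukemia, Colon, Breast, Duke_bc, Heart, Mushrooms.
tab9 = {
    "LR": ["0.767-", "0.535-", "0.810-", "0.685-", "0.448-", "0.871+"],
    "SVM": ["0.737-", "0.807-", "0.931=", "0.795-", "0.755-", "0.988+"],
    "KNN": ["0.993+", "0.780-", "0.855-", "0.853-", "0.829=", "0.675-"],
    "OB": ["0.731-", "0.808-", "0.871-", "0.806-", "0.804-", "0.733="],
    "LB": ["0.793-", "0.789-", "0.856-", "0.806-", "0.827=", "0.683="],
    "RF": ["0.801-", "0.791-", "0.889-", "0.801-", "0.819-", "0.687="],
}


def win_draw_loss(cells):
    """Tally BCGA wins/draws/losses from the trailing marker of each cell."""
    markers = [cell[-1] for cell in cells]
    wins = markers.count("-")    # assumed: baseline worse than BCGA
    draws = markers.count("=")   # assumed: no significant difference
    losses = markers.count("+")  # assumed: baseline better than BCGA
    return f"{wins}/{draws}/{losses}"


wdl = {name: win_draw_loss(cells) for name, cells in tab9.items()}
for name, tally in wdl.items():
    print(name, tally)
```

Under this reading, the tallies agree with the W/D/L row printed in the table for all six baselines.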
Dataset | LR | SVM | KNN | OB | LB | RF | BCGA |
---|---|---|---|---|---|---|---|
W/D/L | 5/1/0 | 4/1/1 | 4/1/1 | 1/5/0 | 0/6/0 | 0/6/0 | — |
Leukemia | 0.524- | 0.722- | 0.852- | 0.812- | 0.920= | 0.911= | 0.919 |
Colon | 0.811- | 0.780- | 0.873= | 0.893= | 0.894= | 0.896= | 0.895 |
Breast | 0.687- | 0.968= | 0.929- | 0.958= | 0.947= | 0.968= | 0.972 |
Duke_bc | 0.764- | 0.716- | 0.917= | 0.920= | 0.922= | 0.920= | 0.920 |
Heart | 0.835- | 0.752- | 0.928+ | 0.882= | 0.904= | 0.901= | 0.899 |
Mushrooms | 0.829= | 1.000+ | 0.768- | 0.823= | 0.837= | 0.838= | 0.822 |
Tab. 10 AUC results of different algorithms on 6 datasets
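For readers unfamiliar with the metric in Tab. 10, AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below implements that pairwise definition directly; it is a generic illustration with hypothetical labels and scores, and it is not claimed to be the evaluation code behind the table.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical classifier scores for six samples (1 = positive class).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
example_auc = auc(labels, scores)
print(f"AUC = {example_auc:.3f}")
```

The double loop is quadratic in the number of samples, which is acceptable for datasets of the sizes listed in Tab. 5 but would be replaced by a rank-based formula for larger ones.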
Dataset | Algorithm | Accuracy | RMSE | F1-Score | AUC |
---|---|---|---|---|---|
Leukemia | BN | 0.714 | 0.483 | 0.753 | 0.802 |
Leukemia | BN+GA | 0.765 | 0.462 | 0.847 | 0.909 |
Leukemia | BN+GCPT | 0.805 | 0.390 | 0.901 | 0.903 |
Leukemia | BCGA | 0.813 | 0.370 | 0.928 | 0.919 |
Colon | BN | 0.693 | 0.403 | 0.802 | 0.821 |
Colon | BN+GA | 0.768 | 0.377 | 0.901 | 0.854 |
Colon | BN+GCPT | 0.796 | 0.380 | 0.899 | 0.878 |
Colon | BCGA | 0.804 | 0.317 | 0.905 | 0.896 |
Breast | BN | 0.663 | 0.261 | 0.843 | 0.843 |
Breast | BN+GA | 0.632 | 0.283 | 0.881 | 0.880 |
Breast | BN+GCPT | 0.692 | 0.248 | 0.839 | 0.936 |
Breast | BCGA | 0.749 | 0.249 | 0.979 | 0.972 |
Heart | BN | 0.716 | 0.601 | 0.771 | 0.801 |
Heart | BN+GA | 0.716 | 0.599 | 0.773 | 0.810 |
Heart | BN+GCPT | 0.808 | 0.480 | 0.818 | 0.856 |
Heart | BCGA | 0.844 | 0.329 | 0.864 | 0.900 |
Mushrooms | BN | 0.642 | 0.350 | 0.611 | 0.771 |
Mushrooms | BN+GA | 0.650 | 0.345 | 0.613 | 0.781 |
Mushrooms | BN+GCPT | 0.712 | 0.325 | 0.736 | 0.821 |
Mushrooms | BCGA | 0.771 | 0.316 | 0.719 | 0.823 |
Tab. 11 BCGA ablation experimental results on 5 datasets
Dataset | Noise ratio/% | GB | GA |
---|---|---|---|
Leukemia | 0 | 0.714 | 0.765 |
Leukemia | 10 | 0.643 | 0.726 |
Leukemia | 20 | 0.572 | 0.688 |
Leukemia | 30 | 0.500 | 0.650 |
Colon | 0 | 0.693 | 0.768 |
Colon | 10 | 0.623 | 0.729 |
Colon | 20 | 0.554 | 0.691 |
Colon | 30 | 0.485 | 0.653 |
Breast | 0 | 0.663 | 0.632 |
Breast | 10 | 0.597 | 0.600 |
Breast | 20 | 0.531 | 0.569 |
Breast | 30 | 0.465 | 0.536 |
Heart | 0 | 0.716 | 0.716 |
Heart | 10 | 0.645 | 0.680 |
Heart | 20 | 0.573 | 0.644 |
Heart | 30 | 0.522 | 0.593 |
Mushrooms | 0 | 0.642 | 0.650 |
Mushrooms | 10 | 0.578 | 0.618 |
Mushrooms | 20 | 0.534 | 0.585 |
Mushrooms | 30 | 0.514 | 0.552 |
Tab. 12 Accuracies after adding noise to 5 datasets
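One way to read Tab. 12 is to compare how much accuracy each method loses between 0% and 30% noise. The sketch below computes that absolute drop from the table values; the helper name `accuracy_drop` and the choice of absolute drop as the robustness measure are assumptions made for illustration.

```python
# Accuracies copied from Tab. 12 at noise ratios 0%, 10%, 20% and 30%.
tab12 = {
    "Leukemia": {"GB": [0.714, 0.643, 0.572, 0.500], "GA": [0.765, 0.726, 0.688, 0.650]},
    "Colon": {"GB": [0.693, 0.623, 0.554, 0.485], "GA": [0.768, 0.729, 0.691, 0.653]},
    "Breast": {"GB": [0.663, 0.597, 0.531, 0.465], "GA": [0.632, 0.600, 0.569, 0.536]},
    "Heart": {"GB": [0.716, 0.645, 0.573, 0.522], "GA": [0.716, 0.680, 0.644, 0.593]},
    "Mushrooms": {"GB": [0.642, 0.578, 0.534, 0.514], "GA": [0.650, 0.618, 0.585, 0.552]},
}


def accuracy_drop(series):
    """Absolute accuracy lost between the first (0%) and last (30%) noise level."""
    return round(series[0] - series[-1], 3)


drops = {
    dataset: {method: accuracy_drop(values) for method, values in methods.items()}
    for dataset, methods in tab12.items()
}

for dataset, d in drops.items():
    print(f"{dataset}: GB drop = {d['GB']:.3f}, GA drop = {d['GA']:.3f}")
```

On all five datasets the GA column loses less accuracy than the GB column over this noise range.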
Dataset | Total network edges | Algorithm | Correct edges | Proportion of correct edges |
---|---|---|---|---|
ALARM | 46 | GB | 22 | 0.48 |
ALARM | 46 | GSB | 39 | 0.85 |
mildew | 46 | GB | 28 | 0.61 |
mildew | 46 | GSB | 38 | 0.83 |
gene1 | 132 | GB | 65 | 0.49 |
gene1 | 132 | GSB | 97 | 0.73 |
gene2 | 272 | GB | 132 | 0.49 |
gene2 | 272 | GSB | 193 | 0.71 |
Tab. 13 Comparison of GB and GSB algorithms on 4 datasets
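The last column of Tab. 13 is the number of correctly recovered edges divided by the total number of edges in the reference network. A short check of that arithmetic, using only the counts in the table (the variable names are illustrative):

```python
# (dataset, total edges, algorithm, correct edges) copied from Tab. 13.
tab13 = [
    ("ALARM", 46, "GB", 22),
    ("ALARM", 46, "GSB", 39),
    ("mildew", 46, "GB", 28),
    ("mildew", 46, "GSB", 38),
    ("gene1", 132, "GB", 65),
    ("gene1", 132, "GSB", 97),
    ("gene2", 272, "GB", 132),
    ("gene2", 272, "GSB", 193),
]

# Proportion of correct edges, rounded to two decimals as in the table.
ratios = {
    (dataset, algorithm): round(correct / total, 2)
    for dataset, total, algorithm, correct in tab13
}

for (dataset, algorithm), ratio in ratios.items():
    print(f"{dataset:7s} {algorithm:4s} {ratio:.2f}")
```

The computed ratios match the printed column for all eight rows.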