Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (6): 1784-1792. DOI: 10.11772/j.issn.1001-9081.2024060902
• The 12th CCF Conference on Big Data •
Chaoying JIANG1,2, Qian LI1,2, Ning LIU1,2, Lei LIU3, Lizhen CUI1,2
Received: 2024-07-01
Revised: 2024-07-16
Accepted: 2024-07-22
Online: 2024-08-19
Published: 2025-06-10
Contact: Ning LIU
About author: JIANG Chaoying, born in 2001, female, a native of Weihai, Shandong, M. S. candidate, CCF member. Her research interests include data mining.
Supported by:
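Abstract:
To address the insufficient mining of the relationship between disease interactions and readmission, as well as the weak generalization ability of related models, a readmission prediction model based on graph contrastive learning, named HealthGraph, is proposed. First, a disease code graph is constructed from the disease co-occurrence information in the dataset, so that associations among diseases are fully captured. Second, a patient data augmentation method guided by the idea of graph contrastive learning is proposed: a graph sampler adaptively captures task-relevant topological structures and constructs new views, which enriches the data and thereby improves the generalization performance of the model. Finally, the embedding of the initial disease code graph and the embeddings of the new views are combined to predict readmission. Two datasets, one for respiratory system diseases and one for circulatory system diseases, were built from the real-world dataset MIMIC-Ⅲ, and extensive experiments were conducted on them. The results show that, compared with the REverse Time AttentIoN model (RETAIN) and the Stage-aware neural Network model (StageNet), the proposed model improves accuracy and F1 score by about 1 percentage point. In addition, the results of two groups of ablation experiments verify the effectiveness of the proposed model in improving the accuracy and generalization of readmission prediction.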
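The first step of the abstract, building a disease code graph from co-occurrence information, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `min_count` threshold, and the toy code strings are all assumptions introduced here, and each admission is assumed to be a plain list of diagnosis codes.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(admissions, min_count=1):
    """Return (nodes, edges); edges maps a sorted code pair to its co-occurrence count.

    Hypothetical helper: two codes are linked when they appear in the same admission
    at least `min_count` times across the dataset.
    """
    pair_counts = Counter()
    nodes = set()
    for codes in admissions:
        unique = sorted(set(codes))  # ignore repeated codes within one admission
        nodes.update(unique)
        pair_counts.update(combinations(unique, 2))
    edges = {pair: n for pair, n in pair_counts.items() if n >= min_count}
    return sorted(nodes), edges


def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix in the node order given by `nodes`."""
    index = {code: i for i, code in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


# Toy data: placeholder code strings, not taken from the paper.
toy_admissions = [
    ["4019", "4280", "496"],
    ["4019", "4280"],
    ["496", "486"],
]
nodes, edges = build_cooccurrence_graph(toy_admissions)
adj = adjacency_matrix(nodes, edges)
```

Whether the paper builds one graph per patient or thresholds rare pairs is not stated in the abstract, so `min_count` is left as a free parameter in this sketch.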
CLC number:
Chaoying JIANG, Qian LI, Ning LIU, Lei LIU, Lizhen CUI. Readmission prediction model based on graph contrastive learning[J]. Journal of Computer Applications, 2025, 45(6): 1784-1792.
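Symbol | Definition |
---|---|
Visit sequence of a specific patient | |
The patient's …-th | |
Whether the patient will be readmitted | |
Prediction of whether the patient will be readmitted | |
Node feature matrix | |
Adjacency matrix | |
Disease code graph representation (c denotes the patient, g denotes the graph embedding) | |
Multi-hot vector (decides whether a node is dropped in node sampling) | |
Multi-hot vector (decides whether an edge is dropped in edge sampling); i and j denote diseases | |
Node embedding representation (v denotes the node embedding) | |
Node-augmented view | |
Edge-augmented view (e denotes the edge embedding) | |
Graph embedding of the node-augmented view | |
Graph embedding of the edge-augmented view | |
Representation vector of the patient's current admission | |
Tab. 1 Important symbols and their definitions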
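The two multi-hot vectors in Tab. 1 decide which nodes and which edges are dropped when the augmented views are built. The sketch below shows one way such masks could be sampled and applied. It is an assumption-laden illustration: it uses a hard Gumbel-max draw with fixed logits, whereas a trainable sampler would more likely use a differentiable relaxation, and all function names are hypothetical.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel sample from a uniform variate."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # clamp so both logarithms are finite
    return -math.log(-math.log(u))


def sample_keep_mask(keep_logits, rng):
    """Sample a 0/1 mask; an entry is 1 when 'keep' beats 'drop' in a Gumbel-max draw.

    With a drop logit of 0, the keep probability equals sigmoid(keep_logit).
    """
    mask = []
    for logit in keep_logits:
        keep_score = logit + gumbel_noise(rng)
        drop_score = gumbel_noise(rng)
        mask.append(1 if keep_score > drop_score else 0)
    return mask


def node_view(adj, node_mask):
    """Remove every edge that touches a dropped node."""
    n = len(adj)
    return [[adj[i][j] * node_mask[i] * node_mask[j] for j in range(n)] for i in range(n)]


def edge_view(adj, edge_mask):
    """Remove individual edges; edge_mask maps a pair (i, j) with i < j to 0 or 1."""
    n = len(adj)
    view = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and edge_mask.get((i, j), 1):
                view[i][j] = view[j][i] = 1
    return view


# Toy triangle graph over three disease nodes.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
rng = random.Random(0)
node_mask = sample_keep_mask([50.0, -50.0, 50.0], rng)  # extreme logits make the draw deterministic
view_nodes = node_view(triangle, node_mask)
view_edges = edge_view(triangle, {(0, 1): 0})
```

The extreme logits are only there to make the toy output reproducible; in practice the logits would come from a learned scoring network.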
Experimental group | Number of graphs | Average number of nodes | Average number of edges |
---|---|---|---|
Respiratory system diseases | 17,679 | 14.42 | 93.27 |
Circulatory system diseases | 26,559 | 13.48 | 79.33 |
Tab. 2 Data information of disease code graphs
Model | Respiratory AUC | Respiratory ACC | Respiratory F1 macro | Circulatory AUC | Circulatory ACC | Circulatory F1 macro |
---|---|---|---|---|---|---|
LR | 0.7232 | 0.6952 | 0.6878 | 0.7190 | 0.5899 | |
KNN | 0.6117 | 0.5124 | 0.5113 | 0.6098 | 0.6085 | 0.5681 |
DT | 0.6068 | 0.6464 | 0.6081 | 0.5734 | 0.6608 | 0.5745 |
RF | 0.6845 | 0.6450 | 0.3921 | 0.6702 | 0.7173 | 0.4177 |
DNN | 0.5335 | 0.6450 | 0.3921 | 0.5015 | 0.7173 | 0.4177 |
RETAIN | 0.7203 | 0.6860 | 0.6367 | 0.6876 | | |
StageNet | 0.7311 | 0.6352 | 0.7175 | 0.6150 | | |
HealthGraph | 0.7291 | 0.7128 | 0.6558 | 0.7057 | 0.7355 | 0.6265 |
Tab. 3 Comparative experimental results of different models on readmission prediction task
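In Tab. 3, RF and DNN report the same ACC and F1 macro on each dataset (0.6450 and 0.3921 for respiratory, 0.7173 and 0.4177 for circulatory). These pairs are numerically consistent with a classifier that assigns every sample to the majority class. This is an arithmetic observation, not a statement made in the source, and the short check below uses hypothetical helper names.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for binary labels 0/1."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def macro_f1_of_constant_predictor(majority_share):
    """Macro F1 when every sample is assigned to the majority class.

    For the majority class, precision equals its share and recall is 1.
    The never-predicted class contributes an F1 of 0.
    """
    majority_f1 = 2 * majority_share / (1 + majority_share)
    return majority_f1 / 2


# Toy labels: 645 of 1000 samples in class 0 (treated as the majority class here),
# and a predictor that always outputs 0.
y_true = [0] * 645 + [1] * 355
y_pred = [0] * 1000
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

respiratory_f1 = round(macro_f1_of_constant_predictor(0.6450), 4)
circulatory_f1 = round(macro_f1_of_constant_predictor(0.7173), 4)
```

Because macro F1 averages the two classes without weighting, a model can reach the majority-class accuracy while scoring poorly on F1 macro, which is why the table reports both.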
Experiment type | Model component | Respiratory AUC | Respiratory ACC | Respiratory F1 macro | Circulatory AUC | Circulatory ACC | Circulatory F1 macro |
---|---|---|---|---|---|---|---|
Graph sampling module | Node sampling only | 0.7167 | 0.7019 | 0.6370 | 0.6983 | 0.7318 | |
Graph sampling module | Edge sampling only | 0.6194 | | | | | |
Graph sampling module | Combined sampling | 0.7291 | 0.7128 | 0.6558 | 0.7057 | 0.7355 | 0.6265 |
Loss function | No contrastive loss | | | | | | |
Loss function | Full loss | 0.7291 | 0.7128 | 0.6558 | 0.7057 | 0.7355 | 0.6265 |
Tab. 4 Ablation experiment results of different model components on readmission prediction task
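The loss-function ablation in Tab. 4 contrasts a model trained without the contrastive term against the full loss. The exact formulation is not given on this page, so the following is only a sketch under stated assumptions: an InfoNCE-style contrastive term between the node-view and edge-view graph embeddings, added to a binary cross-entropy prediction loss with a hypothetical weight `weight`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss: row i of view_a should match row i of view_b and no other row."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the maximum for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]
    return total / len(view_a)


def bce(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over predicted readmission probabilities."""
    losses = []
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


def total_loss(y_true, y_prob, view_a, view_b, weight=0.1, use_contrastive=True):
    """Prediction loss plus an optional weighted contrastive term (weight is an assumption)."""
    loss = bce(y_true, y_prob)
    if use_contrastive:
        loss += weight * contrastive_loss(view_a, view_b)
    return loss


# Toy graph embeddings for two patients under the two augmented views.
node_view_emb = [[1.0, 0.0], [0.0, 1.0]]
edge_view_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(node_view_emb, edge_view_emb)
swapped = contrastive_loss(node_view_emb, edge_view_emb[::-1])
```

Setting `use_contrastive=False` in `total_loss` corresponds to the "No contrastive loss" row of the ablation in this sketch; the aligned toy views yield a lower contrastive term than the swapped ones, which is the behaviour the term is meant to reward.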