Readmission prediction model based on graph contrastive learning

doi:10.11772/j.issn.1001-9081.2024060902

Journal of Computer Applications, 2025, Vol. 45, Issue (6): 1784-1792. DOI: 10.11772/j.issn.1001-9081.2024060902

• 12th CCF Conference on Big Data •

Chaoying JIANG¹,², Qian LI¹,², Ning LIU¹,², Lei LIU³, Lizhen CUI¹,²

1. School of Software, Shandong University, Jinan, Shandong 250101, China
2. Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China
3. Shandong Research Institute of Industrial Technology, Jinan, Shandong 250100, China

Received: 2024-07-01; Revised: 2024-07-16; Accepted: 2024-07-22; Online: 2024-08-19; Published: 2025-06-10
Corresponding author: Ning LIU (liun21cs@sdu.edu.cn)
About the authors:
JIANG Chaoying, born in 2001 in Weihai, Shandong, M. S. candidate, CCF member. Her research interests include data mining.
LI Qian, born in 1993 in Binzhou, Shandong, Ph. D., assistant research fellow, CCF member. Her research interests include knowledge graph representation learning, knowledge reasoning, and natural language processing.
LIU Ning, born in 1993 in Rizhao, Shandong, Ph. D., assistant research fellow, CCF member. His research interests include medical data mining and knowledge enhancement.
LIU Lei, born in 1981 in Harbin, Heilongjiang, Ph. D., professor, CCF member. His research interests include multimedia networks and software-defined networks.
CUI Lizhen, born in 1976 in Gucheng, Hebei, Ph. D., professor, CCF member. His research interests include big data intelligence theory, data mining, smart science and medical health, and big data AI.
Supported by:
Youth Fund of Shandong Provincial Natural Science Foundation (ZR2022QF114)

Abstract

Abstract:

To address the insufficient mining of the relationship between the joint effects of diseases and readmission, as well as the weak generalization ability of related models, a readmission prediction model based on graph contrastive learning, called HealthGraph, is proposed. Firstly, the disease co-occurrence information in the dataset is used to construct a disease code graph, so that the correlations among diseases are fully explored. Secondly, a patient data augmentation method guided by the idea of graph contrastive learning is proposed: a graph sampler adaptively captures the task-relevant topology and constructs new views, which enriches the data and thereby improves the generalization performance of the model. Finally, readmission prediction is performed by combining the embedding of the initial disease code graph with the embeddings of the new views. Two datasets, one for respiratory system diseases and one for circulatory system diseases, were constructed from the real-world dataset MIMIC-Ⅲ, and extensive experiments were conducted on them. The results show that, compared with the REverse Time AttentIoN model (RETAIN) and the Stage-aware neural Network model (StageNet), the proposed model improves accuracy and F1 by about 1 percentage point. In addition, two groups of ablation experiments verify the effectiveness of the proposed model in improving the accuracy and generalization of readmission prediction.

Key words: Electronic Health Record (EHR), readmission prediction, graph contrastive learning, data augmentation, graph neural network
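The first step of the abstract, building a disease code graph from co-occurrence information, can be illustrated with a minimal sketch. The abstract does not give the exact construction rule, so everything here is an assumption made for illustration: the function names `cooccurrence_counts` and `visit_graph`, the `min_count` threshold, and the example code strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(visits):
    """Count how often each unordered pair of codes appears in the same visit."""
    counts = Counter()
    for codes in visits:
        # Deduplicate and sort so each pair has one canonical key (a, b) with a < b.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


def visit_graph(codes, counts, min_count=1):
    """Return (nodes, adjacency matrix) for one visit.

    Assumption: two codes are linked when they co-occur in at least
    `min_count` visits across the dataset. The paper may use another rule.
    """
    nodes = sorted(set(codes))
    index = {code: i for i, code in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for a, b in combinations(nodes, 2):
        if counts[(a, b)] >= min_count:
            adj[index[a]][index[b]] = 1
            adj[index[b]][index[a]] = 1  # undirected graph: keep A symmetric
    return nodes, adj


# Illustrative visits; each inner list holds the codes of one admission.
visits = [["428.0", "401.9", "486"], ["428.0", "401.9"], ["486", "518.81"]]
counts = cooccurrence_counts(visits)
nodes, adj = visit_graph(visits[0], counts, min_count=2)
```

With `min_count=2`, only the pair that appears together in two visits is connected, so the adjacency matrix for the first visit has a single undirected edge. A node feature matrix and a graph encoder would sit on top of this structure; neither is sketched here.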

CLC Number: TP391.1

Cite this article:

Chaoying JIANG, Qian LI, Ning LIU, Lei LIU, Lizhen CUI. Readmission prediction model based on graph contrastive learning[J]. Journal of Computer Applications, 2025, 45(6): 1784-1792.

Figures and tables (7)

Fig. 1 Influence of patients' diseases on patient readmission

Fig. 2 Framework of HealthGraph

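Tab. 1 Important symbols and their definitions

Symbol	Definition
$D$	Visit sequence of a specific patient
$C_i$	High-dimensional medical code list of the patient's $i$-th visit
$y$	Whether the patient will be readmitted
$y'$	Predicted value of whether the patient will be readmitted
$X$	Node feature matrix
$A$	Adjacency matrix
$s_g^c$	Disease code graph representation ($c$ denotes the patient, $g$ denotes the graph embedding)
$m_\theta$	Multi-hot vector (decides whether a node is dropped in node sampling)
$m_{ij}$	Multi-hot vector (decides whether an edge is dropped in edge sampling); $i$ and $j$ denote diseases
$s_v$	Node embedding representation ($v$ denotes the node embedding)
$G_v$	Node-augmented view
$G_e$	Edge-augmented view ($e$ denotes the edge embedding)
$g_v$	Graph embedding of the node-augmented view
$g_e$	Graph embedding of the edge-augmented view
$s_c$	Representation vector of the patient's current admission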

Tab. 2 Data information of disease code graphs

Experimental group	Number of graphs	Average number of nodes	Average number of edges
Respiratory system diseases	17,679	14.42	93.27
Circulatory system diseases	26,559	13.48	79.33

Tab. 3 Comparative experimental results of different models on readmission prediction task

Model	AUC (respiratory)	ACC (respiratory)	F1-macro (respiratory)	AUC (circulatory)	ACC (circulatory)	F1-macro (circulatory)
LR	0.7232	0.6952	0.6430	0.6878	0.7190	0.5899
KNN	0.6117	0.5124	0.5113	0.6098	0.6085	0.5681
DT	0.6068	0.6464	0.6081	0.5734	0.6608	0.5745
RF	0.6845	0.6450	0.3921	0.6702	0.7173	0.4177
DNN	0.5335	0.6450	0.3921	0.5015	0.7173	0.4177
RETAIN	0.7203	0.6860	0.6367	0.6876	0.7203	0.6194
StageNet	0.7311	0.7025	0.6352	0.6879	0.7175	0.6150
HealthGraph	0.7291	0.7128	0.6558	0.7057	0.7355	0.6265
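A note on reading Tab. 3: the RF and DNN rows report identical ACC and F1-macro values on both datasets. These numbers match what a classifier would score if it assigned every sample to the majority class. That is an inference from the arithmetic, not something the page states. The short check below uses a hypothetical helper, `macro_f1_constant_majority`, and assumes that the F1 of a class that is never predicted is counted as 0.

```python
def macro_f1_constant_majority(majority_fraction: float) -> float:
    """Macro-F1 of a binary classifier that always predicts the majority class.

    `majority_fraction` is the share of samples in the majority class,
    which is also the accuracy of such a classifier.
    """
    # Majority class: recall is 1 and precision equals majority_fraction.
    f1_majority = 2 * majority_fraction / (1 + majority_fraction)
    # Minority class is never predicted; its F1 is taken as 0 (assumption).
    f1_minority = 0.0
    return (f1_majority + f1_minority) / 2


# ACC values of the RF and DNN rows on the two datasets.
respiratory = round(macro_f1_constant_majority(0.6450), 4)
circulatory = round(macro_f1_constant_majority(0.7173), 4)
print(respiratory, circulatory)
```

The two results reproduce the F1-macro entries of those rows (0.3921 and 0.4177). This is why the macro-averaged F1 is more informative than accuracy alone when comparing models on these imbalanced datasets.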

Tab. 4 Ablation experiment results of different model components on readmission prediction task

Experiment type	Model component	AUC (respiratory)	ACC (respiratory)	F1-macro (respiratory)	AUC (circulatory)	ACC (circulatory)	F1-macro (circulatory)
Graph sampling module	Node sampling only	0.7167	0.7019	0.6370	0.6983	0.7318	0.6227
Graph sampling module	Edge sampling only	0.7279	0.7103	0.6521	0.7050	0.7336	0.6194
Graph sampling module	Combined sampling	0.7291	0.7128	0.6558	0.7057	0.7355	0.6265
Loss function	Without contrastive loss	0.7254	0.7069	0.6500	0.6933	0.7329	0.6258
Loss function	Full loss	0.7291	0.7128	0.6558	0.7057	0.7355	0.6265

Fig. 3 Case analysis
