Journal of Computer Applications, 2021, Vol. 41, Issue 10: 3082-3088. DOI: 10.11772/j.issn.1001-9081.2020101695

Special topic: Frontier and comprehensive applications

Early identification and prediction of abnormal carotid arteries based on variational autoencoder

HUANG Xiaoxiang1,2, HU Yongmei1, WU Dan2, REN Lijie3

  1. School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China;
  2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China;
  3. Neurology Department, The Second People's Hospital of Shenzhen, Shenzhen, Guangdong 518028, China
  • Received: 2020-11-02 Revised: 2021-01-28 Published: 2021-10-10 Released: 2021-02-24
  • Corresponding author: HU Yongmei
  • About the authors: HUANG Xiaoxiang (born 1993), male, from Xiangyang, Hubei, M.S. candidate; research interests: machine learning, data mining. HU Yongmei (born 1965), female, from Jinan, Shandong, professor, Ph.D.; research interests: rough sets, pattern recognition, data mining, hospital information systems. WU Dan (born 1984), female, from Shenzhen, Guangdong, assistant research fellow, Ph.D.; research interests: data mining, health big data analysis and cardiovascular disease prediction. REN Lijie (born 1972), male, from Shenzhen, Guangdong, chief physician, Ph.D.; research interests: cardiovascular diseases.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (81701788) and the Shenzhen Science and Technology Innovation Commission Application Demonstration Project (KJYY20180703165202011).

Abstract: Carotid artery stenosis, increased Carotid Intima-Media Thickness (CIMT) and carotid artery plaque can lead to stroke. To enable large-scale preliminary screening for stroke, an improved Variational AutoEncoder (VAE) based on medical data was proposed to identify and predict abnormal carotid arteries. Firstly, because medical data contain missing values, a mixed method combining K-Nearest Neighbor (KNN), mean and mode imputation (MKNN) and the improved VAE were each used to fill in the missing values and obtain a complete dataset, widening the range of data that can be used. Secondly, the feature attributes were analyzed and the features were ranked by importance. Thirdly, four supervised learning methods, Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF) and eXtreme Gradient Boosting Tree (XGBT), were combined with Genetic Algorithm (GA) to build abnormal carotid artery identification models. Finally, a semi-supervised model for predicting abnormal carotid arteries was built on the improved VAE. Compared with the baseline model, the semi-supervised model based on the improved VAE performed markedly better, reaching a sensitivity of 0.8938, a specificity of 0.9272, an F1 score of 0.9105 and a classification accuracy of 0.9105. Experimental results show that the semi-supervised model can identify abnormal carotid arteries and can therefore serve as a tool for recognizing groups at high risk of stroke, helping to prevent and reduce the occurrence of stroke.

Key words: carotid artery, stroke, Variational AutoEncoder (VAE), Genetic Algorithm (GA), semi-supervised model
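
The abstract reports sensitivity, specificity, F1 and accuracy without restating how they are computed. As a minimal sketch, assuming the standard binary-classification definitions with "abnormal carotid artery" as the positive class (the abstract does not state the exact formulas used), these four quantities can be derived from confusion-matrix counts as follows. The names `ConfusionCounts`, `count_outcomes` and `screening_metrics` are hypothetical, and the labels are toy values, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; positive (1) = abnormal carotid artery."""
    tp: int  # abnormal, predicted abnormal
    fp: int  # normal, predicted abnormal
    tn: int  # normal, predicted normal
    fn: int  # abnormal, predicted normal


def count_outcomes(y_true, y_pred):
    """Tally the four outcomes from paired 0/1 label sequences."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def screening_metrics(c):
    """Standard definitions (assumed; not taken from the paper's text)."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall on abnormal cases
    specificity = _ratio(c.tn, c.tn + c.fp)   # recall on normal cases
    precision = _ratio(c.tp, c.tp + c.fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Toy labels for illustration only (not the study's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = screening_metrics(count_outcomes(y_true, y_pred))
print(metrics)
```

In a screening setting, sensitivity is the share of abnormal cases the model catches and specificity is the share of normal cases it correctly clears, which is why the abstract reports them separately rather than accuracy alone.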

CLC number: