Journal of Computer Applications, 2022, Vol. 42, Issue (6): 1892-1897. DOI: 10.11772/j.issn.1001-9081.2021061068

• Artificial intelligence •

ECG diagnostic classification based on improved RAKEL algorithm

Jing ZHAO, Jingyu HAN, Long QIAN, Yi MAO

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China
  • Received: 2021-06-22 Revised: 2022-01-16 Accepted: 2022-01-20 Online: 2022-06-22 Published: 2022-06-10
  • Contact: Jingyu HAN
  • About the authors: ZHAO Jing, born in 1996, a native of Lianyungang, Jiangsu, M.S. candidate. Her research interests include machine learning.
    QIAN Long, born in 1994, a native of Yancheng, Jiangsu, M.S. candidate. His research interests include machine learning.
    MAO Yi, born in 1985, a native of Nanjing, Jiangsu, Ph.D., lecturer. Her research interests include machine learning and deep learning.
  • Supported by:
    National Natural Science Foundation of China (62002174)

Abstract:

ElectroCardioGram (ECG) data usually involve multiple diseases at once, so ECG diagnosis is a typical multi-label classification problem. Among multi-label classification methods, the RAndom k-labELsets (RAKEL) algorithm randomly decomposes the label set into several labelsets of size k and trains a Label Powerset (LP) classifier on each; however, because the correlation between labels is not sufficiently considered, some label combinations in an LP classifier correspond to very few samples, which degrades prediction performance. To fully consider the correlation between labels, a Bayesian Network-based RAKEL (BN-RAKEL) algorithm was proposed. Firstly, a Bayesian network was used to find the correlation between labels and determine the candidate labelsets. Then, a feature selection method based on information gain was applied to determine the optimal feature space of each label, and the similarity of these optimal feature spaces was used to measure the degree of correlation within each candidate labelset, thereby determining the final labelsets with strong correlation. Finally, the LP classifiers were trained in the optimal feature spaces of the corresponding labelsets. In comparisons with Multi-Label K-Nearest Neighbors (ML-KNN), RAKEL, Classifier Chains (CC) and the FP-Growth-based RAKEL algorithm FI-RAKEL on a real ECG dataset, the proposed algorithm improved recall and F-score by at least 3.6 percentage points and 2.3 percentage points, respectively. Experimental results show that the BN-RAKEL algorithm has good prediction performance and can effectively improve the accuracy of ECG diagnosis.

Key words: ElectroCardioGram (ECG), multi-label, label correlation, Bayesian network, information gain, feature selection, RAndom k-labELsets (RAKEL) algorithm
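
The sparsity problem that the abstract attributes to randomly chosen labelsets can be illustrated with a small sketch. It is not the authors' implementation: the function names `random_k_labelsets` and `lp_class_counts` and the toy label matrix `Y` are assumptions introduced here for illustration only. The sketch draws random labelsets of size k and counts how many samples fall into each Label Powerset class, that is, each distinct combination of the labels in a labelset.

```python
import math
import random
from collections import Counter


def random_k_labelsets(n_labels, k, m, seed=0):
    """Draw m distinct labelsets of size k from labels 0..n_labels-1."""
    if m > math.comb(n_labels, k):
        raise ValueError("m exceeds the number of distinct k-labelsets")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < m:
        chosen.add(tuple(sorted(rng.sample(range(n_labels), k))))
    return sorted(chosen)


def lp_class_counts(Y, labelset):
    """Count samples per Label Powerset class, i.e. per distinct
    combination of the labels listed in `labelset`."""
    return Counter(tuple(row[j] for j in labelset) for row in Y)


# Toy label matrix (rows = samples, columns = labels); values are made up.
# Labels 0 and 1 always co-occur, while labels 0 and 3 are unrelated.
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

correlated = lp_class_counts(Y, (0, 1))
unrelated = lp_class_counts(Y, (0, 3))
print(len(correlated), min(correlated.values()))  # 2 3
print(len(unrelated), min(unrelated.values()))    # 4 1
```

In this toy data the correlated labelset yields two well-populated classes, while the unrelated labelset yields four classes, some with a single sample. This is the situation that grouping strongly correlated labels is meant to avoid.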

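
The information-gain feature selection and the feature-space similarity named in the abstract and key words can be sketched as follows. This is a minimal illustration under stated assumptions: features are treated as discrete, the helper names (`entropy`, `information_gain`, `select_features`, `jaccard`) and the toy data are hypothetical, and Jaccard similarity stands in for the similarity measure, which the abstract does not specify.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, label):
    """IG(label; feature) = H(label) - H(label | feature)."""
    n = len(label)
    groups = {}
    for f, y in zip(feature, label):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(label) - conditional


def select_features(columns, label, top):
    """Return the names of the `top` features with the highest gain."""
    gains = {name: information_gain(col, label) for name, col in columns.items()}
    return set(sorted(gains, key=gains.get, reverse=True)[:top])


def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy discrete features and two labels; all values are made up.
columns = {
    "f1": [1, 1, 1, 0, 0, 0],
    "f2": [1, 0, 1, 0, 1, 0],
    "f3": [0, 0, 0, 0, 0, 0],
}
y1 = [1, 1, 1, 0, 0, 0]
y2 = [1, 0, 1, 0, 1, 0]

space1 = select_features(columns, y1, top=1)
space2 = select_features(columns, y2, top=1)
print(space1, space2, jaccard(space1, space2))
```

Labels whose selected feature sets overlap heavily would score close to 1 under this measure, and labels with disjoint feature sets would score 0.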