Journal of Computer Applications (《计算机应用》), sole official website


ECG classification based on improved RAKEL algorithm

ZHAO Jing, HAN Jingyu, QIAN Long, MAO Yi

  1. School of Computer Science and Technology, Nanjing University of Posts and Telecommunications
  • Received: 2021-06-22 Revised: 2021-10-26 Online: 2022-03-02 Published: 2022-03-02
  • Corresponding author: HAN Jingyu
  • Supported by:
    National Natural Science Foundation of China

Abstract: Electrocardiogram (ECG) data usually exhibit several diseases at once, which makes ECG diagnosis a typical multi-label classification problem. Among multi-label classification methods, RAKEL (Random k-labelsets) decomposes the initial label set into several random subsets of size k and trains an LP (Label Powerset) classifier on each subset. Because the labels are selected at random, the correlation between labels is not fully considered, so some label combinations in an LP classifier are backed by very few training samples, which degrades prediction performance. This paper proposes a Bayesian network-based RAKEL algorithm (BN-RAKEL). First, a Bayesian network is used to find the correlations among labels and to determine candidate label subsets. Then, for each label, a feature selection method based on information gain determines its optimal feature space; for each candidate label subset, the similarity of these optimal feature spaces is used to measure how strongly its labels are correlated, which determines the final label subsets. Finally, an LP classifier is trained on the optimal feature space of each label subset. Experiments show that the improved algorithm achieves better prediction performance: of the 18 diseases in the ECG data set, 17 have higher Recall and F_score values than with the RAKEL algorithm.

Keywords: multi-label, label correlation, Bayesian network, information gain, feature selection
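
The sparsity problem that the abstract attributes to random label selection can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`random_k_labelsets`, `label_powerset_classes`, `rare_combinations`), the toy label matrix `Y_TOY`, and the `min_count` threshold are all assumptions made for illustration. The code draws random k-labelsets as RAKEL does, maps each sample to its label combination (one LP class per distinct combination), and lists the combinations backed by fewer than `min_count` samples.

```python
import random
from collections import Counter
from itertools import combinations


def random_k_labelsets(num_labels, k, m, seed=0):
    """Draw m distinct random label subsets of size k (the random step of RAKEL).

    Enumerating all subsets is fine for small label counts such as 18;
    m must not exceed the number of possible subsets.
    """
    rng = random.Random(seed)
    all_sets = list(combinations(range(num_labels), k))
    return rng.sample(all_sets, m)


def label_powerset_classes(Y, labelset):
    """Restrict each sample's label vector to `labelset`.

    Every distinct tuple is one class of the LP classifier for this labelset.
    """
    return [tuple(row[j] for j in labelset) for row in Y]


def rare_combinations(Y, labelset, min_count):
    """Return the label combinations with fewer than `min_count` samples."""
    counts = Counter(label_powerset_classes(Y, labelset))
    return {combo: n for combo, n in counts.items() if n < min_count}


# Hypothetical toy data: 8 samples, 4 binary labels.
# Labels 0 and 1 always co-occur; labels 2 and 3 are unrelated to each other.
Y_TOY = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

if __name__ == "__main__":
    for labelset in random_k_labelsets(num_labels=4, k=2, m=3):
        print(labelset, rare_combinations(Y_TOY, labelset, min_count=2))
```

In the toy matrix, the correlated labelset `(0, 1)` produces only two LP classes, each with four samples, whereas `(2, 3)` produces four classes, one of which has a single sample. Choosing labelsets from correlated labels rather than at random is one way to avoid such thinly populated classes.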

CLC number: