Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (5): 1512-1516. DOI: 10.11772/j.issn.1001-9081.2017102464


Selective ensemble algorithm for gene expression data based on diversity and accuracy of weighted harmonic average measure

GAO Huiyun, LU Huijuan, YAN Ke, YE Minchao   

  1. College of Information Engineering, China Jiliang University, Hangzhou Zhejiang 310018, China
  • Received: 2017-10-17 Revised: 2017-11-24 Online: 2018-05-10 Published: 2018-05-24
  • Contact: LU Huijuan
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61272315) and the Science and Technology Project of Zhejiang Province (2017C34003).

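  • About the authors: GAO Huiyun (born 1993), female, from Nanjing, Jiangsu, M.S. candidate, CCF member; research interests: machine learning and data mining. LU Huijuan (born 1962), female, from Dongyang, Zhejiang, professor, Ph.D., CCF distinguished member; research interests: machine learning, pattern recognition and bioinformatics. YAN Ke (born 1983), male, from Singapore, lecturer, Ph.D.; research interests: machine learning and image processing. YE Minchao (born 1987), male, from Hangzhou, Zhejiang, lecturer, Ph.D.; research interests: artificial intelligence and image processing.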

Abstract: The diversity among base classifiers and the accuracy of each individual base classifier are two important factors that affect the generalization performance of an ensemble system. To address the difficulty of balancing diversity and accuracy, a selective ensemble algorithm for gene expression data based on the Diversity and Accuracy of Weighted Harmonic Average (D-A-WHA) measure was proposed. The Kernel Extreme Learning Machine (KELM) was used as the base classifier, and the D-A-WHA measure was used to balance the diversity and accuracy of the base classifiers. A set of base classifiers that are both highly accurate and highly diverse with respect to the other base classifiers was then selected for the ensemble. Simulation experiments on UCI gene datasets show that, compared with traditional ensemble algorithms such as Bagging and AdaBoost, the selective ensemble algorithm based on the D-A-WHA measure significantly improves classification accuracy and stability, and that it can be applied effectively to the classification of cancer gene expression data.

Key words: selective ensemble, Kernel Extreme Learning Machine (KELM), gene expression data, diversity, accuracy
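
The selection idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact D-A-WHA formula, so everything below is an assumption made for illustration: diversity is taken as the average pairwise disagreement with the other base classifiers, the score is the weighted harmonic mean 1 / (w/A + (1 - w)/D) with a hypothetical weight `w`, the top `k` classifiers are kept, and the ensemble output is a plain majority vote. The function names are hypothetical, and the base classifiers (KELM in the paper) are represented only by their predicted labels on a validation set.

```python
import numpy as np


def weighted_harmonic_scores(preds, y, w=0.5, eps=1e-12):
    """Score base classifiers by a weighted harmonic mean of accuracy and diversity.

    preds: (n_classifiers, n_samples) predicted labels on a validation set.
    y:     (n_samples,) true labels.
    w:     weight on accuracy in [0, 1] (hypothetical parameter, not from the paper).
    """
    preds = np.asarray(preds)
    y = np.asarray(y)
    n = preds.shape[0]
    # Accuracy of each base classifier.
    acc = (preds == y).mean(axis=1)
    # Pairwise disagreement: fraction of samples on which two classifiers differ.
    disagree = (preds[:, None, :] != preds[None, :, :]).mean(axis=2)
    # Assumed diversity: mean disagreement with the other n - 1 classifiers
    # (the diagonal of `disagree` is zero, so it does not contribute).
    div = disagree.sum(axis=1) / max(n - 1, 1)
    # Weighted harmonic mean; eps guards against division by zero.
    return 1.0 / (w / (acc + eps) + (1.0 - w) / (div + eps))


def select_ensemble(preds, y, k, w=0.5):
    """Return the indices of the k highest-scoring base classifiers."""
    scores = weighted_harmonic_scores(preds, y, w=w)
    return np.argsort(-scores, kind="stable")[:k]


def majority_vote(preds):
    """Combine predictions by majority vote (ties go to the smallest label)."""
    preds = np.asarray(preds)
    voted = []
    for column in preds.T:
        labels, counts = np.unique(column, return_counts=True)
        voted.append(labels[np.argmax(counts)])
    return np.array(voted)
```

Because the harmonic mean is dominated by the smaller of its two arguments, a classifier scores well under this sketch only if it is both reasonably accurate and reasonably different from the rest, which is the balance the abstract describes.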

