Journal of Computer Applications, 2015, Vol. 35, Issue (10): 2798-2802. DOI: 10.11772/j.issn.1001-9081.2015.10.2798

• Paper from the 15th China Conference on Machine Learning (CCML2015)

Modified multi-class support vector machine recursive feature elimination for cancer multi-classification

HUANG Xiaojuan, ZHANG Li

  1. School of Computer Science and Technology, Soochow University, Suzhou Jiangsu 215006, China
  • Received: 2015-06-01 Revised: 2015-06-20 Online: 2015-10-10 Published: 2015-10-14
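  • Corresponding author: ZHANG Li (1975-), female, born in Zhangjiagang, Jiangsu; Ph.D., professor, doctoral supervisor; research interests: machine learning, pattern recognition; zhangliml@suda.edu.cn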
  • About the author: HUANG Xiaojuan (1992-), female, born in Gaoyou, Jiangsu; research interests: machine learning, pattern recognition.
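  • Funding:
    National Natural Science Foundation of China (61373093, 61402310); Natural Science Foundation of Jiangsu Province (BK20140008, BK201222725); Natural Science Research Project of Higher Education Institutions of Jiangsu Province (13KJA520001); Qing Lan Project of Jiangsu Province; Extracurricular Academic Research Fund for Undergraduates of Soochow University (KY2015549B).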

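Abstract: The Multi-class Support Vector Machine Recursive Feature Elimination (MSVM-RFE) method has been proposed for cancer multi-classification, but it relies on the fused weights of all sub-classifiers and ignores the feature selection ability of each individual sub-classifier. To improve the recognition rate on multi-class problems, a Modified MSVM-RFE (MMSVM-RFE) method was proposed. Like MSVM-RFE, it decomposes a multi-class problem into multiple binary problems, here using the one-versus-rest strategy. Each binary problem is solved by SVM-RFE, which iteratively removes redundant features to obtain a feature subset; the resulting subsets are merged into one final feature subset, on which an SVM classifier is trained. Experimental results on three gene expression datasets show that the proposed method selects a feature subset that is effective for cancer classification: it increases the overall recognition rate by about 2% and substantially improves the precision of individual classes, in some cases to 100%. Comparisons with random forest, the k-Nearest Neighbor (kNN) classifier, and Principal Component Analysis (PCA) dimensionality reduction confirm the advantages of the proposed method.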
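The MMSVM-RFE pipeline summarized in the abstract (one-versus-rest decomposition, an independent SVM-RFE run per binary task, union of the per-class subsets, then a final SVM) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the linear SVM is trained by plain subgradient descent rather than a QP solver, each SVM-RFE run removes one feature per iteration using the usual SVM-RFE ranking criterion w_j² and stops at a fixed `n_keep` (the abstract does not state the stopping rule), and the final classifier is assumed to be a set of one-versus-rest linear SVMs. Unlike MSVM-RFE, no weights are fused across classes; each binary task ranks its own features.

```python
"""Hypothetical sketch of MMSVM-RFE (not the authors' code)."""
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Model = Tuple[List[float], float]  # (weights w, bias b)


def train_linear_svm(X: Matrix, y: Sequence[int], lam: float = 0.01,
                     epochs: int = 300, lr: float = 0.5) -> Model:
    """Linear SVM fitted by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), y_i in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        step = lr / t ** 0.5  # decaying step size
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1.0:
                # Hinge loss is active for this sample: add its subgradient.
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - step * gj for wj, gj in zip(w, gw)]
        b -= step * gb
    return w, b


def columns(X: Matrix, features: Sequence[int]) -> List[List[float]]:
    """Keep only the given feature (gene) columns."""
    return [[row[j] for j in features] for row in X]


def svm_rfe(X: Matrix, y: Sequence[int], n_keep: int) -> List[int]:
    """Binary SVM-RFE: retrain, drop the feature with the smallest w_j^2,
    and repeat until n_keep features remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w, _ = train_linear_svm(columns(X, remaining), y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


def mmsvm_rfe(X: Matrix, labels: Sequence[int], n_keep: int) -> List[int]:
    """One-vs-rest: each class gets its own SVM-RFE run; subsets are unioned."""
    selected = set()
    for c in sorted(set(labels)):
        y = [1 if lab == c else -1 for lab in labels]
        selected.update(svm_rfe(X, y, n_keep))
    return sorted(selected)


def fit_final_svm(X: Matrix, labels: Sequence[int],
                  features: Sequence[int]) -> Dict[int, Model]:
    """Final classifier on the merged subset (assumed: one-vs-rest linear SVMs)."""
    Xs = columns(X, features)
    return {c: train_linear_svm(Xs, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}


def predict(models: Dict[int, Model], features: Sequence[int],
            x: Sequence[float]) -> int:
    """Return the class whose one-vs-rest SVM gives the largest score."""
    xs = [x[j] for j in features]

    def score(c: int) -> float:
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, xs)) + b

    return max(models, key=score)


if __name__ == "__main__":
    # Toy data: genes 0-2 each mark one class; genes 3-5 are silent (all zero).
    labels = [0, 0, 1, 1, 2, 2]
    X = [[1.0 if j == c else 0.0 for j in range(6)] for c in labels]
    feats = mmsvm_rfe(X, labels, n_keep=3)
    models = fit_final_svm(X, labels, feats)
    print(feats, [predict(models, feats, x) for x in X])
```

On this toy data the silent genes never contribute to the hinge loss, so their weights stay exactly zero and every binary SVM-RFE run eliminates them first; the merged subset is genes 0, 1 and 2. For real microarray data, `n_keep` (or a validation-based stopping rule) would need tuning, and a library SVM, for example scikit-learn's `RFE` wrapped around a linear `SVC`, would normally replace the hand-written trainer.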

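Key words: Support Vector Machine (SVM), feature selection, Recursive Feature Elimination (RFE), cancer classification, gene expression data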
