Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (10): 2772-2777. DOI: 10.11772/j.issn.1001-9081.2018041101

• Papers of the 2018 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2018) •

Representative-based ensemble learning classification with leave-one-out

WANG Xuan, ZHANG Lin, GAO Lei, JIANG Haokun

  1. School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China
  • Received: 2018-03-28 Revised: 2018-06-02 Online: 2018-10-10 Published: 2018-10-13
  • Corresponding author: ZHANG Lin
  • About the authors: WANG Xuan (born 1991), male, from Xinxiang, Henan, M.S. candidate, CCF member; research interests: active learning, rough sets. ZHANG Lin (born 1963), male, from Leshan, Sichuan, professor, Ph.D.; research interests: computer image processing, network security. GAO Lei (born 1979), female, from Yantai, Shandong, associate professor, Ph.D.; research interests: intelligent algorithms, machine learning. JIANG Haokun (born 1994), male, from Suining, Sichuan, M.S. candidate; research interests: rough sets, machine learning.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61379089, 41604114).

Abstract: To counter the effect of non-uniform sampling, a Leave-One-Out Ensemble Learning Classification Algorithm (LOOELCA) for symbolic data was proposed, built on the representative-based classification algorithm. Firstly, n small training sets were obtained by the leave-one-out method, where n is the size of the initial training set. Then an independent representative-based classifier was built from each training set, and the classifiers that misclassified, together with the misclassified objects, were marked. Finally, the marked classifiers and the original classifier formed a committee that classified the test set objects. If the committee voted unanimously, the test object was labeled directly with the agreed class; otherwise, the test object was classified by the k-Nearest Neighbor (kNN) algorithm using the marked objects. Experimental results on UCI standard datasets show that LOOELCA improves accuracy by 0.35 to 2.76 percentage points on average over Representative-Based Classification through Covering-Based Neighborhood Rough Set (RBC-CBNRS), and that it also achieves higher classification accuracy than ID3, J48, Naïve Bayes, OneR and other methods.
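The committee procedure in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base learner `ModePrototypeClassifier` is a hypothetical stand-in (one per-attribute-mode prototype per class) for the representative-based classifier, whose details the abstract does not give. The sketch also assumes that each leave-one-out classifier is checked on the single object it left out, that kNN uses Hamming distance over symbolic attributes, and that k defaults to 3. The names `hamming`, `fit_looelca` and `predict_looelca` are illustrative only.

```python
from collections import Counter


def hamming(a, b):
    """Number of attribute positions where two symbolic objects differ."""
    return sum(u != v for u, v in zip(a, b))


class ModePrototypeClassifier:
    """Hypothetical stand-in for the representative-based classifier.

    Each class is represented by one prototype: the per-attribute mode
    of the training objects carrying that class label.
    """

    def __init__(self, X, y):
        self.prototypes = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.prototypes[label] = tuple(
                Counter(column).most_common(1)[0][0] for column in zip(*rows)
            )

    def predict(self, x):
        # Assign the class whose prototype is closest in Hamming distance.
        return min(self.prototypes, key=lambda c: hamming(x, self.prototypes[c]))


def fit_looelca(X, y):
    """Build the committee and collect the marked (misclassified) objects."""
    committee = [ModePrototypeClassifier(X, y)]  # original classifier
    marked = []
    for i in range(len(X)):
        # Leave object i out and train on the remaining n - 1 objects.
        clf = ModePrototypeClassifier(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        # Assumption: the classifier is tested on the object it left out.
        if clf.predict(X[i]) != y[i]:
            committee.append(clf)
            marked.append((X[i], y[i]))
    return committee, marked


def predict_looelca(committee, marked, x, k=3):
    """Unanimous committee vote, else kNN over the marked objects."""
    votes = {clf.predict(x) for clf in committee}
    if len(votes) == 1:
        return votes.pop()
    # Disagreement implies at least one marked object exists.
    nearest = sorted(marked, key=lambda obj: hamming(x, obj[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy symbolic data set (invented for illustration, not from the paper).
X_train = [("a", "x", "m"), ("a", "x", "m"),
           ("b", "y", "n"), ("b", "y", "n"),
           ("c", "z", "o")]
y_train = ["p", "p", "q", "q", "r"]

committee, marked = fit_looelca(X_train, y_train)
print(predict_looelca(committee, marked, ("a", "x", "n")))  # unanimous: p
print(predict_looelca(committee, marked, ("c", "z", "m")))  # kNN fallback: r
```

In this toy run only the classifier trained without the single class-r object misclassifies, so the committee has two members and one marked object. The second query splits the committee and is resolved by the marked object, which shows how the fallback can recover a class that a leave-one-out classifier never saw.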

Key words: representative, rough set, neighborhood, leave-one-out, ensemble learning

CLC Number: