Representative-based ensemble learning classification with leave-one-out

doi:10.11772/j.issn.1001-9081.2018041101

Journal of Computer Applications, 2018, Vol. 38, Issue 10: 2772-2777. DOI: 10.11772/j.issn.1001-9081.2018041101

Papers of the 2018 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2018)

WANG Xuan, ZHANG Lin, GAO Lei, JIANG Haokun

School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China

Received: 2018-03-28 Revised: 2018-06-02 Online: 2018-10-10 Published: 2018-10-13
Corresponding author: ZHANG Lin
About the authors: WANG Xuan (born 1991), male, from Xinxiang, Henan; M.S. candidate and CCF member; research interests: active learning and rough sets. ZHANG Lin (born 1963), male, from Leshan, Sichuan; professor, Ph.D.; research interests: computer image processing and network security. GAO Lei (born 1979), female, from Yantai, Shandong; associate professor, Ph.D.; research interests: intelligent algorithms and machine learning. JIANG Haokun (born 1994), male, from Suining, Sichuan; M.S. candidate; research interests: rough sets and machine learning.
Supported by: This work is partially supported by the National Natural Science Foundation of China (61379089, 41604114).

Abstract

To mitigate the effect of non-uniform sampling, a Leave-One-Out Ensemble Learning Classification Algorithm (LOOELCA) for symbolic data was proposed, built on the representative-based classification algorithm. First, n small training sets were obtained by the leave-one-out method, where n is the size of the initial training set. Then an independent representative-based classifier was built on each training set, and the classifiers that misclassified, together with the misclassified objects, were marked. Finally, the marked classifiers and the original classifier formed a committee to classify the objects in the test set. If the committee's vote was unanimous, the test object was directly assigned that class label; otherwise, the test object was classified by the k-Nearest Neighbor (kNN) algorithm using the marked objects. Experimental results on UCI benchmark datasets show that LOOELCA improves accuracy by 0.35 to 2.76 percentage points on average over Representative-Based Classification through Covering-Based Neighborhood Rough Sets (RBC-CBNRS), and that it also achieves higher classification accuracy than ID3, J48, Naïve Bayes, OneR and other methods.

Key words: representative, rough set, neighborhood, leave-one-out, ensemble learning
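
The three steps in the abstract (leave-one-out marking, committee vote, kNN fallback on the marked objects) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The base learner here is a stand-in 1-nearest-neighbor rule with overlap (Hamming) distance, not the representative-based classifier (RBC-CBNRS) used in the paper. The function names, the toy data, and the choice k=3 are all hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Count attribute positions where two symbolic objects differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=1):
    """Majority label among the k objects in train closest to query.
    train is a list of (attributes, label) pairs."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_mark(train, base_predict):
    """Build one classifier per left-out object and mark those that
    misclassify it. Because the stand-in learner is lazy, a classifier
    is represented by its training subset (an assumption of this sketch)."""
    marked_sets, marked_objects = [], []
    for i, (x, y) in enumerate(train):
        subset = train[:i] + train[i + 1:]
        if base_predict(subset, x) != y:
            marked_sets.append(subset)
            marked_objects.append((x, y))
    return marked_sets, marked_objects

def classify(train, marked_sets, marked_objects, query, base_predict, k=3):
    """Committee = original classifier plus marked classifiers.
    A unanimous vote is returned directly; otherwise fall back to kNN
    over the marked objects."""
    committee = [train] + marked_sets
    votes = {base_predict(subset, query) for subset in committee}
    if len(votes) == 1:
        return votes.pop()
    return knn_predict(marked_objects, query, k)

# Toy symbolic data set (hypothetical, for illustration only).
train = [
    (("a", "x"), "pos"),
    (("a", "y"), "pos"),
    (("b", "x"), "neg"),
    (("b", "y"), "neg"),
    (("a", "z"), "neg"),
]
marked_sets, marked_objects = leave_one_out_mark(train, knn_predict)
unanimous = classify(train, marked_sets, marked_objects, ("a", "x"), knn_predict)
fallback = classify(train, marked_sets, marked_objects, ("b", "x"), knn_predict)
print(len(marked_objects), unanimous, fallback)
```

In this toy run the committee agrees on the first query, while the second query splits the committee and is resolved by the marked objects. Swapping in a different base learner only requires passing another function with the same (train, query) signature as base_predict.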

CLC number: TP181

WANG Xuan, ZHANG Lin, GAO Lei, JIANG Haokun. Representative-based ensemble learning classification with leave-one-out[J]. Journal of Computer Applications, 2018, 38(10): 2772-2777.
