Journal of Computer Applications, 2015, Vol. 35, Issue (6): 1605-1610. DOI: 10.11772/j.issn.1001-9081.2015.06.1605


Weighted online sequential extreme learning machine based on imbalanced sample-reconstruction

WANG Jinwan (1), MAO Wentao (1, 2, 3), HE Ling (1), WANG Liyun (1)

  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang Henan 453007, China;
    2. Computational Intelligence and Data Mining Engineering Technology Research Center of Colleges and Universities in Henan Province, Xinxiang Henan 453007, China;
    3. School of Mechanics, Civil Engineering and Architecture, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
  • Received: 2014-12-25; Revised: 2015-03-24; Published: 2015-06-12

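  • Corresponding author: MAO Wentao (1980-), male, born in Xinxiang, Henan, associate professor, Ph.D., CCF member; research interests: machine learning, weak signal detection; maowt.mail@gmail.com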
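  • About the authors: WANG Jinwan (1991-), female, born in Jiyuan, Henan, M.S. candidate, CCF member; research interests: machine learning, pattern recognition. HE Ling (1990-), female, born in Hebi, Henan, M.S. candidate; research interests: generalization theory. WANG Liyun (1986-), female, born in Zhumadian, Henan, M.S. candidate; research interests: intelligent information processing.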
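  • Foundation items:

    National Natural Science Foundation of China (U1204609); China Postdoctoral Science Foundation (2014M550508); Basic and Frontier Technology Research Program of Henan Province (132300410430).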

Abstract:

Many traditional machine learning methods tend to produce biased classifiers on imbalanced online sequential data, which leads to low classification precision for the minority class. To improve the classification accuracy of the minority class, a new weighted online sequential extreme learning machine based on imbalanced sample reconstruction was proposed. The algorithm started from exploiting the distribution characteristics of online sequential data and contained two stages. In the offline stage, a principal curve was introduced to construct a confidence region, within which the minority class was over-sampled to build a balanced sample set consistent with the distribution trend of the samples; the initial model was then established on this set. In the online stage, a new weighting method was proposed to update sample weights dynamically, where each weight depended on the training error. The proposed method was evaluated on UCI datasets and Macao meteorological data. Compared with existing methods such as Online Sequential Extreme Learning Machine (OS-ELM), Extreme Learning Machine (ELM) and Meta-Cognitive Online Sequential Extreme Learning Machine (MCOS-ELM), the experimental results show that the proposed method identifies the minority class better. Moreover, its training time differs little from that of the other methods, which shows that the proposed method can greatly increase the prediction accuracy on the minority class without affecting the complexity of the algorithm.
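The offline stage summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a true principal curve is replaced by a hypothetical polyline through the means of samples binned along the leading principal axis; the confidence region is assumed to be the minority samples whose distance to that polyline is at most a chosen quantile of all such distances; and over-sampling is assumed to be SMOTE-style interpolation between in-region nearest neighbours, keeping only candidates that stay inside the region. The names `leading_axis`, `polyline_curve`, `dist_to_curve`, `oversample_in_region` and the parameters `quantile` and `n_bins` are illustrative only.

```python
import math
import random


def leading_axis(points, iters=100):
    """Mean and leading principal direction of `points` (power iteration)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return mean, v


def polyline_curve(points, n_bins=5):
    """Hypothetical stand-in for a principal curve: means of points binned
    along the leading principal axis, joined in order of projection."""
    mean, v = leading_axis(points)
    proj = [sum((pj - mj) * vj for pj, mj, vj in zip(p, mean, v)) for p in points]
    lo, hi = min(proj), max(proj)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for p, s in zip(points, proj):
        bins[min(int((s - lo) / width), n_bins - 1)].append(p)
    d = len(points[0])
    return [[sum(p[j] for p in b) / len(b) for j in range(d)] for b in bins if b]


def dist_to_curve(p, curve):
    """Euclidean distance from p to the nearest segment of the polyline."""
    if len(curve) == 1:
        return math.dist(p, curve[0])
    best = math.inf
    for a, b in zip(curve, curve[1:]):
        ab = [bj - aj for aj, bj in zip(a, b)]
        denom = sum(x * x for x in ab)
        t = 0.0
        if denom > 0:
            t = sum((pj - aj) * x for pj, aj, x in zip(p, a, ab)) / denom
            t = max(0.0, min(1.0, t))  # clamp projection onto the segment
        best = min(best, math.dist(p, [aj + t * x for aj, x in zip(a, ab)]))
    return best


def oversample_in_region(minority, n_new, quantile=0.8, n_bins=5, seed=0):
    """Hypothetical sketch: interpolate new minority samples only between
    samples inside a 'confidence region' around the fitted polyline."""
    rng = random.Random(seed)
    curve = polyline_curve(minority, n_bins)
    dists = [dist_to_curve(p, curve) for p in minority]
    # Region radius: the `quantile`-th smallest distance to the curve.
    radius = sorted(dists)[int(quantile * (len(dists) - 1))]
    region = [p for p, dd in zip(minority, dists) if dd <= radius]
    synthetic = []
    for _ in range(100 * n_new):  # bounded number of attempts
        if len(synthetic) == n_new:
            break
        a = rng.choice(region)
        others = [q for q in region if q is not a]
        b = min(others, key=lambda q: math.dist(a, q)) if others else a
        lam = rng.random()
        cand = [aj + lam * (bj - aj) for aj, bj in zip(a, b)]
        # Keep only candidates that remain inside the confidence region.
        if dist_to_curve(cand, curve) <= radius:
            synthetic.append(cand)
    return synthetic, curve, radius
```

In this sketch only in-region samples serve as interpolation seeds, so isolated minority points far from the fitted trend do not generate synthetic samples; the balanced set (original plus synthetic samples) would then be used to train the initial model.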

Key words: sample-reconstruction, Extreme Learning Machine (ELM), principal curve, over-sampling, imbalanced data

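Abstract (from the Chinese version):

Existing learning algorithms struggle to improve the classification accuracy of minority-class samples in imbalanced online sequential data. To address this problem, a weighted online sequential extreme learning machine based on imbalanced sample reconstruction is proposed. The algorithm starts from extracting the distribution characteristics of online sequential data and consists of two stages, offline and online. In the offline stage, a principal curve is used to construct a confidence region for the minority-class samples, and the samples in this region are over-sampled to build a balanced sample set that follows the distribution trend of the samples; the initial model is then built on this set. In the online stage, each sequentially arriving sample is assigned a weight according to its training error, and the network weights are updated dynamically. Comparative experiments on standard UCI datasets and meteorological data measured in Macao show that, compared with the existing Online Sequential Extreme Learning Machine (OS-ELM), Extreme Learning Machine (ELM) and Meta-Cognitive Online Sequential Extreme Learning Machine (MCOS-ELM), the proposed algorithm identifies minority-class samples better, while its model training time differs little from that of the other three algorithms. These results indicate that the proposed algorithm can effectively improve the classification accuracy of minority-class samples without affecting algorithmic complexity.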
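For the online stage, a minimal sketch of a weighted OS-ELM update may help, under stated assumptions: a sigmoid hidden layer with random, fixed input weights; binary targets +1/-1 with +1 as the minority class; and one sample per update, so the weighted recursive least-squares step needs only scalar division. Starting from P = C·I makes the recursion reproduce the regularized weighted least-squares solution. The abstract only says that each weight depends on the training error, so `sample_weight` (a class-ratio factor times 1 + alpha·|error|) is a hypothetical placeholder, not the paper's formula. `WeightedOSELM`, `online_step`, `C` and `alpha` are illustrative names.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class WeightedOSELM:
    """Hypothetical weighted OS-ELM sketch for binary targets +1/-1.

    Output weights follow a weighted recursive least-squares recursion,
    one sample at a time, so no matrix inversion is needed."""

    def __init__(self, n_inputs, n_hidden=20, C=100.0, seed=0):
        rng = random.Random(seed)
        # Random, fixed input weights and biases: the ELM hidden layer.
        self.W = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        # P starts at C*I, so after updates P = (I/C + sum_i w_i h_i h_i^T)^-1.
        self.P = [[C if i == j else 0.0 for j in range(n_hidden)]
                  for i in range(n_hidden)]
        self.beta = [0.0] * n_hidden

    def hidden(self, x):
        """Hidden-layer output vector h(x)."""
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(self.W, self.b)]

    def predict_raw(self, x):
        """Real-valued output h(x) . beta; its sign gives the class."""
        h = self.hidden(x)
        return sum(hi * bi for hi, bi in zip(h, self.beta))

    def update(self, x, t, weight):
        """Weighted rank-one (Sherman-Morrison) update; weight must be > 0."""
        h = self.hidden(x)
        n = len(h)
        Ph = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        denom = 1.0 / weight + sum(hi * phi for hi, phi in zip(h, Ph))
        k = [phi / denom for phi in Ph]  # gain vector
        err = t - sum(hi * bi for hi, bi in zip(h, self.beta))
        # P <- P - k (P h)^T (P is symmetric); beta <- beta + k * err
        self.P = [[self.P[i][j] - k[i] * Ph[j] for j in range(n)]
                  for i in range(n)]
        self.beta = [bi + ki * err for bi, ki in zip(self.beta, k)]
        return err


def sample_weight(err, is_minority, class_ratio, alpha=1.0):
    """HYPOTHETICAL rule, not the paper's formula: a class-balance factor
    times a term that grows with the absolute prediction error."""
    base = class_ratio if is_minority else 1.0
    return base * (1.0 + alpha * abs(err))


def online_step(model, x, t, class_ratio):
    """Weight one arriving sample by its pre-update error, then update.
    Assumes the minority class is labelled +1."""
    err = t - model.predict_raw(x)
    w = sample_weight(err, t > 0, class_ratio)
    model.update(x, t, w)
    return w
```

Because the hidden layer is fixed, each output-weight update in this sketch costs O(L²) for L hidden nodes regardless of how many samples have already arrived; an initial model could be obtained by feeding the balanced offline set through `update` before online operation begins.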

