Journal of Computer Applications, 2018, Vol. 38, Issue (7): 1862-1865. DOI: 10.11772/j.issn.1001-9081.2018010152


Imbalanced image classification approach based on convolution neural network and cost-sensitivity

TAN Jiefan1, ZHU Yan1, CHEN Tung-shou2, CHANG Chin-chen3   

  1. College of Information Science and Technology, Southwest Jiaotong University, Chengdu Sichuan 611756, China;
    2. Department of Computer Science and Information Engineering, Taichung University of Science and Technology, Taichung Taiwan 404, China;
    3. Department of Information Engineering and Computer Science, Feng Chia University, Taichung Taiwan 407, China
  • Received: 2018-01-17; Revised: 2018-03-06; Online: 2018-07-10; Published: 2018-07-12
  • Supported by:
    This work is partially supported by the Academic and Technological Leadership Training Foundation of Sichuan Province (WZ0100112371408, YH1500411031402), the Academic and Technological Leadership Research Foundation of Sichuan Province (WZ0100112371601/004), and the Demonstration Project in Technology Service Industry of Sichuan Province (2016GFW0166).

基于卷积神经网络和代价敏感的不平衡图像分类方法

谭洁帆1, 朱焱1, 陈同孝2, 张真诚3   

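  • Corresponding author: ZHU Yan
  • About the authors: TAN Jiefan (born 1992), female, from Changning, Sichuan, is a master's degree candidate whose main research interest is image data mining. ZHU Yan (born 1965), female, from Guilin, Guangxi, is a professor with a Ph.D. and a CCF member; her main research interests are data mining, Web anomaly detection, and big data management and intelligent analysis. CHEN Tung-shou (born 1964), male, from Huoqiu, Anhui, is a professor with a Ph.D.; his main research interests are image processing, data mining, and information security. CHANG Chin-chen (born 1954), male, from Taichung, Taiwan, is a professor with a Ph.D.; his main research interests are database design, e-commerce security, electronic multimedia imaging technology, and cryptography.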

Abstract: To address the low recall of the minority class, the high total classification cost, and the subjectivity and labor cost of manual feature extraction in imbalanced image classification, an approach named Triplet-CSSVM was proposed. It combines a Triplet-sampling Convolutional Neural Network (Triplet-sampling CNN) with a Cost-Sensitive Support Vector Machine (CSSVM). The method has two parts: feature learning and cost-sensitive classification. Firstly, a CNN that used triplet loss as its loss function learned, end to end, an encoding that maps images into a Euclidean space. Then, the dataset was reconstructed with a sampling method to balance its class distribution. Finally, the CSSVM classification algorithm assigned different cost factors to different classes to obtain the classification result with the minimum cost. Experiments were conducted on the portrait dataset FaceScrub using the deep learning framework Caffe. The experimental results show that, at an imbalance ratio of 1:3, the proposed method increases the precision of the minority class by 31 percentage points and its recall by 71 percentage points compared with VGGNet-SVM (Visual Geometry Group Net-Support Vector Machine).
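The two loss ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (they report using Caffe); it only shows the commonly used form of a triplet loss on embeddings and of a hinge loss with per-class cost factors. The function names, the margin of 0.2, and the cost factor of 3 for the minority class (chosen here to mirror the 1:3 imbalance ratio) are all assumptions made for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchors, positives, negatives, margin=0.2):
    """Mean triplet loss over a batch of (anchor, positive, negative) embeddings.

    A triplet contributes zero loss once the anchor is closer to the
    same-class sample than to the other-class sample by at least `margin`
    (the margin value is an assumed hyperparameter).
    """
    losses = [
        max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


def cost_sensitive_hinge(scores, labels, cost_minority, cost_majority):
    """Hinge loss where each class has its own cost factor.

    Labels are +1 (minority class) or -1 (majority class); a larger
    `cost_minority` penalizes mistakes on minority samples more heavily.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        cost = cost_minority if y == 1 else cost_majority
        total += cost * max(0.0, 1.0 - y * s)
    return total


# Toy 2-D embeddings: the first triplet is already well separated,
# the second one still violates the margin.
anchors = [[0.0, 0.0], [1.0, 0.0]]
positives = [[0.1, 0.0], [1.0, 0.1]]
negatives = [[1.0, 0.0], [1.0, 0.2]]
embedding_loss = triplet_loss(anchors, positives, negatives)

# Toy classifier scores for one minority and three majority samples (1:3).
scores = [0.5, -2.0, 0.3, -0.5]
labels = [1, -1, -1, -1]
unweighted = cost_sensitive_hinge(scores, labels, 1.0, 1.0)
weighted = cost_sensitive_hinge(scores, labels, 3.0, 1.0)
```

In this toy setting the weighted loss exceeds the unweighted one only through the minority sample's term, which is the mechanism by which a cost-sensitive classifier is pushed toward higher minority-class recall.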

Key words: Convolution Neural Network (CNN), cost sensitive, image classification, data balance, Support Vector Machine (SVM)

