Imbalanced image classification approach based on convolution neural network and cost-sensitivity
TAN Jiefan1, ZHU Yan1, CHEN Tung-shou2, CHANG Chin-chen3
1. College of Information Science and Technology, Southwest Jiaotong University, Chengdu Sichuan 611756, China; 2. Department of Computer Science and Information Engineering, Taichung University of Science and Technology, Taichung Taiwan 404, China; 3. Department of Information Engineering and Computer Science, Feng Chia University, Taichung Taiwan 407, China
Abstract: Imbalanced image classification suffers from low recall on the minority class, high classification cost, and expensive manual feature selection. To address these problems, an imbalanced image classification approach named Triplet-CSSVM was proposed, based on a Triplet-sampling Convolutional Neural Network (Triplet-sampling CNN) and a Cost-Sensitive Support Vector Machine (CSSVM). The method has two parts: feature learning and cost-sensitive classification. Firstly, a CNN trained with Triplet loss as its loss function learned, end-to-end, an encoding that maps images into a Euclidean space. Then, the dataset was resampled to balance the class distribution. Finally, the CSSVM classification algorithm, which assigns different cost factors to different classes, produced the classification result with the minimum cost. Experiments were conducted on the portrait dataset FaceScrub using the deep learning framework Caffe. The experimental results show that, at an imbalance ratio of 1:3, the proposed method increases precision by 31 percentage points and recall by 71 percentage points compared with VGGNet-SVM (Visual Geometry Group Net-Support Vector Machine).
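The two ingredients named in the abstract, a Triplet loss over Euclidean embeddings and per-class cost factors in the SVM objective, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the margin of 0.2, and the cost factors 3.0 and 1.0 (chosen only to mirror a 1:3 imbalance ratio) are assumptions made for illustration, and the abstract does not state the values actually used.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.2) -> float:
    """Triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed value, not from the paper).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


def cost_sensitive_hinge(scores: Sequence[float],
                         labels: Sequence[int],
                         cost_pos: float = 3.0,
                         cost_neg: float = 1.0) -> float:
    """Sum of hinge losses, each scaled by the cost factor of its class.

    `labels` are +1 (minority class) or -1 (majority class); `cost_pos`
    and `cost_neg` are hypothetical cost factors, not the paper's settings.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        cost = cost_pos if label == 1 else cost_neg
        total += cost * max(0.0, 1.0 - label * score)
    return total
```

With these assumed costs, a minority-class sample that violates the margin contributes three times as much to the objective as an equally misplaced majority-class sample, which is the mechanism by which a cost-sensitive SVM favours minority-class recall. In practice a comparable effect can be obtained with an existing SVM library that supports per-class weights, for example the `class_weight` argument of scikit-learn's `SVC`.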