Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (11): 3305-3311.

Data augmentation method based on conditional generative adversarial net model

1. School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China
• Received:2018-05-14 Revised:2018-06-26 Online:2018-11-10 Published:2018-11-10
• Supported by:
This work is partially supported by the National Natural Science Foundation of China (61672291), the Beijige Foundation (BJG201504).

• Corresponding author: GUAN Zhengxiong
• About the authors: CHEN Wenbing (1964-), male, born in Dongzhi, Anhui, associate professor, M.S.; research interests: computational mathematics, pattern recognition, image processing. GUAN Zhengxiong (1993-), male, born in Wuhu, Anhui, M.S. candidate; research interests: pattern recognition, image processing. CHEN Yunjie (1980-), male, born in Nanjing, Jiangsu, professor, Ph.D.; research interests: computational mathematics, pattern recognition, image processing.

Abstract: A deep Convolutional Neural Network (CNN) trained on a large-scale labelled dataset can achieve a high recognition rate or good classification performance. However, training a CNN on a smaller-scale dataset usually leads to overfitting. To solve this problem, a novel data augmentation method called GMM-CGAN was proposed, which integrates a Gaussian Mixture Model (GMM) with a Conditional Generative Adversarial Net (CGAN). Firstly, the number of samples was increased by randomly sliding a sampling window around the core region of each image. Secondly, the random noise vector was assumed to follow a GMM distribution; it served as the initial input to the CGAN generator, the image label served as the CGAN condition, and the parameters of the CGAN and GMM models were trained jointly. Finally, the trained CGAN was used to generate a new dataset matching the real distribution of the samples. The original dataset contained 386 images in 12 classes; after applying GMM-CGAN, the augmented dataset contained 38600 images. The experimental results show that, compared with training the CNN on datasets augmented by affine transformation or by a standard CGAN, the proposed method achieves an average classification accuracy of 89.1%, improvements of 18.2% and 14.1%, respectively.
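The core idea of the abstract — drawing the generator's noise vector from a learned GMM prior rather than a standard normal, and conditioning the generator on the class label — can be sketched as follows. This is a minimal illustration with NumPy only: the GMM parameters, noise dimension, and the single-layer "generator" stand-in are all hypothetical (in the paper they are trained jointly with the CGAN as deep networks); only the 12-class setting is taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GMM prior over the noise vector z: 2 components with
# diagonal covariance. In the paper these parameters are learned
# jointly with the CGAN; here they are fixed for illustration.
weights = np.array([0.4, 0.6])
means = np.array([[-1.0] * 8, [1.0] * 8])   # noise dimension 8 (illustrative)
stds = np.array([[0.5] * 8, [0.8] * 8])

def sample_gmm_noise(n):
    """Draw n noise vectors z from the GMM prior instead of N(0, I)."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return means[comps] + stds[comps] * rng.standard_normal((n, means.shape[1]))

def one_hot(labels, num_classes=12):
    """Encode class labels as one-hot vectors (12 classes, as in the dataset)."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Toy stand-in for the CGAN generator: one linear layer mapping the
# concatenated [z, label] vector to a flattened image. A real generator
# would be a deep (de)convolutional network.
noise_dim, num_classes, img_dim = 8, 12, 28 * 28
W = rng.standard_normal((noise_dim + num_classes, img_dim)) * 0.01

def generate(labels):
    """Generate one fake image per requested class label."""
    z = sample_gmm_noise(len(labels))
    g_in = np.concatenate([z, one_hot(labels, num_classes)], axis=1)
    return np.tanh(g_in @ W)   # fake images with pixel values in [-1, 1]

fake = generate(np.array([0, 3, 11]))
print(fake.shape)  # (3, 784)
```

Augmenting a dataset then amounts to calling `generate` with the desired label counts and appending the outputs to the real training set.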

CLC Number: