Survey of convolutional neural network

doi:10.11772/j.issn.1001-9081.2016.09.2508

Abstract

Abstract: In recent years, Convolutional Neural Network (CNN) has made a series of breakthrough research results in the fields of image classification, object detection, semantic segmentation and so on. The powerful ability of CNN for feature learning and classification attracts wide attention, it is of great value to review the works in this research field. A brief history and basic framework of CNN were introduced. Recent researches on CNN were thoroughly summarized and analyzed in four aspects: over-fitting problem, network structure, transfer learning and theoretic analysis. State-of-the-art CNN based methods for various applications were concluded and discussed. At last, some shortcomings of the current research on CNN were pointed out and some new insights for the future research of CNN were presented.

Key words: Convolutional Neural Network (CNN), deep learning, feature representation, neural network, transfer learning

摘要： 近年来，卷积神经网络在图像分类、目标检测、图像语义分割等领域取得了一系列突破性的研究成果，其强大的特征学习与分类能力引起了广泛的关注，具有重要的分析与研究价值。首先回顾了卷积神经网络的发展历史，介绍了卷积神经网络的基本结构和运行原理，重点针对网络过拟合、网络结构、迁移学习、原理分析四个方面对卷积神经网络在近期的研究进行了归纳与分析，总结并讨论了基于卷积神经网络的相关应用领域取得的最新研究成果，最后指出了卷积神经网络目前存在的不足以及未来的发展方向。

关键词: 卷积神经网络, 深度学习, 特征表达, 神经网络, 迁移学习

CLC Number:

TP181

LI Yandong, HAO Zongbo, LEI Hang. Survey of convolutional neural network[J]. Journal of Computer Applications, 2016, 36(9): 2508-2515.

李彦冬, 郝宗波, 雷航. 卷积神经网络研究综述[J]. 计算机应用, 2016, 36(9): 2508-2515.

References

[1] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[2] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006, 18(7): 1527-1554.
[3] LEE H, GROSSE R, RANGANATH R, et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations [C]// ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. New York: ACM, 2009: 609-616.
[4] HUANG G B, LEE H, ERIK G. Learning hierarchical representations for face verification with convolutional deep belief networks [C]// CVPR '12: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2012: 2518-2525.
[5] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// Proceedings of Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2012: 1106-1114.
[6] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 580-587.
[7] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 3431-3440.
[8] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2015-11-04]. http://www.robots.ox.ac.uk:5000/~vgg/publications/2015/Simonyan15/simonyan15.pdf.
[9] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 1-8.
[10] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [EB/OL]. [2016-01-04]. https://www.researchgate.net/publication/286512696_Deep_Residual_Learning_for_Image_Recognition.
[11] PAN S J, YANG Q. A survey on transfer learning [J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359.
[12] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch [J]. Journal of Machine Learning Research, 2011, 12(1): 2493-2537.
[13] OQUAB M, BOTTOU L, LAPTEV I, et al. Learning and transferring mid-level image representations using convolutional neural networks [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 1717-1724.
[14] HUBEL D H, WIESEL T N. Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex [J]. Journal of Physiology, 1962, 160(1): 106-154.
[15] FUKUSHIMA K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position [J]. Biological Cybernetics, 1980, 36(4): 193-202.
[16] WAIBEL A, HANAZAWA T, HINTON G, et al. Phoneme recognition using time-delay neural networks [M]// Readings in Speech Recognition. Amsterdam: Elsvier, 1990: 393-404.
[17] VAILLANT R, MONROCQ C, LE CUN Y. Original approach for the localization of objects in images [J]. IEE Proceedings—Vision, Image and Signal Processing, 1994, 141(4): 245-250.
[18] LAWRENCE S, GILES C L, TSOI A C, et al. Face recognition: a convolutional neural-network approach [J]. IEEE Transactions on Neural Networks, 1997, 8(1): 98-113.
[19] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database [C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2009:248-255.
[20] DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 2625-2634.
[21] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 3156-3164.
[22] MALINOWSKI M, ROHRBACH M, FRITZ M. Ask your neurons: a neural-based approach to answering questions about images [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 1-9.
[23] ANTOL S, AGRAWAL A, LU J, et al. VQA: visual question answering [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 2425-2433.
[24] ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks [C]// Proceedings of European Conference on Computer Vision, LNCS 8689. Berlin: Springer, 2014: 818-833.
[25] JI S, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
[26] LOWE D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[27] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection [C]// Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2005: 886-893.
[28] LECUN Y, BENGIO Y, HINTON G E. Deep learning [J]. Nature, 2015, 521(7553): 436-444.
[29] 孙志军,薛磊,许阳明,等.深度学习研究综述[J].计算机应用研究,2012,29(8):2806-2810.(SUN Z J, XUE L, XU Y M, et al. Overview of deep learning [J]. Application Research of Computers, 2012, 29(8): 2806-2810)
[30] DONAHUE J, JIA Y, VINYALS O, et al. DeCAF: a deep convolutional activation feature for generic visual recognition [J]. Computer Science, 2013, 50(1): 815-830.
[31] RAZAVIAN A S, AZIZPOUR H, SULLIVAN J, et al. CNN features off-the-shelf: an astounding baseline for recognition [EB/OL]. [2015-11-22]. http://www.csc.kth.se/~azizpour/papers/ha_cvpr14w.pdf.
[32] SERMANET P, KAVUKCUOGLU K, CHINTALA S, et al. Pedestrian detection with unsupervised multi-stage feature learning [C]// CVPR '13: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2013: 3626-3633.
[33] KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks [C]// CVPR '14: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 1725-1732.
[34] TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 1653-1660.
[35] KALCHBRENNER N, GREFENSTETTE E, BLUNSOM P. A convolutional neural network for modelling sentences [EB/OL]. [2016-01-07]. http://anthology.aclweb.org/P/P14/P14-1062.pdf.
[36] KIM Y. Convolutional neural networks for sentence classification [EB/OL]. [2016-01-07]. http://anthology.aclweb.org/D/D14/D14-1181.pdf.
[37] ABDEL-HAMID O, MOHAMMED A, JIANG H, et al. Convolutional neural networks for speech recognition [J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014, 22(10): 1533-1545.
[38] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search [J]. Nature, 2016, 529(7587): 484-489.
[39] ZEILER M D, FERGUS R. Stochastic pooling for regularization of deep convolutional neural networks [EB/OL]. [2016-01-11]. http://www.matthewzeiler.com/pubs/iclr2013/iclr2013.pdf.
[40] MURPHY K P. Machine Learning: A Probabilistic Perspective [M]. Cambridge, MA: MIT Press, 2012: 82-92.
[41] CHATFIELD K, SIMONYAN K, VEDALDI A, et al. Return of the devil in the details: delving deep into convolutional nets [EB/OL]. [2016-01-12]. http://www.robots.ox.ac.uk/~vedaldi/assets/pubs/chatfield14return.pdf.
[42] GOODFELLOW I J, WARDE-FARLEY D, MIRZA M, et al. Maxout networks [EB/OL]. [2016-01-12]. http://www-etud.iro.umontreal.ca/~goodfeli/maxout.pdf.
[43] LIN M, CHEN Q, YAN S. Network in network [EB/OL]. [2016-01-12]. http://arxiv.org/pdf/1312.4400v3.pdf.
[44] MONTAVON G, ORR G, MVLLER K R. Neural Networks: Tricks of the Trade [M]. 2nd ed. London: Springer, 2012: 49-131.
[45] BENGIO Y, SIMARD P, FRASCONI P. Learning long-term dependencies with gradient descent is difficult [J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[46] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 1026-1034.
[47] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving neural networks by preventing co-adaption of feature detectors [R/OL]. [2015-10-26]. http://arxiv.org/pdf/1207.0580v1.pdf.
[48] WAN L, ZEILER M, ZHANG S, et al. Regularization of neural networks using dropconnect [C]// Proceedings of the 2013 International Conference on Machine Learning. New York: ACM Press, 2013: 1058-1066.
[49] HE K, SUN J. Convolutional neural networks at constrained time cost [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 5353-5360.
[50] SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity: the all convolutional net [EB/OL]. [2015-12-24]. http://arxiv.org/pdf/1412.6806.pdf.
[51] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE [EB/OL]. [2015-12-24]. http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf.
[52] OLIVA A, TORRALBA A. Modeling the shape of the scene: a holistic representation of the spatial envelope [J]. International Journal of Computer Vision, 2001, 42(3): 145-175.
[53] WANG J, YANG J, YU K. Locality-constrained linear coding for image classification [C]// Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2010: 3360-3367.
[54] ZEILER M D, TAYLOR G W, FERGUS R. Adaptive deconvolutional networks for mid and high level feature learning [C]// ICCV '11: Proceedings of the 2011 International Conference on Computer Vision. Piscataway, NJ: IEEE, 2011: 2018-2025.
[55] NGUYEN A, YOSINSKI J, CLUNE J, et al. Deep neural networks are easily fooled: high confidence predictions for unrecognizable images [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 427-436.
[56] FLOREANO D, MATTIUSSI C. Bio-inspired Artificial Intelligence: Theories Methods and Technologies [M]. Cambridge, MA: MIT Press, 2008: 1-97.
[57] 庄福振,罗平,何清,等.迁移学习研究进展[J].软件学报,2015,26(1):26-39.(ZHUANG F Z, LUO P, HE Q, et al. Survey on transfer learning research [J]. Journal of Software, 2015, 26(1): 26-39.)
[58] LI F, FERGUS R, PERONA P. One-shot learning of object categories [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4):594-611.
[59] GRIFFIN B G, HOLUB A, PERONA P. The Caltech-256 [R/OL]. [2016-01-03]. http://xueshu.baidu.com/s?wd=paperuri%3A%28699092e99ad6f96f8696507d539a51c8%29&filter=sc_long_sign&tn=SE_xueshusource_2kduw22v&sc_vurl=http%3A%2F%2Fciteseer.ist.psu.edu%2Fshowciting%3Fcid%3D11093943&ie=utf-8&sc_us=16824823650146432853.
[60] ZHOU B, LAPEDRIZA A, XIAO J, et al. Learning deep features for scene recognition using places database [C]// Proceedings of Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press. 2014:487-495.
[61] LOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [EB/OL]. [2016-01-06]. http://jmlr.org/proceedings/papers/v37/ioffe15.pdf.
[62] GIRSHICK R B. Fast R-CNN [EB/OL]. [2016-01-06]. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf.
[63] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [EB/OL]. [2016-01-06]. http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf.
[64] UIJLINGS J, SANDE K, GEVERS T, et al. Selective search for object recognition [J]. International Journal of Computer Vision, 2013, 104(2): 154-171.
[65] KHAN S H, BENNAMOUN M, SOHEL F, et al. Automatic feature learning for robust shadow detection [C]// CVPR'14: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 1939-1946.
[66] TAIGMAN Y, YANG M, RANZATO M, et al. DeepFace: closing the gap to human-level performance in face verification [C]// CVPR'14: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 1701-1708.
[67] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 815-823.
[68] LEVI G, HASSNER T. Age and gender classification using convolutional neural networks [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Washington, DC: IEEE Computer Society, 2015: 34-42.