Journal of Computer Applications, 2016, Vol. 36, Issue (9): 2508-2515. DOI: 10.11772/j.issn.1001-9081.2016.09.2508
Survey of convolutional neural network
LI Yandong, HAO Zongbo, LEI Hang
Received: 2016-03-30
Revised: 2016-04-20
Online: 2016-09-08
Published: 2016-09-10
Corresponding author: LI Yandong
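About the authors: LI Yandong (1984-), male, born in Luzhou, Sichuan, Ph.D. candidate. Research interests: machine learning, computer vision. HAO Zongbo (1977-), male, born in Xinxiang, Henan, associate professor, Ph.D. Research interests: image understanding, video information processing. LEI Hang (1960-), male, born in Zigong, Sichuan, professor, Ph.D. Research interests: image processing.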
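Abstract: In recent years, convolutional neural networks (CNNs) have produced a series of breakthrough results in image classification, object detection, semantic image segmentation and related fields. Their strong feature-learning and classification abilities have attracted wide attention and make them well worth analysis and study. This paper first reviews the history of CNNs and introduces their basic structure and operating principles. It then summarizes and analyzes recent CNN research from four aspects: network overfitting, network structure, transfer learning, and analysis of how the networks work. Next, it summarizes and discusses the latest results in application fields built on CNNs. Finally, it points out the current shortcomings of CNNs and future research directions.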
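The abstract refers to the basic structure and operating principle of a CNN without spelling them out. As a rough illustration only, and not the authors' own formulation, the sketch below shows the three operations a typical CNN stage stacks: a single-channel 2D convolution (implemented, as most deep-learning libraries do, as cross-correlation with no padding and stride 1), a ReLU activation, and non-overlapping max pooling. The function names `conv2d`, `relu`, `max_pool`, the toy image and the hand-picked kernel are hypothetical; in a real network the kernel weights are learned by gradient descent rather than chosen by hand.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Valid (no padding, stride 1) 2D cross-correlation of a single channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j).
            # The same kernel weights are reused at every position (weight sharing).
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s + bias)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise non-linearity max(0, x)."""
    return [[max(0.0, x) for x in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping size x size max pooling (stride = size); leftover edge rows/columns are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + u][j * size + v]
                 for u in range(size) for v in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


if __name__ == "__main__":
    # Hypothetical 5x5 input with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    # Hand-picked vertical-edge detector; a trained CNN would learn such weights.
    kernel = [[-1, 1], [-1, 1]]
    fmap = relu(conv2d(image, kernel))
    print(fmap)            # 4x4 feature map, strong response only in column 1
    print(max_pool(fmap))  # 2x2 pooled map
```

Run as a script, this prints a 4x4 feature map in which every row is `[0.0, 2.0, 0.0, 0.0]`, followed by the pooled map `[[2.0, 0.0], [2.0, 0.0]]`. The example is meant to show why the structure is useful: the convolution responds to a local pattern wherever it occurs, and pooling halves the spatial resolution while keeping the strongest response in each window, which gives the representation some tolerance to small shifts.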
LI Yandong, HAO Zongbo, LEI Hang. Survey of convolutional neural network[J]. Journal of Computer Applications, 2016, 36(9): 2508-2515.