Review of image classification algorithms based on convolutional neural network

doi:10.11772/j.issn.1001-9081.2021071273

Journal of Computer Applications, 2022, Vol. 42, Issue (4): 1044-1049. DOI: 10.11772/j.issn.1001-9081.2021071273

Special topic: The 36th CCF National Conference of Computer Applications (CCF NCCA 2021)


Changqing JI¹,², Zhiyong GAO², Jing QIN³, Zumin WANG²

1. College of Physical Science and Technology, Dalian University, Dalian, Liaoning 116622, China
2. College of Information Engineering, Dalian University, Dalian, Liaoning 116622, China
3. College of Software Engineering, Dalian University, Dalian, Liaoning 116622, China

Received: 2021-07-14 Revised: 2021-08-18 Accepted: 2021-08-27 Online: 2022-04-15 Published: 2022-04-10
Corresponding author: Zumin WANG
About the authors: JI Changqing, born in 1980 in Zhuanghe, Liaoning, Ph.D., associate professor, CCF member. His research interests include artificial intelligence, big data analysis, spatial databases, and smart healthcare.
GAO Zhiyong, born in 1996 in Liaocheng, Shandong, M.S. candidate, CCF member. His research interests include artificial intelligence.
QIN Jing, born in 1981 in Zhangye, Gansu, Ph.D., associate professor, CCF member. Her research interests include signal processing and big data analysis.
Supported by:
National Natural Science Foundation of China (62002038)

Abstract

Convolutional Neural Network (CNN) is one of the important research directions in deep-learning-based computer vision. It performs well in applications such as image classification, image segmentation, and object detection, and its strong feature learning and feature representation capabilities have drawn growing interest from researchers. However, CNNs still suffer from problems such as incomplete feature extraction and overfitting during training. To address these problems, the development of CNNs, the classical CNN models, and their components are introduced, and methods for solving these problems are presented. By reviewing the current state of research on CNN models for image classification, suggestions are offered for the further development and research directions of CNNs.

Key words: deep learning, Convolutional Neural Network (CNN), image classification, feature extraction, overfitting
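
The abstract names overfitting as one of the open problems of CNNs, and the figure list includes a Dropout layer (Fig. 4). As a rough illustration only, the sketch below shows the "inverted dropout" variant commonly used in practice. It is an assumption that this matches the formulation discussed in the paper; the function name `dropout` and its parameters are hypothetical and chosen for this example.

```python
import random


def dropout(activations, drop_prob, training=True, rng=None):
    """Inverted dropout on a flat list of activations (illustrative sketch).

    During training, each unit is zeroed with probability drop_prob and the
    surviving units are scaled by 1 / (1 - drop_prob), so the expected value
    of every activation is unchanged. At inference the input is returned as is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()  # caller may pass a seeded generator
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    hidden = [0.2, 1.5, 0.7, 3.1]
    print(dropout(hidden, drop_prob=0.5, rng=random.Random(0)))
    print(dropout(hidden, drop_prob=0.5, training=False))
```

Because a different random subset of units is silenced at each training step, the network cannot rely on any single unit, which is the intuition behind using dropout against overfitting.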

CLC number: TP181

Changqing JI, Zhiyong GAO, Jing QIN, Zumin WANG. Review of image classification algorithms based on convolutional neural network[J]. Journal of Computer Applications, 2022, 42(4): 1044-1049.

Figures/Tables: 7

Fig. 1 Development history of deep neural network models

Fig. 2 ReLU function

Fig. 3 Structure of the LeNet-5 model

Fig. 4 Principle of the fully connected layer and the Dropout layer

Fig. 5 Conceptual model of the Inception module

Fig. 6 Structure of the residual module

Fig. 7 Structure of the depthwise separable convolution layer
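
Two of the components named in the figure list can be made concrete with a few lines of arithmetic: the ReLU function of Fig. 2 and the separable convolution of Fig. 7, usually called depthwise separable convolution. The sketch below is a hedged illustration, not the paper's own code. The function names are hypothetical, bias terms are ignored, and the layer sizes in the usage example are arbitrary assumptions.

```python
def relu(x):
    """ReLU activation: passes positive inputs through, clamps the rest to 0."""
    return max(0.0, x)


def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weight count of a depthwise separable convolution (bias ignored).

    The layer is split into a depthwise step (one k x k filter per input
    channel) followed by a pointwise step (a 1 x 1 convolution that mixes
    channels).
    """
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


if __name__ == "__main__":
    print([relu(v) for v in (-2.0, 0.0, 3.5)])
    # Example sizes are assumptions chosen only for illustration.
    std = standard_conv_params(3, 64, 128)
    sep = depthwise_separable_params(3, 64, 128)
    print(std, sep, sep / std)
```

For a 3 x 3 kernel with 64 input and 128 output channels, the standard layer has 73728 weights and the separable one 8768. The ratio equals 1/out_channels + 1/kernel_size^2, about 0.12 here, and this saving is what makes the design attractive for mobile-scale networks.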

