Journal of Computer Applications, 2022, Vol. 42, Issue 4: 1044-1049. DOI: 10.11772/j.issn.1001-9081.2021071273
Special topic: The 36th CCF China Conference on Computer Applications (CCF NCCA 2021)
Received: 2021-07-14
Revised: 2021-08-18
Accepted: 2021-08-27
Online: 2022-04-15
Published: 2022-04-10
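Corresponding author: Zumin WANG
About the author: JI Changqing (born 1980), male, from Zhuanghe, Liaoning, associate professor, Ph.D., CCF member. His main research interests are artificial intelligence, big data analysis, spatial databases, and smart healthcare.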
Authors: Changqing JI, Zhiyong GAO, Jing QIN, Zumin WANG
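Abstract:
The convolutional neural network (CNN) is one of the important research directions in deep-learning-based computer vision. It performs well in applications such as image classification, image segmentation, and object detection, and its strong feature learning and feature representation capabilities are increasingly valued by researchers. However, CNNs still suffer from problems such as incomplete feature extraction and overfitting during training. To address these problems, this paper reviews the development of CNNs, the classic CNN models and their components, and presents methods for solving the problems above. By surveying the current state of research on CNN models for image classification, it offers suggestions for the further development and research directions of CNNs.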
Cite this article: Changqing JI, Zhiyong GAO, Jing QIN, Zumin WANG. Review of image classification algorithms based on convolutional neural network[J]. Journal of Computer Applications, 2022, 42(4): 1044-1049.