Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (12): 3333-3340. DOI: 10.11772/j.issn.1001-9081.2016.12.3333

• Artificial Intelligence •

Object recognition algorithm based on deep convolutional neural networks

HUANG Bin, LU Jinjin, WANG Jianhua, WU Xingming, CHEN Weihai   

  1. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
  • Received: 2016-04-26  Revised: 2016-07-11  Online: 2016-12-10  Published: 2016-12-08
  • Corresponding author: CHEN Weihai
  • About the authors: HUANG Bin (1989-), male, born in Anqing, Anhui, Ph.D. candidate; research interests: computer vision, deep learning, machine learning. LU Jinjin (1990-), female, born in Anqing, Anhui, M.S. candidate; research interests: computer vision, deep learning, machine learning. WANG Jianhua (1962-), male, born in Beijing, associate professor, Ph.D.; research interests: intelligent robot sensing and control, computer vision. WU Xingming (1962-), male, born in Beijing, associate professor, Ph.D.; research interests: intelligent robot sensing and control, computer vision. CHEN Weihai (1955-), male, born in Xiangshan, Zhejiang, professor, Ph.D.; research interests: rehabilitation robotics, computer vision.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61573048), the Major International (Regional) Joint Research Project (61620106012), and the International Scientific and Technological Cooperation Project of China (2015DFG12650).




Abstract: To address the problem that the hand-crafted features used in traditional object recognition algorithms are easily affected by the diversity of object shapes, illumination and background, an object recognition algorithm based on deep convolutional neural networks was proposed. Based on the NYU Depth V2 dataset, the algorithm first converted the single-channel depth information into three channels. Then, two deep convolutional neural network models were fine-tuned with the color images and the converted three-channel depth images of the training set, respectively. Next, features of the color and depth images in the resampled training set were extracted from the first fully connected layer of each trained model, and the features of the two modalities were concatenated to train a Linear Support Vector Machine (LinSVM) classifier. Finally, the proposed algorithm was applied to super-pixel feature extraction in the scene understanding task. The proposed method achieves an object classification accuracy of 91.4% on the test set, which is 4.1 percentage points higher than that of the SAE-RNN (Sparse Auto-Encoder with Recursive Neural Network) method. The experimental results show that the proposed method can extract high-level features of color and depth images and effectively improve object classification accuracy.
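The pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not specify the depth-encoding scheme in the abstract, so normalize-and-replicate is used as a placeholder here, and the 4096-dimensional feature vectors merely stand in for the first fully-connected-layer activations of the two fine-tuned CNNs; all function names are hypothetical.

```python
import numpy as np

def depth_to_three_channels(depth):
    """Convert a single-channel depth map into a three-channel image.

    The paper converts depth to three channels before fine-tuning a CNN
    pretrained on color images; the exact encoding is not given in the
    abstract, so here the depth is min-max normalized and replicated
    across three channels (a simple baseline; colorized encodings are
    a common alternative).
    """
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # normalize to [0, 1]
    return np.stack([d, d, d], axis=-1)             # shape: H x W x 3

def fuse_features(color_feat, depth_feat):
    """Concatenate color and depth CNN features into a single vector,
    as done before training the linear SVM classifier."""
    return np.concatenate([color_feat, depth_feat], axis=-1)

# Example with placeholder inputs: a depth map at the NYU Depth V2
# resolution and dummy 4096-D feature vectors for the two modalities.
depth_map = np.random.rand(480, 640)
rgbd = depth_to_three_channels(depth_map)            # (480, 640, 3)
fused = fuse_features(np.zeros(4096), np.ones(4096)) # (8192,)
print(rgbd.shape, fused.shape)
```

The fused vector would then be fed to a linear SVM (e.g., scikit-learn's `LinearSVC`) trained on the resampled training set, as described above.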

Key words: computer vision, Convolutional Neural Network (CNN), feature extraction, Linear Support Vector Machine (LinSVM), object recognition, scene understanding
