Fine-grained pedestrian detection algorithm based on improved Mask R-CNN

doi:10.11772/j.issn.1001-9081.2019051051

Journal of Computer Applications, 2019, Vol. 39, Issue 11: 3210-3215. DOI: 10.11772/j.issn.1001-9081.2019051051

• Papers from the 2019 CCF Conference on Artificial Intelligence (CCFAI 2019) •

ZHU Fan, WANG Hongyuan, ZHANG Ji

College of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China

Received: 2019-05-24 Revised: 2019-06-20 Online: 2019-11-10 Published: 2019-09-11
Corresponding author: WANG Hongyuan
About the authors: ZHU Fan (born 1994), female, from Huai'an, Jiangsu, M.S. candidate; main research interest: computer vision. WANG Hongyuan (born 1960), male, from Changshu, Jiangsu, professor, Ph.D., CCF member; main research interest: computer vision. ZHANG Ji (born 1981), male, from Changzhou, Jiangsu, lecturer, M.S., CCF member; main research interest: computer vision.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61572085).

Abstract

Abstract: To address the poor performance of pedestrian detection in complex scenes, a pedestrian detection algorithm based on an improved Mask R-CNN framework was proposed, building on leading research results in deep learning-based object detection. Firstly, the K-means algorithm was used to cluster the bounding boxes of the pedestrian datasets to obtain appropriate aspect ratios; by adding one aspect ratio (2:5), a set of 12 anchors was adapted to the size of pedestrians in the image. Secondly, fine-grained image recognition techniques were incorporated to achieve high pedestrian localization accuracy. Thirdly, the foreground object was segmented by a Fully Convolutional Network (FCN), and pixel-level prediction was performed to obtain the local masks (upper body, lower body) of the pedestrian, achieving fine-grained pedestrian detection. Finally, the overall mask of the pedestrian was obtained by learning the local features of the pedestrian. To verify the effectiveness of the improved algorithm, it was compared on the same dataset with representative current object detection methods, namely Faster Region-based Convolutional Neural Network (Faster R-CNN), YOLOv2, and Region-based Fully Convolutional Network (R-FCN). The experimental results show that the improved algorithm increases the speed and accuracy of pedestrian detection and reduces the false positive rate.

Key words: Mask R-CNN, pedestrian detection, K-means algorithm, fine-grained, Fully Convolutional Network (FCN)
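The anchor-design step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the distance metric used for K-means, the anchor scales, or whether 2:5 means width:height. The code below assumes a plain 1-D K-means on width/height ratios, the Faster R-CNN default scales (128, 256, 512) and ratios (1:2, 1:1, 2:1), and reads 2:5 as width:height, so that 3 scales × 4 ratios gives the 12 anchors mentioned. The function names `kmeans_aspect_ratios` and `make_anchors` are hypothetical.

```python
import numpy as np


def kmeans_aspect_ratios(boxes_wh, k, iters=50):
    """Cluster width/height ratios with plain 1-D k-means (Lloyd's algorithm).

    boxes_wh: array of shape (N, 2) holding (width, height) per box.
    Returns the k cluster centers sorted in ascending order.
    Assumption: Euclidean distance on the ratio; the paper may use another metric.
    """
    ratios = boxes_wh[:, 0] / boxes_wh[:, 1]
    # Deterministic initialization: evenly spaced quantiles of the data.
    centers = np.quantile(ratios, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign every ratio to its nearest center.
        labels = np.argmin(np.abs(ratios[:, None] - centers[None, :]), axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            ratios[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)


def make_anchors(scales, ratios):
    """Build (width, height) anchors with area scale**2 and width/height == ratio.

    Anchors are ordered scale-major: all ratios for the first scale, then the next.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            anchors.append((s * np.sqrt(r), s / np.sqrt(r)))
    return np.array(anchors)


if __name__ == "__main__":
    # Toy pedestrian boxes (width, height); purely illustrative values.
    boxes = np.array([[40, 100], [42, 100], [38, 100], [50, 50], [60, 60]], dtype=float)
    print("clustered ratios:", kmeans_aspect_ratios(boxes, k=2))

    # Assumed defaults plus the added 2:5 ratio (0.4 as width/height).
    anchors = make_anchors(scales=[128, 256, 512], ratios=[0.4, 0.5, 1.0, 2.0])
    print("number of anchors:", len(anchors))
```

Adding a tall, narrow ratio such as 0.4 gives the region proposal stage anchors closer to the shape of a standing pedestrian, which is the motivation the abstract gives for extending the default set.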

CLC Number: TP391.41

ZHU Fan, WANG Hongyuan, ZHANG Ji. Fine-grained pedestrian detection algorithm based on improved Mask R-CNN[J]. Journal of Computer Applications, 2019, 39(11): 3210-3215.
