Object detection algorithm based on asymmetric hourglass network structure

doi:10.11772/j.issn.1001-9081.2020050641

Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3526-3533. DOI: 10.11772/j.issn.1001-9081.2020050641


LIU Ziwei^1,2,3, DENG Chunhua^1,2,3, LIU Jing^1,2,3

1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
2. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
3. Hubei Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan Hubei 430065, China

Received: 2020-05-15  Revised: 2020-07-20  Online: 2020-12-10  Published: 2020-08-14
Corresponding author: LIU Jing (born 1984), female, from Xiaogan, Hubei; lecturer, Ph.D. Research interests: scheduling, fault tolerance, real-time systems, edge computing. luijing_cs@wust.edu.cn
About the authors: LIU Ziwei (born 1996), male, from Xiaogan, Hubei; M.S. candidate. Research interests: computer vision, machine learning. DENG Chunhua (born 1984), male, from Chenzhou, Hunan; associate professor, Ph.D. Research interests: computer vision, machine learning.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61806150), the Hubei Provincial Department of Science and Technology Program (2018CFB195), the Hubei Provincial Department of Education Science and Technology Research Program Young Talent Project (Q20181104), the Open Foundation of Hubei Key Laboratory of Intelligent Information Processing and Real-time Industrial System (znxx2018QN09), and the National Defense Advanced Research Foundation of Wuhan University of Science and Technology (GF201814).

Abstract: Anchor-free object detection based on deep learning is a mainstream class of single-stage object detection algorithms. An hourglass network structure that fuses supervisory information from multiple layers can significantly improve the accuracy of anchor-free object detection, but it is much slower than a common network at the same level, and the features of objects at different scales interfere with each other. To solve these problems, an object detection algorithm based on an asymmetric hourglass network structure was proposed. When fusing the features of different network layers, the proposed algorithm is not constrained by their shape and size, so it can abstract the semantic information of the network quickly and efficiently, making it easier for the model to learn the differences between scales. For the detection of objects at different scales, an hourglass network structure with multi-scale outputs was designed to reduce the mutual interference between the features of objects at different scales and to refine the output detection results. In addition, a special non-maximum suppression algorithm for the multi-scale outputs was used to improve the recall of the detection algorithm. Experimental results show that the AP50 of the proposed algorithm on the Common Objects in COntext (COCO) dataset reaches 61.3%, which is 4.2 percentage points higher than that of the anchor-free network CenterNet. The proposed algorithm surpasses the original algorithm in the balance between accuracy and time, and is particularly suitable for real-time object detection in industrial scenes.

Key words: deep learning, computer vision, convolutional neural network, single-stage object detection, anchor, hourglass network
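The abstract does not say which non-maximum suppression variant is applied to the multi-scale outputs. As an illustration only, the sketch below pools the detections from several output scales and applies Gaussian Soft-NMS, a well-known variant that decays the scores of overlapping boxes instead of discarding them and so tends to preserve recall. The function names (`iou`, `soft_nms`, `merge_scales`), the `sigma` and `score_thresh` defaults, and the box format are assumptions made for this example, not the authors' implementation.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(dets, sigma=0.5, score_thresh=0.05):
    """Gaussian Soft-NMS over a list of (box, score) pairs.

    Repeatedly selects the highest-scoring box, then multiplies the score of
    every remaining box by exp(-IoU^2 / sigma). Stops once the best remaining
    score falls below score_thresh (an assumed cut-off).
    """
    pool = [(tuple(box), float(score)) for box, score in dets]
    kept = []
    while pool:
        best = max(range(len(pool)), key=lambda i: pool[i][1])
        box, score = pool.pop(best)
        if score < score_thresh:
            break  # every remaining score is lower still
        kept.append((box, score))
        pool = [(b, s * math.exp(-iou(box, b) ** 2 / sigma)) for b, s in pool]
    return kept

def merge_scales(per_scale_dets, **kwargs):
    """Pool detections from all output scales, then suppress them jointly."""
    pooled = [d for dets in per_scale_dets for d in dets]
    return soft_nms(pooled, **kwargs)

# Hypothetical detections from two output scales of a detector.
coarse = [((10, 10, 50, 50), 0.9)]
fine = [((12, 12, 52, 52), 0.8), ((100, 100, 120, 120), 0.7)]
kept = merge_scales([coarse, fine])
```

In this example the near-duplicate box from the finer scale is kept with a reduced score rather than removed, while the distant box is unaffected. Raising `score_thresh` brings the behaviour closer to hard NMS.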

CLC Number: TP391.41

LIU Ziwei, DENG Chunhua, LIU Jing. Object detection algorithm based on asymmetric hourglass network structure[J]. Journal of Computer Applications, 2020, 40(12): 3526-3533.

