Object detection algorithm based on asymmetric hourglass network structure

doi:10.11772/j.issn.1001-9081.2020050641

Journal of Computer Applications, 2020, Vol. 40, Issue (12): 3526-3533. DOI: 10.11772/j.issn.1001-9081.2020050641

Artificial intelligence

LIU Ziwei^1,2,3, DENG Chunhua^1,2,3, LIU Jing^1,2,3

1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
2. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
3. Hubei Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan Hubei 430065, China

Received: 2020-05-15 Revised: 2020-07-20 Online: 2020-08-14 Published: 2020-12-10
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61806150), the Hubei Provincial Department of Science and Technology Program (2018CFB195), the Hubei Provincial Department of Education Science and Technology Research Program Young Talent Project (Q20181104), the Open Foundation of Hubei Key Laboratory of Intelligent Information Processing and Real-time Industrial System (znxx2018QN09), and the National Defense Advanced Research Foundation of Wuhan University of Science and Technology (GF201814).

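Corresponding author: LIU Jing (born 1984), female, from Xiaogan, Hubei, lecturer, Ph.D. Main research interests: scheduling, fault tolerance, real-time systems, edge computing. luijing_cs@wust.edu.cn
About the authors: LIU Ziwei (born 1996), male, from Xiaogan, Hubei, M.S. candidate. Main research interests: computer vision, machine learning. DENG Chunhua (born 1984), male, from Chenzhou, Hunan, associate professor, Ph.D. Main research interests: computer vision, machine learning.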

Abstract

Abstract: Anchor-free object detection based on deep learning is a mainstream class of single-stage object detection algorithms. An hourglass network structure that incorporates multiple layers of supervisory information can significantly improve the accuracy of anchor-free object detection, but it is much slower than an ordinary network at the same level, and the features of objects at different scales interfere with one another. To solve these problems, an object detection algorithm based on an asymmetric hourglass network structure was proposed. When fusing the features of different network layers, the proposed algorithm is not constrained by their shape or size, and it can abstract the semantic information of the network quickly and efficiently, making it easier for the model to learn the differences between scales. For the detection of objects at different scales, a multi-scale output hourglass network structure was designed to resolve the mutual interference between the features of different-scale objects and to refine the output detection results. In addition, a special non-maximum suppression algorithm for the multi-scale outputs was used to improve the recall of the detection algorithm. Experimental results show that the AP50 of the proposed algorithm on the Common Objects in COntext (COCO) dataset reaches 61.3%, which is 4.2 percentage points higher than that of the anchor-free network CenterNet. The proposed algorithm achieves a better balance between accuracy and time than the original algorithm, and is particularly suitable for real-time object detection in industrial scenarios.
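The abstract does not specify the special non-maximum suppression used for the multi-scale outputs. As a point of reference only, the following minimal sketch shows the conventional baseline: detections from every scale output are pooled and filtered with greedy, class-aware IoU-based NMS. The names `iou` and `merge_scales_nms`, the tuple layout, and the 0.5 threshold are illustrative assumptions, not the authors' method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed layout
Detection = Tuple[Box, float, int]        # (box, score, class_id), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_scales_nms(per_scale: List[List[Detection]],
                     iou_thr: float = 0.5) -> List[Detection]:
    """Pool detections from all scale outputs, then apply greedy NMS.

    Hypothetical baseline: a detection is dropped only if a higher-scoring
    detection of the same class overlaps it by more than iou_thr.
    """
    pooled = [det for dets in per_scale for det in dets]
    pooled.sort(key=lambda det: det[1], reverse=True)  # highest score first
    kept: List[Detection] = []
    for box, score, cls in pooled:
        if all(cls != k_cls or iou(box, k_box) <= iou_thr
               for k_box, _, k_cls in kept):
            kept.append((box, score, cls))
    return kept


# Two scale outputs that both fire on the same object of class 0.
small_scale = [((0.0, 0.0, 10.0, 10.0), 0.9, 0)]
large_scale = [((1.0, 1.0, 11.0, 11.0), 0.8, 0),    # overlaps the box above
               ((50.0, 50.0, 80.0, 80.0), 0.7, 0),  # separate object
               ((0.0, 0.0, 10.0, 10.0), 0.6, 1)]    # same place, other class
result = merge_scales_nms([small_scale, large_scale])
```

In this toy input the two overlapping class-0 boxes have an IoU of 81/119, so only the higher-scoring one survives, while the distant box and the class-1 box are both kept.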

Key words: deep learning, computer vision, convolutional neural network, single-stage object detection, anchor box, hourglass network

CLC Number: TP391.41

LIU Ziwei, DENG Chunhua, LIU Jing. Object detection algorithm based on asymmetric hourglass network structure[J]. Journal of Computer Applications, 2020, 40(12): 3526-3533.
