Object detection algorithm based on asymmetric hourglass network structure
LIU Ziwei1,2,3, DENG Chunhua1,2,3, LIU Jing1,2,3
1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China; 2. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan Hubei 430065, China; 3. Hubei Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan Hubei 430065, China
Abstract: Anchor-free object detection based on deep learning is a mainstream approach to single-stage object detection. An hourglass network structure that incorporates multiple layers of supervisory information can significantly improve the accuracy of anchor-free detectors, but it is much slower than a common network at the same level, and the features of objects at different scales interfere with each other. To solve these problems, an object detection algorithm based on an asymmetric hourglass network structure was proposed. When fusing features from different network layers, the proposed algorithm is not constrained by their shape and size, and it can abstract the semantic information of the network quickly and efficiently, making it easier for the model to learn the differences between scales. To address object detection at different scales, a multi-scale output hourglass network structure was designed to resolve the mutual interference between features of objects at different scales and to refine the output detection results. In addition, a special non-maximum suppression algorithm for multi-scale outputs was used to improve the recall of the detection algorithm. Experimental results show that the AP50 of the proposed algorithm on the Common Objects in COntext (COCO) dataset reaches 61.3%, which is 4.2 percentage points higher than that of the anchor-free network CenterNet. The proposed algorithm achieves a better balance between accuracy and running time than the original algorithm and is particularly suitable for real-time object detection in industry.
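The abstract does not specify how the non-maximum suppression for multi-scale outputs works, so the following is only a minimal sketch of one plausible baseline, not the authors' algorithm: detections from all scale branches are pooled and then filtered with standard greedy NMS. The box format `(x1, y1, x2, y2, score)`, the function names `iou` and `merge_multiscale_nms`, and the threshold of 0.5 are all assumptions made for illustration; class labels are ignored for brevity.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2, score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_multiscale_nms(outputs: List[List[Box]], iou_thresh: float = 0.5) -> List[Box]:
    """Pool boxes from every scale branch, then apply greedy NMS.

    This is plain NMS over the pooled set (an assumed baseline),
    not the special variant described in the paper.
    """
    # Sort all boxes from all branches by descending score.
    pooled = sorted(
        (box for branch in outputs for box in branch),
        key=lambda box: box[4],
        reverse=True,
    )
    kept: List[Box] = []
    for box in pooled:
        # Keep a box only if it does not overlap too much with a higher-scoring one.
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept


# Toy outputs from two hypothetical scale branches.
small_scale = [(10.0, 10.0, 20.0, 20.0, 0.9)]
large_scale = [(11.0, 11.0, 21.0, 21.0, 0.6), (100.0, 100.0, 200.0, 200.0, 0.8)]
result = merge_multiscale_nms([small_scale, large_scale])
print(result)
```

In this toy input, the lower-scoring box from the second branch overlaps the first branch's box with an IoU of 81/119 (about 0.68) and is suppressed, while the distant box is kept. A per-class variant would apply the same routine separately to the boxes of each category.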
LIU Ziwei, DENG Chunhua, LIU Jing. Object detection algorithm based on asymmetric hourglass network structure[J]. Journal of Computer Applications, 2020, 40(12): 3526-3533.