[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2014:580-587.
[2] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1904-1916.
[3] GIRSHICK R. Fast R-CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway:IEEE, 2015:1440-1448.
[4] REN S, HE K, GIRSHICK R, et al. Faster R-CNN:towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149.
[5] LIU W, ANGUELOV D, ERHAN D, et al. SSD:single shot multibox detector[C]//Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham:Springer, 2016:21-37.
[6] FU C, LIU W, RANGA A, et al. DSSD:deconvolutional single shot detector[EB/OL]. [2019-12-15]. https://arxiv.org/pdf/1701.06659.pdf.
[7] LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham:Springer, 2018:404-419.
[8] ZHANG S, WEN L, BIAN X, et al. Single-shot refinement neural network for object detection[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2018:4203-4212.
[9] REN J, CHEN X, LIU J, et al. Accurate single stage detector using recurrent rolling convolution[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:752-760.
[10] LI Z, ZHOU F. FSSD:feature fusion single shot multibox detector[EB/OL]. [2020-12-25]. https://arxiv.org/pdf/1712.00960.pdf.
[11] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once:unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:779-788.
[12] REDMON J, FARHADI A. YOLO9000:better, faster, stronger[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:6517-6525.
[13] REDMON J, FARHADI A. YOLOv3:an incremental improvement[EB/OL]. [2019-12-25]. https://arxiv.org/pdf/1804.02767.pdf.
[14] CHOI J, CHUN D, KIM H, et al. Gaussian YOLOv3:an accurate and fast object detector using localization uncertainty for autonomous driving[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway:IEEE, 2019:502-511.
[15] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell:a neural image caption generator[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2015:3156-3164.
[16] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge:MIT Press, 2014:2204-2212.
[17] ZAREMBA W, SUTSKEVER I, VINYALS O. Recurrent neural network regularization[EB/OL]. [2020-01-05]. https://arxiv.org/pdf/1409.2329.pdf.
[18] XU K, BA J, KIROS R, et al. Show, attend and tell:neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning. New York:International Machine Learning Society, 2015:2048-2057.
[19] WANG F, JIANG M, QIAN C, et al. Residual attention network for image classification[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:6450-6458.
[20] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2018:7794-7803.
[21] WOO S, PARK J, LEE J Y, et al. CBAM:convolutional block attention module[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham:Springer, 2018:3-19.
[22] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2018:7132-7141.
[23] GAO S-H, CHENG M-M, ZHAO K, et al. Res2Net:a new multi-scale backbone architecture[EB/OL]. (2019-09-01)[2020-01-05]. https://arxiv.org/pdf/1904.01169.pdf.
[24] 沈文祥, 秦品乐, 曾建潮. 基于多级特征和混合注意力机制的室内人群检测网络[J]. 计算机应用, 2019, 39(12):3496-3502. (SHEN W X, QIN P L, ZENG J C. Indoor crowd detection network based on multi-level features and fusion attention mechanism[J]. Journal of Computer Applications, 2019, 39(12):3496-3502.)
[25] YU F, CHEN H, WANG X, et al. BDD100K:a diverse driving video database with scalable annotation tooling[EB/OL]. [2020-01-15]. https://arxiv.org/pdf/1805.04687.pdf.
[26] 徐诚极, 王晓峰, 杨亚东. Attention-YOLO:引入注意力机制的YOLO检测算法[J]. 计算机工程与应用, 2019, 55(6):13-23. (XU C J, WANG X F, YANG Y D. Attention-YOLO:YOLO detection algorithm that introduces attention mechanism[J]. Computer Engineering and Applications, 2019, 55(6):13-23.)
[27] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1904-1916.
[28] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:936-944.