Object detection based on Gaussian-YOLO v3 with embedded attention and feature intertwine modules
LIU Dan¹, WU Yajuan¹, LUO Nanchao², ZHENG Bochuan³
1. School of Computer Science, China West Normal University, Nanchong Sichuan 637002, China; 2. School of Computer Science and Technology, Aba Teachers University, Aba Sichuan 623002, China; 3. School of Mathematics and Information, China West Normal University, Nanchong Sichuan 637002, China
Abstract: Incorrect object detection can lead to serious accidents, so high-precision object detection is crucial for autonomous driving. An object detection method based on Gaussian-YOLO v3 combining attention and feature intertwine modules was proposed, in which several specific feature maps were improved. First, an attention module was added to the feature map to learn the weight of each channel autonomously, enhancing key features and suppressing redundant ones, thereby strengthening the network's ability to distinguish foreground objects from background. Second, different channels of the feature map were intertwined to obtain more representative features. Finally, the features produced by the attention and feature intertwine modules were fused to form a new feature map. Experimental results show that the proposed method achieves an mAP (mean Average Precision) of 20.81% and an F1 score of 18.17% on the BDD100K dataset, and decreases the false alarm rate by 3.5 percentage points. The detection performance of the proposed method is therefore better than that of YOLO v3 and Gaussian-YOLO v3.
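Since the paper's implementation is not reproduced here, the following is a minimal PyTorch sketch of the three steps the abstract describes: per-channel attention, channel intertwining, and fusion into a new feature map. It assumes squeeze-and-excitation (SE)-style attention for the channel weights and a ShuffleNet-style channel shuffle for the intertwine operation; the 1×1 fusion convolution and all module names are illustrative assumptions, not the authors' exact design.

```python
# Illustrative sketch only: SE-style attention and channel shuffle are assumed
# stand-ins for the paper's attention and feature intertwine modules.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style attention: learn one weight per channel and rescale the feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # squeeze: global spatial average
        self.fc = nn.Sequential(                               # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                      # per-channel weight in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                           # enhance key channels, suppress redundant ones


def intertwine(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Interleave channels across groups (ShuffleNet-style channel shuffle)."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)


class AttentionIntertwineBlock(nn.Module):
    """Fuse the attention branch and the intertwine branch into a new feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn = ChannelAttention(channels)
        # 1x1 convolution as the fusion step is an assumption for illustration
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.attn(x)                                       # attention branch
        t = intertwine(x)                                      # intertwine branch
        return self.fuse(torch.cat([a, t], dim=1))             # fused feature map


# Usage on a YOLO v3-scale feature map, e.g. 256 channels at 52x52:
block = AttentionIntertwineBlock(256)
y = block(torch.randn(1, 256, 52, 52))                         # y.shape == (1, 256, 52, 52)
```

In this sketch the block preserves the input's channel count and spatial size, so it could be inserted in front of a detection head without altering the rest of the network; how and where the paper places its modules on the Gaussian-YOLO v3 feature maps is described in the full text.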