Journal of Computer Applications, 2024, Vol. 44, Issue (4): 1114-1120. DOI: 10.11772/j.issn.1001-9081.2023081042
Special topic: Artificial intelligence
Tianhua CHEN1, Jiaxuan ZHU1, Jie YIN2
Received: 2023-08-08
Revised: 2023-12-04
Online: 2023-12-18
Published: 2024-04-10
Contact: Jiaxuan ZHU
About author: CHEN Tianhua, born in 1966 in Changsha, Hunan, M. S., professor. His research interests include image processing, pattern recognition, and measurement and control technology.
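Abstract:
To address the low accuracy of existing fine-grained bird recognition algorithms, a bird object detection algorithm, YOLOv5-Bird, is proposed. First, a Coordinate Attention (CA) mechanism based on the mixed domain is introduced into the YOLOv5 backbone network to increase the weights of valuable channels, so that target features can be distinguished from redundant background features. Second, Bi-level Routing Attention (BRA) modules replace some of the C3 modules in the original backbone network; they filter out key-value pairs of low relevance and capture long-range dependencies efficiently. Finally, the WIoU (Wise-Intersection over Union) loss function is used to strengthen the algorithm's ability to localize targets. Experimental results show that YOLOv5-Bird achieves a precision of 82.8% and a recall of 77.0% on a self-built dataset, which are 4.3 and 7.6 percentage points higher than YOLOv5 respectively, and that it also outperforms variants of YOLOv5 equipped with other attention mechanisms. These results indicate that YOLOv5-Bird performs well in bird object detection scenarios.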
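The abstract does not state which Wise-IoU variant is used or how it is wired into YOLOv5. Purely as an illustration, the sketch below follows the v1 form described in the Wise-IoU paper: the plain IoU loss `1 - IoU` is scaled by a factor that grows with the distance between the box centres, normalised by the size of the smallest enclosing box. The function names `iou` and `wiou_v1_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example, not the authors' code.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target):
    """Wise-IoU v1-style loss (illustrative): distance factor * (1 - IoU)."""
    # Centres of the predicted and ground-truth boxes.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    # In an autograd framework this term would be detached from the graph.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    denom = wg ** 2 + hg ** 2
    dist_sq = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    factor = math.exp(dist_sq / denom) if denom > 0 else 1.0
    return factor * (1.0 - iou(pred, target))


if __name__ == "__main__":
    # Same centre, different size: the factor is 1, so the loss is 1 - IoU.
    print(wiou_v1_loss((0, 0, 4, 4), (1, 1, 3, 3)))
    # Shifted boxes: the factor exceeds 1 and amplifies the IoU loss.
    print(wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

When the two centres coincide the factor equals 1 and the loss reduces to the ordinary IoU loss; the further apart the centres are, the more a poorly localized prediction is penalized.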
Tianhua CHEN, Jiaxuan ZHU, Jie YIN. Bird recognition algorithm based on attention mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1114-1120.
Name | Configuration |
---|---|
Operating system | Windows 10 |
CPU | Intel Core i5-10400F |
GPU | NVIDIA GeForce GTX 1070 |
Software | Anaconda, PyCharm 2021 |
Deep learning platform | Python 3.7 |
Deep learning framework | PyTorch 1.8.0 |
GPU acceleration library | CUDA 11.7 |
Tab. 1 Experimental environment
Algorithm | Precision | Recall |
---|---|---|
YOLOv5 | 78.5 | 69.4 |
Faster R-CNN | 77.7 | 70.2 |
SSD | 63.1 | 64.7 |
YOLOv7 | 78.8 | 70.3 |
EfficientDet | 78.3 | 67.6 |
YOLOv5+CBAM | 79.0 | 70.8 |
YOLOv5+SE | 79.5 | 70.9 |
YOLOv5+CA | 79.5 | 71.1 |
YOLOv5+BRA | 80.5 | 70.4 |
YOLOv5+Swin Transformer | 80.0 | 70.5 |
YOLOv5+CotNet | 79.9 | 69.8 |
YOLOv5+MobileViT | 79.5 | 69.9 |
YOLOv5-Bird | 82.8 | 77.0 |
Tab. 2 Experimental data comparison of different algorithms (%)
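The percentage-point gains quoted in the abstract can be checked directly against Tab. 2. The short sketch below does that arithmetic; the dictionary values are copied from the table, while the helper name `gains_over_baseline` is a hypothetical name chosen for this example.

```python
# Precision and recall (%) copied from Tab. 2.
results = {
    "YOLOv5": (78.5, 69.4),
    "Faster R-CNN": (77.7, 70.2),
    "SSD": (63.1, 64.7),
    "YOLOv7": (78.8, 70.3),
    "EfficientDet": (78.3, 67.6),
    "YOLOv5+CBAM": (79.0, 70.8),
    "YOLOv5+SE": (79.5, 70.9),
    "YOLOv5+CA": (79.5, 71.1),
    "YOLOv5+BRA": (80.5, 70.4),
    "YOLOv5+Swin Transformer": (80.0, 70.5),
    "YOLOv5+CotNet": (79.9, 69.8),
    "YOLOv5+MobileViT": (79.5, 69.9),
    "YOLOv5-Bird": (82.8, 77.0),
}


def gains_over_baseline(table, baseline="YOLOv5"):
    """Return (precision gain, recall gain) in percentage points per algorithm."""
    base_p, base_r = table[baseline]
    return {
        name: (round(p - base_p, 1), round(r - base_r, 1))
        for name, (p, r) in table.items()
        if name != baseline
    }


gains = gains_over_baseline(results)
print(gains["YOLOv5-Bird"])  # (4.3, 7.6)
```

The output for YOLOv5-Bird matches the 4.3 and 7.6 percentage points stated in the abstract.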
Modules added | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|
None (YOLOv5) | 78.5 | 69.4 | 73.6 | 54.3 |
CA | 79.5 | 71.1 | 74.5 | 56.1 |
WIoU | 79.7 | 71.9 | 76.5 | 57.5 |
BRA | 80.5 | 70.4 | 75.5 | 54.7 |
Two of CA, WIoU, BRA | 80.7 | 74.3 | 79.6 | 58.0 |
Two of CA, WIoU, BRA | 80.0 | 72.3 | 76.7 | 57.1 |
Two of CA, WIoU, BRA | 81.4 | 74.2 | 79.7 | 58.4 |
CA + WIoU + BRA (YOLOv5-Bird) | 82.8 | 77.0 | 80.7 | 59.4 |
Tab. 3 Ablation experimental results (%)
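The metric definitions are not spelled out in this part of the article. Assuming the conventional object detection definitions (with $TP$, $FP$ and $FN$ the numbers of true positives, false positives and false negatives, $N$ the number of classes, and $t$ the IoU threshold at which a detection counts as correct), the columns of Tab. 3 can be read as:

```latex
\begin{align}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \\
AP &= \int_0^1 P(R)\,\mathrm{d}R, \qquad \mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} AP_i \\
\mathrm{mAP@0.5{:}0.95} &= \frac{1}{10}\sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{mAP@}t
\end{align}
```

Under this reading, mAP@0.5 rewards detections that overlap the ground truth by at least half, while mAP@0.5:0.95 averages over stricter thresholds and is therefore more sensitive to localization quality.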