Bird recognition algorithm based on attention mechanism

doi:10.11772/j.issn.1001-9081.2023081042

Journal of Computer Applications, 2024, Vol. 44, Issue (4): 1114-1120. DOI: 10.11772/j.issn.1001-9081.2023081042

Special topic: Artificial intelligence

Tianhua CHEN¹, Jiaxuan ZHU¹, Jie YIN²

^1. School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
^2. Department of Computer Information and Cybersecurity, Jiangsu Police Institute, Nanjing, Jiangsu 210031, China

Received: 2023-08-08 Revised: 2023-12-04 Online: 2023-12-18 Published: 2024-04-10
Contact: Jiaxuan ZHU (408199640@qq.com)
About the authors: CHEN Tianhua, born in 1966 in Changsha, Hunan, M. S., professor. His research interests include image processing, pattern recognition, and measurement and control technology.
ZHU Jiaxuan, born in 1997 in Nantong, Jiangsu, M. S. candidate. His research interests include pattern recognition and image processing.
YIN Jie, born in 1977 in Nanjing, Jiangsu, M. S., senior engineer. His research interests include machine learning, big data, and cybersecurity.
Supported by:
National Natural Science Foundation of China (62272203); Joint Project of Beijing Natural Science Foundation and Beijing Municipal Education Commission Science and Technology Plan Key Project (KZ202110011015).

Abstract

To address the low accuracy of existing algorithms on fine-grained bird recognition tasks, a bird detection algorithm called YOLOv5-Bird is proposed. Firstly, a mixed-domain Coordinate Attention (CA) mechanism is introduced into the YOLOv5 backbone to increase the weights of valuable channels and to distinguish target features from redundant background features. Secondly, Bi-level Routing Attention (BRA) modules replace some of the C3 modules in the original backbone, filtering out key-value pairs with low relevance and capturing long-range dependencies efficiently. Finally, the WIoU (Wise-Intersection over Union) loss function is used to strengthen the localization ability of the algorithm. Experimental results show that YOLOv5-Bird reaches a precision of 82.8% and a recall of 77.0% on the self-built dataset, which are 4.3 and 7.6 percentage points higher than those of YOLOv5 respectively, and that it also outperforms algorithms that add other attention mechanisms. These results indicate that YOLOv5-Bird performs well in bird detection scenarios.

Key words: target detection, biological recognition, Convolutional Neural Network (CNN), attention mechanism, loss function
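
The abstract names WIoU as the bounding-box regression loss but does not say which variant is used. As a rough illustration only, the sketch below assumes the v1 form described in the Wise-IoU paper: the plain IoU loss scaled by a distance-based factor built from the box centers and the smallest enclosing box. The function names `iou` and `wiou_v1_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` guard are hypothetical choices for this example, not details taken from the article.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target, eps=1e-9):
    """Assumed WIoU v1 form: exp(center distance^2 / enclosing size^2) * (1 - IoU)."""
    l_iou = 1.0 - iou(pred, target)

    # Centers of the predicted and ground-truth boxes.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    center_dist_sq = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # Width and height of the smallest box enclosing both boxes.
    # In a training framework this denominator would be detached from the
    # gradient; plain Python has no gradient, so nothing is needed here.
    w_g = max(pred[2], target[2]) - min(pred[0], target[0])
    h_g = max(pred[3], target[3]) - min(pred[1], target[1])

    r_wiou = math.exp(center_dist_sq / (w_g ** 2 + h_g ** 2 + eps))
    return r_wiou * l_iou


if __name__ == "__main__":
    print(wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Because the scaling factor is never below 1, a prediction whose center is far from the ground-truth center is penalized more than the IoU loss alone would penalize it, which is one plausible reading of the "enhanced localization ability" the abstract mentions.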

CLC number: TP391.41

Cite this article: Tianhua CHEN, Jiaxuan ZHU, Jie YIN. Bird recognition algorithm based on attention mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1114-1120.

Figures and tables: 12

Fig. 1 YOLOv5-Bird network structure

Fig. 2 Coordinate attention process

Fig. 3 Visualization of coordinate attention mechanism

Fig. 4 Visualization of ViT

Fig. 5 Visualization of BRA mechanism

Fig. 6 BRA block process
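
The BRA figures themselves are not reproduced on this page, so the following is only a minimal sketch of the region-level routing step that the abstract summarizes as filtering out low-relevance key-value pairs. It assumes the general scheme of bi-level routing attention: tokens are grouped into regions, each region is summarized by the mean of its features, and every query region keeps only its top-k most related key regions. The helper names `region_means` and `topk_routing` are hypothetical, the learned query/key projections are replaced by the identity, and the subsequent token-level attention over the kept regions is omitted.

```python
def region_means(tokens_by_region):
    """Average the feature vectors inside each region (one mean per region)."""
    return [
        [sum(column) / len(region) for column in zip(*region)]
        for region in tokens_by_region
    ]


def topk_routing(region_queries, region_keys, k):
    """For each query region, return the indices of its k most related key regions."""
    routes = []
    for query in region_queries:
        # Region-to-region affinity as a dot product of the region summaries.
        scores = [sum(q * kv for q, kv in zip(query, key)) for key in region_keys]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        routes.append(ranked[:k])
    return routes


# Toy input: 3 regions, 2 tokens per region, 2 feature channels per token.
regions = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.2, 0.8]],
    [[0.9, 0.1], [1.1, -0.1]],
]

# Assumption: identity projections, so queries and keys share the same summaries.
summaries = region_means(regions)
routes = topk_routing(summaries, summaries, k=2)
print(routes)
```

In this toy case each region attends to only two of the three regions, so the key-value pairs of the least related region never enter the token-level attention.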

Tab. 1 Experimental environment

Item	Configuration
Operating system	Windows 10
CPU	Intel Core i5-10400F
GPU	NVIDIA GeForce GTX 1070
Software	Anaconda, PyCharm 2021
Deep learning platform	Python 3.7
Deep learning framework	PyTorch 1.8.0
GPU acceleration library	CUDA 11.7

Fig. 7 Curves of training loss and validation loss

Fig. 8 Experimental results comparison of different algorithms

Fig. 9 Visualization analysis

Tab. 2 Experimental data comparison of different algorithms (%)

Algorithm	Precision	Recall
YOLOv5	78.5	69.4
Faster R-CNN	77.7	70.2
SSD	63.1	64.7
YOLOv7	78.8	70.3
EfficientDet	78.3	67.6
YOLOv5+CBAM	79.0	70.8
YOLOv5+SE	79.5	70.9
YOLOv5+CA	79.5	71.1
YOLOv5+BRA	80.5	70.4
YOLOv5+Swin Transformer	80.0	70.5
YOLOv5+CotNet	79.9	69.8
YOLOv5+MobileViT	79.5	69.9
YOLOv5-Bird	82.8	77.0
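
The percentage-point improvements quoted in the abstract can be checked directly against Tab. 2. The short script below is a convenience sketch: the dictionary simply transcribes the table, and the helper names `gain_in_points` and `best_other` are hypothetical.

```python
# (precision %, recall %) transcribed from Tab. 2.
TABLE_2 = {
    "YOLOv5": (78.5, 69.4),
    "Faster R-CNN": (77.7, 70.2),
    "SSD": (63.1, 64.7),
    "YOLOv7": (78.8, 70.3),
    "EfficientDet": (78.3, 67.6),
    "YOLOv5+CBAM": (79.0, 70.8),
    "YOLOv5+SE": (79.5, 70.9),
    "YOLOv5+CA": (79.5, 71.1),
    "YOLOv5+BRA": (80.5, 70.4),
    "YOLOv5+Swin Transformer": (80.0, 70.5),
    "YOLOv5+CotNet": (79.9, 69.8),
    "YOLOv5+MobileViT": (79.5, 69.9),
    "YOLOv5-Bird": (82.8, 77.0),
}


def gain_in_points(model, baseline="YOLOv5", table=TABLE_2):
    """Precision and recall gains of `model` over `baseline`, in percentage points."""
    return tuple(round(m - b, 1) for m, b in zip(table[model], table[baseline]))


def best_other(metric_index, exclude="YOLOv5-Bird", table=TABLE_2):
    """Name of the best algorithm on one metric (0 = precision, 1 = recall), excluding one entry."""
    candidates = [name for name in table if name != exclude]
    return max(candidates, key=lambda name: table[name][metric_index])


print(gain_in_points("YOLOv5-Bird"))
print(best_other(0), best_other(1))
```

The second helper identifies the strongest competitor on each metric, which makes it easy to see how far YOLOv5-Bird is ahead of the best single-mechanism variant rather than only the baseline.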

Tab. 3 Ablation experimental results (%)

CA	WIoU	BRA	Precision	Recall	mAP@0.5	mAP@0.5:0.95
-	-	-	78.5	69.4	73.6	54.3
√	-	-	79.5	71.1	74.5	56.1
-	√	-	79.7	71.9	76.5	57.5
-	-	√	80.5	70.4	75.5	54.7
√	√	-	80.7	74.3	79.6	58.0
√	-	√	80.0	72.3	76.7	57.1
-	√	√	81.4	74.2	79.7	58.4
√	√	√	82.8	77.0	80.7	59.4
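
To read the ablation results as gains over the unmodified baseline, the rows of Tab. 3 can be keyed by which modules are enabled. This is an illustrative sketch only; the `ABLATION` mapping transcribes the table, and the `gain` helper is a hypothetical name.

```python
METRICS = ("precision", "recall", "mAP@0.5", "mAP@0.5:0.95")

# Keys are (CA, WIoU, BRA) flags; values follow the order in METRICS (all in %).
ABLATION = {
    (False, False, False): (78.5, 69.4, 73.6, 54.3),
    (True, False, False): (79.5, 71.1, 74.5, 56.1),
    (False, True, False): (79.7, 71.9, 76.5, 57.5),
    (False, False, True): (80.5, 70.4, 75.5, 54.7),
    (True, True, False): (80.7, 74.3, 79.6, 58.0),
    (True, False, True): (80.0, 72.3, 76.7, 57.1),
    (False, True, True): (81.4, 74.2, 79.7, 58.4),
    (True, True, True): (82.8, 77.0, 80.7, 59.4),
}


def gain(config, metric="mAP@0.5"):
    """Gain of one configuration over the baseline (no modules), in percentage points."""
    idx = METRICS.index(metric)
    baseline = ABLATION[(False, False, False)][idx]
    return round(ABLATION[config][idx] - baseline, 1)


for config in ABLATION:
    print(config, gain(config), gain(config, "mAP@0.5:0.95"))
```

On mAP@0.5 the three single-module gains (0.9, 2.9 and 1.9 points) sum to 5.7 points, while the full model gains 7.1 points. That is consistent with the modules complementing each other, although the table alone cannot establish why.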

