Bird recognition algorithm based on attention mechanism

doi:10.11772/j.issn.1001-9081.2023081042

Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (4): 1114-1120. DOI: 10.11772/j.issn.1001-9081.2023081042

• Artificial Intelligence •

Tianhua CHEN¹, Jiaxuan ZHU¹, Jie YIN²

¹ School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
² Department of Computer Information and Cybersecurity, Jiangsu Police Institute, Nanjing, Jiangsu 210031, China

Received: 2023-08-08 Revised: 2023-12-04 Online: 2023-12-18 Published: 2024-04-10
Corresponding author: Jiaxuan ZHU
About the authors:
CHEN Tianhua, born in 1966 in Changsha, Hunan, male, M.S., professor. His research interests include image processing, pattern recognition, and measurement and control technology.
ZHU Jiaxuan, born in 1997 in Nantong, Jiangsu, male, M.S. candidate. His research interests include pattern recognition and image processing. E-mail: 408199640@qq.com
YIN Jie, born in 1977 in Nanjing, Jiangsu, male, M.S., senior engineer. His research interests include machine learning, big data, and cybersecurity.
Supported by:
National Natural Science Foundation of China (62272203); Joint Project of Beijing Natural Science Foundation and Beijing Municipal Education Commission Science and Technology Plan Key Project (KZ202110011015).

Abstract:

To address the low accuracy of existing algorithms on fine-grained bird recognition tasks, a bird target detection algorithm called YOLOv5-Bird was proposed. Firstly, a mixed-domain Coordinate Attention (CA) mechanism was introduced into the YOLOv5 backbone to increase the weights of valuable channels and to distinguish target features from redundant background features. Secondly, Bi-level Routing Attention (BRA) modules were used to replace some of the C3 modules in the original backbone, filtering out key-value pairs with low relevance and obtaining efficient long-range dependencies. Finally, the WIoU (Wise-Intersection over Union) loss function was used to enhance the localization ability of the algorithm. Experimental results show that YOLOv5-Bird reaches a precision of 82.8% and a recall of 77.0% on the self-constructed dataset, which are 4.3 and 7.6 percentage points higher, respectively, than those of the YOLOv5 algorithm. YOLOv5-Bird also outperforms algorithms that add other attention mechanisms. These results verify that YOLOv5-Bird performs well in bird target detection scenarios.

Key words: target detection, biological recognition, Convolutional Neural Network (CNN), attention mechanism, loss function
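The abstract names WIoU as the localization loss but does not say which variant is used or give the formula. As a rough illustration only, the sketch below follows the v1 form described in the Wise-IoU preprint: the ordinary IoU loss (1 - IoU) is scaled by a distance-based factor computed from the box centres and the smallest enclosing box. The function names, the (x1, y1, x2, y2) box format, and the `eps` guard are assumptions made for this example, not details taken from the paper.

```python
import math


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target, eps=1e-9):
    """Illustrative Wise-IoU v1 style loss: R_WIoU * (1 - IoU).

    In a training framework the enclosing-box size would be detached
    from the gradient; plain floats are used here, so that detail is
    only noted, not implemented.
    """
    l_iou = 1.0 - iou(pred, target)
    # Centre points of the predicted and ground-truth boxes.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    # Distance-based factor: equals 1 when centres coincide, grows with offset.
    r_wiou = math.exp(((px - tx) ** 2 + (py - ty) ** 2) / (wg ** 2 + hg ** 2 + eps))
    return r_wiou * l_iou


pred_box = (0.0, 0.0, 2.0, 2.0)
gt_box = (1.0, 1.0, 3.0, 3.0)
print(iou(pred_box, gt_box))           # overlap 1, union 7
print(wiou_v1_loss(pred_box, gt_box))  # larger than the plain IoU loss
```

Because the scaling factor is at least 1, a prediction whose centre is far from the ground truth is penalized more than the plain IoU loss alone would penalize it, while a perfectly aligned box still yields zero loss.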

CLC number:

TP391.41

Cite this article:

Tianhua CHEN, Jiaxuan ZHU, Jie YIN. Bird recognition algorithm based on attention mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1114-1120.

Figures/Tables (12)

Fig. 1 YOLOv5-Bird network structure

Fig. 2 Coordinate attention process

Fig. 3 Visualization of coordinate attention mechanism

Fig. 4 Visualization of ViT

Fig. 5 Visualization of BRA mechanism

Fig. 6 BRA block process

Tab. 1 Experimental environment

Item	Configuration
Operating system	Windows 10
CPU	Intel Core i5-10400F
GPU	NVIDIA GeForce GTX 1070
Software	Anaconda, PyCharm2021
Deep learning platform	Python 3.7
Deep learning framework	PyTorch 1.8.0
GPU acceleration library	CUDA 11.7

Fig. 7 Curves of training loss and validation loss

Fig. 8 Experimental results comparison of different algorithms

Fig. 9 Visualization analysis

Tab. 2 Experimental data comparison of different algorithms (%)

Algorithm	Precision	Recall
YOLOv5	78.5	69.4
Faster R-CNN	77.7	70.2
SSD	63.1	64.7
YOLOv7	78.8	70.3
EfficientDet	78.3	67.6
YOLOv5+CBAM	79.0	70.8
YOLOv5+SE	79.5	70.9
YOLOv5+CA	79.5	71.1
YOLOv5+BRA	80.5	70.4
YOLOv5+Swin Transformer	80.0	70.5
YOLOv5+CotNet	79.9	69.8
YOLOv5+MobileViT	79.5	69.9
YOLOv5-Bird	82.8	77.0
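The percentage-point gains quoted in the abstract can be checked directly against Tab. 2. The short sketch below copies the table values into a dictionary and computes the difference between YOLOv5-Bird and the YOLOv5 baseline, and also finds the strongest competing entry for each metric. The variable and function names are chosen for this example only.

```python
# Precision and recall (%) copied from Tab. 2.
results = {
    "YOLOv5": (78.5, 69.4),
    "Faster R-CNN": (77.7, 70.2),
    "SSD": (63.1, 64.7),
    "YOLOv7": (78.8, 70.3),
    "EfficientDet": (78.3, 67.6),
    "YOLOv5+CBAM": (79.0, 70.8),
    "YOLOv5+SE": (79.5, 70.9),
    "YOLOv5+CA": (79.5, 71.1),
    "YOLOv5+BRA": (80.5, 70.4),
    "YOLOv5+Swin Transformer": (80.0, 70.5),
    "YOLOv5+CotNet": (79.9, 69.8),
    "YOLOv5+MobileViT": (79.5, 69.9),
    "YOLOv5-Bird": (82.8, 77.0),
}


def gain(model, baseline="YOLOv5"):
    """Percentage-point difference (precision, recall) of model over baseline."""
    p, r = results[model]
    bp, br = results[baseline]
    return round(p - bp, 1), round(r - br, 1)


def best_other(metric_index, exclude="YOLOv5-Bird"):
    """Name of the best-scoring algorithm on one metric, excluding one entry."""
    others = {name: vals for name, vals in results.items() if name != exclude}
    return max(others, key=lambda name: others[name][metric_index])


print(gain("YOLOv5-Bird"))  # (4.3, 7.6)
print(best_other(0))        # YOLOv5+BRA (precision 80.5)
print(best_other(1))        # YOLOv5+CA (recall 71.1)
```

The gap in recall is the more striking of the two: the best competing entry reaches 71.1%, so YOLOv5-Bird leads every other listed algorithm by at least 5.9 percentage points on that metric.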

Tab. 3 Ablation experimental results (%)

CA	WIoU	BRA	Precision	Recall	mAP@0.5	mAP@0.5:0.95
			78.5	69.4	73.6	54.3
√			79.5	71.1	74.5	56.1
	√		79.7	71.9	76.5	57.5
		√	80.5	70.4	75.5	54.7
√	√		80.7	74.3	79.6	58.0
√		√	80.0	72.3	76.7	57.1
	√	√	81.4	74.2	79.7	58.4
√	√	√	82.8	77.0	80.7	59.4
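To read the ablation table more easily, the rows of Tab. 3 can be compared against the first row (no module enabled). The sketch below stores each row as a tuple keyed by which of CA, WIoU and BRA are switched on and reports the change in each metric relative to that baseline. The data layout and the helper name `delta` are assumptions for illustration; the numbers are copied from the table.

```python
# Rows of Tab. 3: (CA, WIoU, BRA) -> (precision, recall, mAP@0.5, mAP@0.5:0.95), in %.
ablation = {
    (False, False, False): (78.5, 69.4, 73.6, 54.3),
    (True, False, False): (79.5, 71.1, 74.5, 56.1),
    (False, True, False): (79.7, 71.9, 76.5, 57.5),
    (False, False, True): (80.5, 70.4, 75.5, 54.7),
    (True, True, False): (80.7, 74.3, 79.6, 58.0),
    (True, False, True): (80.0, 72.3, 76.7, 57.1),
    (False, True, True): (81.4, 74.2, 79.7, 58.4),
    (True, True, True): (82.8, 77.0, 80.7, 59.4),
}


def delta(ca, wiou, bra):
    """Percentage-point change of one configuration over the plain YOLOv5 row."""
    base = ablation[(False, False, False)]
    row = ablation[(ca, wiou, bra)]
    return tuple(round(v - b, 1) for v, b in zip(row, base))


# Single-module gains, then the full YOLOv5-Bird configuration.
print(delta(True, False, False))  # CA only:   (1.0, 1.7, 0.9, 1.8)
print(delta(False, True, False))  # WIoU only: (1.2, 2.5, 2.9, 3.2)
print(delta(False, False, True))  # BRA only:  (2.0, 1.0, 1.9, 0.4)
print(delta(True, True, True))    # all three: (4.3, 7.6, 7.1, 5.1)
```

Read this way, WIoU alone gives the largest single-module gain in mAP@0.5, BRA alone gives the largest gain in precision, and the three modules together improve every metric more than any single module or pair listed in the table.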
