Bird recognition algorithm based on attention mechanism

doi:10.11772/j.issn.1001-9081.2023081042

Journal of Computer Applications, 2024, Vol. 44, Issue 4: 1114-1120. DOI: 10.11772/j.issn.1001-9081.2023081042

Special Issue: Artificial Intelligence

• Artificial intelligence •

Tianhua CHEN¹, Jiaxuan ZHU¹, Jie YIN²

¹ School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
² Department of Computer Information and Cybersecurity, Jiangsu Police Institute, Nanjing, Jiangsu 210031, China

Received: 2023-08-08 Revised: 2023-12-04 Online: 2023-12-18 Published: 2024-04-10
Contact: Jiaxuan ZHU (408199640@qq.com)
About the authors: CHEN Tianhua, born in 1966 in Changsha, Hunan, M.S., professor. His research interests include image processing, pattern recognition, and measurement and control technology.
ZHU Jiaxuan, born in 1997 in Nantong, Jiangsu, M.S. candidate. His research interests include pattern recognition and image processing.
YIN Jie, born in 1977 in Nanjing, Jiangsu, M.S., senior engineer. His research interests include machine learning, big data, and cybersecurity.
Supported by: National Natural Science Foundation of China (62272203); Joint Project of Beijing Natural Science Foundation and Beijing Municipal Education Commission (KZ202110011015)

Abstract

To address the low accuracy of existing algorithms on fine-grained bird recognition tasks, a bird detection algorithm called YOLOv5-Bird was proposed. Firstly, a mixed-domain Coordinate Attention (CA) mechanism was introduced into the YOLOv5 backbone to increase the weights of valuable channels and to distinguish target features from redundant background features. Secondly, Bi-level Routing Attention (BRA) modules replaced some of the C3 modules in the original backbone to filter out weakly correlated key-value pairs and capture long-distance dependencies efficiently. Finally, the WIoU (Wise-Intersection over Union) function was used as the loss function to improve the localization ability of the algorithm. Experimental results show that on the self-constructed dataset, YOLOv5-Bird reaches a detection precision of 82.8% and a recall of 77.0%, which are 4.3 and 7.6 percentage points higher, respectively, than those of the YOLOv5 algorithm. YOLOv5-Bird also outperforms algorithms that add other attention mechanisms, which verifies its better performance in bird detection scenarios.

Key words: target detection, biological recognition, Convolutional Neural Network (CNN), attention mechanism, loss function
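
The WIoU loss named in the abstract builds on the overlap between a predicted box and a ground-truth box. As a rough illustration only, the sketch below computes plain Intersection over Union and the basic loss 1 - IoU for axis-aligned boxes. It is not the WIoU loss and not the authors' implementation; the function names `iou` and `iou_loss` and the (x1, y1, x2, y2) box layout are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; the box layout is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(box_a, box_b):
    """Basic IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(box_a, box_b)
```

A loss such as WIoU adds further terms on top of this overlap measure; those terms are deliberately left out here.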

CLC Number: TP391.41

Tianhua CHEN, Jiaxuan ZHU, Jie YIN. Bird recognition algorithm based on attention mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1114-1120.

Tables

Name	Parameter
Operating system	Windows 10
CPU	Intel Core i5-10400F
GPU	NVIDIA GeForce GTX 1070
Software	Anaconda, PyCharm 2021
Deep learning platform	Python 3.7
Deep learning framework	PyTorch 1.8.0
GPU acceleration library	CUDA 11.7

Algorithm	Precision/%	Recall/%
YOLOv5	78.5	69.4
Faster R-CNN	77.7	70.2
SSD	63.1	64.7
YOLOv7	78.8	70.3
EfficientDet	78.3	67.6
YOLOv5+CBAM	79.0	70.8
YOLOv5+SE	79.5	70.9
YOLOv5+CA	79.5	71.1
YOLOv5+BRA	80.5	70.4
YOLOv5+Swin Transformer	80.0	70.5
YOLOv5+CotNet	79.9	69.8
YOLOv5+MobileViT	79.5	69.9
YOLOv5-Bird	82.8	77.0
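
The percentage-point gains quoted in the abstract can be checked directly against the comparison table. The sketch below copies the precision and recall values from that table and computes the difference to a chosen baseline; the names `RESULTS`, `gain_over` and `rank_by` are hypothetical helpers introduced only for this check.

```python
# (precision %, recall %) copied from the comparison table above.
RESULTS = {
    "YOLOv5": (78.5, 69.4),
    "Faster R-CNN": (77.7, 70.2),
    "SSD": (63.1, 64.7),
    "YOLOv7": (78.8, 70.3),
    "EfficientDet": (78.3, 67.6),
    "YOLOv5+CBAM": (79.0, 70.8),
    "YOLOv5+SE": (79.5, 70.9),
    "YOLOv5+CA": (79.5, 71.1),
    "YOLOv5+BRA": (80.5, 70.4),
    "YOLOv5+Swin Transformer": (80.0, 70.5),
    "YOLOv5+CotNet": (79.9, 69.8),
    "YOLOv5+MobileViT": (79.5, 69.9),
    "YOLOv5-Bird": (82.8, 77.0),
}


def gain_over(baseline, name, results=RESULTS):
    """Percentage-point gain of `name` over `baseline` as (precision, recall)."""
    base_p, base_r = results[baseline]
    p, r = results[name]
    # Round to one decimal, matching the precision of the reported values.
    return round(p - base_p, 1), round(r - base_r, 1)


def rank_by(metric_index, results=RESULTS):
    """Algorithm names sorted best-first by precision (0) or recall (1)."""
    return sorted(results, key=lambda n: results[n][metric_index], reverse=True)


if __name__ == "__main__":
    print(gain_over("YOLOv5", "YOLOv5-Bird"))
    print(rank_by(0)[:3])
```

Calling `gain_over("YOLOv5", "YOLOv5-Bird")` reproduces the 4.3 and 7.6 percentage points stated in the abstract.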

CA	WIoU	BRA	Precision/%	Recall/%	mAP@0.5/%	mAP@0.5:0.95/%
-	-	-	78.5	69.4	73.6	54.3
√	-	-	79.5	71.1	74.5	56.1
-	√	-	79.7	71.9	76.5	57.5
-	-	√	80.5	70.4	75.5	54.7
√	√	-	80.7	74.3	79.6	58.0
√	-	√	80.0	72.3	76.7	57.1
-	√	√	81.4	74.2	79.7	58.4
√	√	√	82.8	77.0	80.7	59.4
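
One way to read the ablation table is to express every configuration as a gain over the unmodified YOLOv5 row. The sketch below does that with the values copied from the table; `ABLATION`, `METRICS` and `gain` are hypothetical names used only for this illustration, and the flags are ordered (CA, WIoU, BRA) as in the table header.

```python
# Keys are (CA, WIoU, BRA) flags; values are
# (precision, recall, mAP@0.5, mAP@0.5:0.95) in percent, copied from the table.
ABLATION = {
    (False, False, False): (78.5, 69.4, 73.6, 54.3),
    (True, False, False): (79.5, 71.1, 74.5, 56.1),
    (False, True, False): (79.7, 71.9, 76.5, 57.5),
    (False, False, True): (80.5, 70.4, 75.5, 54.7),
    (True, True, False): (80.7, 74.3, 79.6, 58.0),
    (True, False, True): (80.0, 72.3, 76.7, 57.1),
    (False, True, True): (81.4, 74.2, 79.7, 58.4),
    (True, True, True): (82.8, 77.0, 80.7, 59.4),
}
METRICS = ("precision", "recall", "mAP@0.5", "mAP@0.5:0.95")
BASELINE = (False, False, False)


def gain(config, metric, table=ABLATION):
    """Percentage-point gain of `config` over the baseline row for `metric`."""
    i = METRICS.index(metric)
    return round(table[config][i] - table[BASELINE][i], 1)


if __name__ == "__main__":
    singles = [(True, False, False), (False, True, False), (False, False, True)]
    for cfg in singles:
        print(cfg, gain(cfg, "mAP@0.5"))
    print("all three:", gain((True, True, True), "mAP@0.5"))
```

On these numbers the three modules used alone add 0.9, 2.9 and 1.9 points of mAP@0.5, while the full combination adds 7.1 points, more than the sum of the individual gains.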
