Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (4): 1114-1120.DOI: 10.11772/j.issn.1001-9081.2023081042
Special Issue: Artificial Intelligence
• Artificial intelligence •
Tianhua CHEN1, Jiaxuan ZHU1, Jie YIN2
Received: 2023-08-08
Revised: 2023-12-04
Online: 2023-12-18
Published: 2024-04-10
Contact: Jiaxuan ZHU
About author: CHEN Tianhua, born in 1966 in Changsha, Hunan, M. S., professor. His research interests include image processing, pattern recognition, and measurement and control technology.
Tianhua CHEN, Jiaxuan ZHU, Jie YIN. Bird recognition algorithm based on attention mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1114-1120.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023081042
Item | Specification |
---|---|
Operating system | Windows 10 |
CPU | Intel Core i5-10400F |
GPU | NVIDIA GeForce GTX 1070 |
Software | Anaconda, PyCharm 2021 |
Deep learning platform | Python 3.7 |
Deep learning framework | PyTorch 1.8.0 |
GPU acceleration library | CUDA 11.7 |
Tab. 1 Experimental environment
Algorithm | Precision/% | Recall/% |
---|---|---|
YOLOv5 | 78.5 | 69.4 |
Faster R-CNN | 77.7 | 70.2 |
SSD | 63.1 | 64.7 |
YOLOv7 | 78.8 | 70.3 |
EfficientDet | 78.3 | 67.6 |
YOLOv5+CBAM | 79.0 | 70.8 |
YOLOv5+SE | 79.5 | 70.9 |
YOLOv5+CA | 79.5 | 71.1 |
YOLOv5+BRA | 80.5 | 70.4 |
YOLOv5+Swin Transformer | 80.0 | 70.5 |
YOLOv5+CotNet | 79.9 | 69.8 |
YOLOv5+MobileViT | 79.5 | 69.9 |
YOLOv5-Bird | 82.8 | 77.0 |
Tab. 2 Experimental data comparison of different algorithms
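The comparison in Tab. 2 can be read as gains over the YOLOv5 baseline. The following is a minimal Python sketch of that arithmetic, assuming the tabulated values are percentages; the dictionary `RESULTS` and the helper `gains_over_baseline` are illustrative names introduced here, not part of the original work.

```python
# Precision and recall (in %) copied from Tab. 2; all names are illustrative.
RESULTS = {
    "YOLOv5": (78.5, 69.4),
    "Faster R-CNN": (77.7, 70.2),
    "SSD": (63.1, 64.7),
    "YOLOv7": (78.8, 70.3),
    "EfficientDet": (78.3, 67.6),
    "YOLOv5+CBAM": (79.0, 70.8),
    "YOLOv5+SE": (79.5, 70.9),
    "YOLOv5+CA": (79.5, 71.1),
    "YOLOv5+BRA": (80.5, 70.4),
    "YOLOv5+Swin Transformer": (80.0, 70.5),
    "YOLOv5+CotNet": (79.9, 69.8),
    "YOLOv5+MobileViT": (79.5, 69.9),
    "YOLOv5-Bird": (82.8, 77.0),
}


def gains_over_baseline(results, baseline="YOLOv5"):
    """Return {name: (precision gain, recall gain)} in percentage points."""
    base_p, base_r = results[baseline]
    return {
        name: (round(p - base_p, 1), round(r - base_r, 1))
        for name, (p, r) in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (dp, dr) in gains_over_baseline(RESULTS).items():
        print(f"{name:<26} precision {dp:+.1f}  recall {dr:+.1f}")
```

Read this way, YOLOv5-Bird is the only entry that improves recall by more than two percentage points over the baseline, while the single-module variants mainly raise precision.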
CA | WIoU | BRA | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5:0.95/% |
---|---|---|---|---|---|---|
× | × | × | 78.5 | 69.4 | 73.6 | 54.3 |
√ | × | × | 79.5 | 71.1 | 74.5 | 56.1 |
× | √ | × | 79.7 | 71.9 | 76.5 | 57.5 |
× | × | √ | 80.5 | 70.4 | 75.5 | 54.7 |
√ | √ | × | 80.7 | 74.3 | 79.6 | 58.0 |
√ | × | √ | 80.0 | 72.3 | 76.7 | 57.1 |
× | √ | √ | 81.4 | 74.2 | 79.7 | 58.4 |
√ | √ | √ | 82.8 | 77.0 | 80.7 | 59.4 |
Tab. 3 Ablation experimental results
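The overall effect of the ablation in Tab. 3 can be summarized by comparing its first row (no module enabled) with its last row (CA, WIoU, and BRA all enabled). The sketch below is only an illustration of that arithmetic, assuming the values are percentages; `METRICS`, `BASELINE`, `FULL`, and `improvement` are hypothetical names, not taken from the paper.

```python
# Metric values (in %) from the first and last rows of Tab. 3.
METRICS = ("precision", "recall", "mAP@0.5", "mAP@0.5:0.95")
BASELINE = (78.5, 69.4, 73.6, 54.3)  # no module enabled
FULL = (82.8, 77.0, 80.7, 59.4)      # CA + WIoU + BRA enabled


def improvement(baseline, full):
    """Return {metric: (absolute gain in points, relative gain in %)}."""
    return {
        name: (round(f - b, 1), round(100.0 * (f - b) / b, 1))
        for name, b, f in zip(METRICS, baseline, full)
    }


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in improvement(BASELINE, FULL).items():
        print(f"{name:<13} +{abs_gain:.1f} points ({rel_gain:.1f}% relative)")
```

The absolute gain is the difference in percentage points, while the relative gain divides that difference by the baseline value, which makes metrics with different baselines easier to compare.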