1 |
KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
|
2 |
HE K, ZHANG X, REN S, et al. Identity mappings in deep residual networks[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9908. Cham: Springer, 2016: 630-645.
|
3 |
HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
|
4 |
TAN M, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 6105-6114.
|
5 |
TAN M, LE Q V. EfficientNetV2: smaller models and faster training[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 10096-10106.
|
6 |
DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03) [2024-01-05].
|
7 |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
|
8 |
LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
|
9 |
WANG W, XIE E, LI X, et al. Pyramid Vision Transformer: a versatile backbone for dense prediction without convolutions[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 548-558.
|
10 |
WANG W, XIE E, LI X, et al. PVT v2: improved baselines with pyramid vision Transformer[J]. Computational Visual Media, 2022, 8(3): 415-424.
|
11 |
GUO M H, LU C Z, LIU Z N, et al. Visual attention network[J]. Computational Visual Media, 2023, 9(3): 733-752.
|
12 |
DING M, XIAO B, CODELLA N, et al. DaViT: dual attention vision Transformers[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13684. Cham: Springer, 2022: 74-92.
|
13 |
LI K, WANG Y, ZHANG J, et al. UniFormer: unifying convolution and self-attention for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12581-12600.
|
14 |
YANG J, LI C, DAI X, et al. Focal modulation networks[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 4203-4217.
|
15 |
YU W, LUO M, ZHOU P, et al. MetaFormer is actually what you need for vision[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10809-10819.
|
16 |
LI X, LI F, YU J, et al. A high-precision underwater object detection based on joint self-supervised deblurring and improved spatial Transformer network[EB/OL]. (2022-03-09) [2024-01-05].
|
17 |
XU X, QIN Y, XI D, et al. MulTNet: a multi-scale Transformer network for marine image segmentation toward fishing[J]. Sensors, 2022, 22(19): No.7224.
|
18 |
GONG B, DAI K, SHAO J, et al. Fish-TViT: a novel fish species classification method in multi water areas based on transfer learning and vision Transformer[J]. Heliyon, 2023, 9(6): No.e16761.
|
19 |
CUI Y, HAN J C, GAO S, et al. An object detection method of underwater image based on improved Deformable-DETR[J]. Applied Science and Technology, 2024, 51(1): 30-36, 91.
|
20 |
HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2024-01-05].
|
21 |
DAI Y, GIESEKE F, OEHMCKE S, et al. Attentional feature fusion[C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3559-3568.
|
22 |
CHU X, TIAN Z, ZHANG B, et al. Conditional positional encodings for vision Transformers[EB/OL]. (2023-02-13) [2024-01-05].
|
23 |
WU H, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision Transformers[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 22-31.
|
24 |
CHU X, TIAN Z, WANG Y, et al. Twins: revisiting the design of spatial attention in vision Transformers[C]// Proceedings of the 35th Conference on Neural Information Processing Systems. New York: ACM, 2024: 9355-9366.
|
25 |
LOSHCHILOV I, HUTTER F. Fixing weight decay regularization in Adam[EB/OL]. [2024-01-05].
|
26 |
FISHER R B, CHEN-BURGER Y H, GIORDANO D, et al. Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data[M]. Cham: Springer, 2016: 1-319.
|
27 |
SCHUETZENMEISTER F, MATT M, RISDAL M, et al. The nature conservancy fisheries monitoring[DS/OL]. [2024-01-05].
|