| 1 | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. | 
																													
| 2 | HE K, ZHANG X, REN S, et al. Identity mappings in deep residual networks[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9908. Cham: Springer, 2016: 630-645. |
																													
| 3 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
																													
| 4 | TAN M, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 6105-6114. |
																													
| 5 | TAN M, LE Q V. EfficientNetV2: smaller models and faster training[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 10096-10106. |
																													
| 6 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03) [2024-01-05]. . |
																													
| 7 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010. |
																													
| 8 | LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002. |
																													
| 9 | WANG W, XIE E, LI X, et al. Pyramid Vision Transformer: a versatile backbone for dense prediction without convolutions[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 548-558. |
																													
| 10 | WANG W, XIE E, LI X, et al. PVT v2: improved baselines with pyramid vision Transformer[J]. Computational Visual Media, 2022, 8(3): 415-424. |
																													
| 11 | GUO M H, LU C Z, LIU Z N, et al. Visual attention network[J]. Computational Visual Media, 2023, 9(3): 733-752. |
																													
| 12 | DING M, XIAO B, CODELLA N, et al. DaViT: dual attention vision Transformers[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13684. Cham: Springer, 2022: 74-92. |
																													
| 13 | LI K, WANG Y, ZHANG J, et al. UniFormer: unifying convolution and self-attention for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12581-12600. |
																													
| 14 | YANG J, LI C, DAI X, et al. Focal modulation networks[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 4203-4217. |
																													
| 15 | YU W, LUO M, ZHOU P, et al. MetaFormer is actually what you need for vision[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10809-10819. |
																													
| 16 | LI X, LI F, YU J, et al. A high-precision underwater object detection based on joint self-supervised deblurring and improved spatial Transformer network[EB/OL]. (2022-03-09) [2024-01-05].. |
																													
| 17 | XU X, QIN Y, XI D, et al. MulTNet: a multi-scale Transformer network for marine image segmentation toward fishing[J]. Sensors, 2022, 22(19): No.7224. |
																													
| 18 | GONG B, DAI K, SHAO J, et al. Fish-TViT: a novel fish species classification method in multi water areas based on transfer learning and vision Transformer[J]. Heliyon, 2023, 9(6): No.e16761. |
																													
| 19 | 崔颖,韩佳成,高山,等. 基于改进Deformable-DETR的水下图像目标检测方法[J]. 应用科技, 2024, 51(1): 30-36, 91. |
																													
|  | CUI Y, HAN J C, GAO S, et al. An object detection method of underwater image based on improved Deformable-DETR[J]. Applied Science and Technology, 2024, 51(1): 30-36, 91. |
																													
| 20 | HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2024-01-05].. |
																													
| 21 | DAI Y, GIESEKE F, OEHMCKE S, et al. Attentional feature fusion[C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3559-3568. |
																													
| 22 | CHU X, TIAN Z, ZHANG B, et al. Conditional positional encodings for vision Transformers[EB/OL]. (2023-02-13) [2024-01-05].. |
																													
| 23 | WU H, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision Transformers[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 22-31. |
																													
| 24 | CHU X, TIAN Z, WANG Y, et al. Twins: revisiting the design of spatial attention in vision Transformers[C]// Proceedings of the 35th Conference on Neural Information Processing Systems. New York: ACM, 2024: 9355-9366. |
																													
| 25 | LOSHCHILOV I, HUTTER F. Fixing weight decay regularization in Adam[EB/OL]. [2024-01-05].. |
																													
| 26 | FISHER R B, CHEN-BURGER Y H, GIORDANO D, et al. Fish4Knowledge: collecting and analyzing massive coral reef fish video data[M]. Cham: Springer, 2016: 1-319. |
																													
| 27 | SCHUETZENMEISTER F, MATT M, RISDAL M, et al. The Nature Conservancy fisheries monitoring[DS/OL]. [2024-01-05].. |