Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1586-1595. DOI: 10.11772/j.issn.1001-9081.2025050565

• Multimedia computing and computer simulation

Lightweight underwater small object detection based on graph Transformer and RT-DETR

Minqi WU1, Yuanhua YANG1, Hang LI1, Yaqin HU2, Zhihao TANG1, Teng MEI1

  1. School of Artificial Intelligence and Computer Science, Hubei Normal University, Huangshi Hubei 435002, China
  2. Wuhan Beike Instruments Company Limited, Wuhan Hubei 430010, China
  • Received: 2025-05-26 Revised: 2025-10-13 Accepted: 2025-10-20 Online: 2025-10-29 Published: 2026-05-10
  • Contact: Yuanhua YANG
  • About author: WU Minqi, born in 1999, M. S. candidate. His research interests include computer vision, object detection, and graph neural networks.
    LI Hang, born in 1999, M. S. candidate. His research interests include artificial intelligence.
    HU Yaqin, born in 1999, engineer. Her research interests include software engineering.
    TANG Zhihao, born in 2000, M. S. candidate. His research interests include deep learning and object detection.
    MEI Teng, born in 1991, M. S. candidate. His research interests include computer vision and 3D image processing.
  • Supported by:
    Natural Science Foundation of Hubei Province (2022CFC013); 2025 “Graduate Innovation and Research Project” of Hubei Normal University (2025Y127)

基于图Transformer和RT-DETR的轻量化水下小目标检测

吴闵奇1, 杨元华1, 李航1, 胡雅琴2, 汤智豪1, 梅腾1

  1. 湖北师范大学 人工智能与计算机学院,湖北 黄石 435002
  2. 武汉贝科仪器有限公司,武汉 430010
  • 通讯作者: 杨元华
  • 作者简介:吴闵奇(1999—),男,湖北武汉人,硕士研究生,CCF会员,主要研究方向:计算机视觉、目标检测、图神经网络;
    李航(1999—),男,四川成都人,硕士研究生,主要研究方向:人工智能;
    胡雅琴(1999—),女,湖北黄石人,工程师,主要研究方向:软件工程;
    汤智豪(2000—),男,湖北十堰人,硕士研究生,主要研究方向:深度学习、目标检测;
    梅腾(1991—),男,湖北武汉人,硕士研究生,主要研究方向:计算机视觉、三维图像处理。
  • 基金资助:
    湖北省自然科学基金资助项目(2022CFC013);2025年湖北师范大学“研究生创新科研”立项建设项目(2025Y127)

Abstract:

Existing underwater small object detection methods are primarily based on deep learning algorithms and struggle to balance lightweight design against detection accuracy, so they cannot meet the real-time and resource constraints of underwater platforms. Therefore, Graph-DETR, a lightweight underwater small object detection model based on RT-DETR (Real-Time DEtection TRansformer) and a graph Transformer, was proposed. The model used a lightweight MobileNetV4 backbone improved with the Large Separable Kernel Attention mechanism (LSKAttention) and the Context-Mixing dynamic convolutional block (CM block) to improve feature extraction efficiency and reduce model complexity. Additionally, a hierarchical Graph Transformer Feature Pyramid Network (GTFPN) was proposed to strengthen multi-scale feature fusion, and the hybrid encoder was optimized with Wavelet Transform Convolution (WTConv), Adaptive downsampling (Adown), and path pruning, thereby expanding the convolutional receptive field of the CNN-based Cross-scale Feature Fusion (CCFF) module with few additional parameters. Experimental results on the public underwater dataset URPC2020 show that, compared with RT-DETR, Graph-DETR reduces the number of parameters by 66.9% and the inference latency by 6.8 ms, while achieving a mean Average Precision (mAP) of 53.2% and an Average Precision of 86.8% at an Intersection over Union (IoU) threshold of 0.5 (AP@0.5). On URPC2021, Graph-DETR achieves a recall of 81.3%, an mAP of 54.1%, and an AP@0.5 of 87.6%, all higher than those of the compared methods, with an inference latency of only 10.5 ms. Graph-DETR exhibits excellent performance in underwater small object detection and is practical for deployment on resource-constrained underwater platforms.

Key words: underwater small object detection, graph Transformer, RT-DETR (Real-Time DEtection TRansformer), lightweight, multi-scale feature fusion

摘要:

现有的水下小目标检测方法多基于深度学习算法,难以兼顾轻量化与检测精度,无法满足实时性和资源受限水下平台的需求。因此,提出一种基于RT-DETR(Real-Time DEtection TRansformer)和图Transformer的轻量化水下小目标检测模型Graph-DETR。该模型采用大型可分离核注意力机制(LSKAttention)和上下文混合的动态卷积块(CM block)改进的轻量级MobileNetV4作为主干网络,以提升特征提取效率,降低模型复杂度。同时,提出层次化图Transformer金字塔网络(GTFPN)增强多尺度特征融合,并通过小波变换卷积(WTConv)、自适应下采样(Adown)和路径剪枝改进混合编码器,低参数化扩展基于CNN的跨尺度特征融合(CCFF)模块的卷积感受野。在水下公开数据集URPC2020上的实验结果表明,Graph-DETR的参数量比RT-DETR减少66.9%,推理延迟缩短6.8 ms,检测平均精度均值(mAP)达到53.2%,交并比(IoU)阈值为0.5时的平均精度(AP@0.5)为86.8%;在URPC2021上,Graph-DETR的召回率达到81.3%,mAP为54.1%,AP@0.5为87.6%,均超越了对比方法,且推理延迟仅10.5 ms。Graph-DETR展现出优越的水下小目标检测性能,具备实际部署在计算资源受限水下平台的应用前景。

关键词: 水下小目标检测, 图Transformer, RT-DETR, 轻量化, 多尺度特征融合

CLC Number: