Journal of Computer Applications
陈江彦¹, 王彦丹¹, 刘翼虎¹, 马应龙²
Abstract: To address the significant performance degradation that Transformer-based object detection methods usually suffer under long-tailed distributions, a single-stage end-to-end long-tailed object detection framework based on collaborative query optimization, CQ-DETR, was proposed. A Layer-Adaptive Encoder Fusion (LAEF) module was designed to dynamically integrate multi-scale encoder features, so that both high-level semantics and low-level details are retained. A Feature-aware Query Generation (FQG) module was designed to dynamically generate content-aware queries from image features, thereby strengthening the ability of the initial content queries to represent potential objects. A Category-Localization Joint-aware query Selection (CLJS) mechanism was proposed to optimize category coverage and localization accuracy jointly. Experimental results on the LVIS v1.0 and COCO 2017 benchmark datasets show that, on the long-tailed benchmark LVIS v1.0, CQ-DETR outperforms the advanced method RichSem, improving average precision (AP) and tail (rare) category average precision (APr) by 1.4 and 1.5 percentage points, respectively, which demonstrates the effectiveness of the framework in category-imbalanced scenarios. On the relatively balanced COCO 2017 dataset, CQ-DETR improves AP by 1.1 and 0.1 percentage points over DINO and Salience-DETR, respectively, indicating that the model also generalizes well to general object detection scenarios.
Key words: object detection, Transformer, long-tailed distribution, multi-scale feature fusion, query selection mechanism
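The abstract names the modules but gives no formulas, so the following is only a rough illustration of two generic techniques that modules of this kind are commonly built from, not the CQ-DETR implementation. `layer_adaptive_fusion` is a softmax-weighted sum of encoder-layer outputs (in the spirit of LAEF). `joint_query_selection` scores each encoder token by combining its top classification probability with a localization-quality estimate and reserves one query per predicted category before filling the remaining slots (in the spirit of CLJS). All function names, the scoring rule `max_c(p_c) ** alpha * loc ** (1 - alpha)`, the coverage-first ordering, and the parameter `alpha` are hypothetical assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def layer_adaptive_fusion(layers: Sequence[Sequence[float]],
                          logits: Sequence[float]) -> List[float]:
    """Fuse per-layer encoder features with softmax-normalised layer weights.

    Hypothetical sketch: layers[l][i] is feature i from encoder layer l, all
    layers are assumed to be already aligned to the same length, and
    logits[l] is the weight logit of layer l (learnable in a real model,
    fixed here).
    """
    if len(layers) != len(logits):
        raise ValueError("one weight logit per layer is required")
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(layers[0])
    # Weighted sum over layers, feature by feature.
    return [sum(w * layer[i] for w, layer in zip(weights, layers))
            for i in range(dim)]


def joint_query_selection(cls_scores: Sequence[Sequence[float]],
                          loc_scores: Sequence[float],
                          k: int,
                          alpha: float = 0.5) -> List[int]:
    """Select k token indices to initialise decoder queries.

    Hypothetical sketch: each token gets the joint score
    max_c(p_c) ** alpha * loc ** (1 - alpha), where loc is an assumed
    localization-quality estimate in [0, 1]. For category coverage, the
    best-scoring token of every predicted category is taken first; the
    remaining slots are then filled in descending joint score.
    """
    joint: List[float] = []
    labels: List[int] = []
    for probs, loc in zip(cls_scores, loc_scores):
        label = max(range(len(probs)), key=lambda c: probs[c])
        labels.append(label)
        joint.append(probs[label] ** alpha * loc ** (1.0 - alpha))
    order = sorted(range(len(joint)), key=lambda t: joint[t], reverse=True)
    selected: List[int] = []
    seen = set()
    # Pass 1: at most one query per predicted category, best token first.
    for t in order:
        if len(selected) >= k:
            break
        if labels[t] not in seen:
            seen.add(labels[t])
            selected.append(t)
    # Pass 2: fill the remaining slots purely by joint score.
    for t in order:
        if len(selected) >= k:
            break
        if t not in selected:
            selected.append(t)
    return selected
```

For example, with `cls = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]` and `loc = [0.9, 0.95, 0.5, 0.3]`, `joint_query_selection(cls, loc, k=2)` returns `[0, 2]`: a plain top-2 by joint score would pick tokens 0 and 1, both predicted as category 0, whereas the coverage-first pass keeps the single token predicted as category 2.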
CLC Number: TP391.41
陈江彦, 王彦丹, 刘翼虎, 马应龙. Long-tailed object detection framework based on collaborative query optimization[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025081018.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025081018