Journal of Computer Applications
HUANG Shuwen, GUO Keyu, SONG Xiangyu, HAN Feng, SUN Shijie, SONG Huansheng
Abstract: In view of the problems that existing 3D visual grounding methods rely on expensive sensor equipment, incur high system costs, and exhibit poor accuracy and robustness in complex multi-target grounding scenarios, a multi-target 3D visual grounding method based on monocular images was proposed. The method combined natural language descriptions to recognize multiple 3D targets from a single RGB image. To this end, a multi-target visual grounding dataset was constructed and a cross-modal matching network, TextVizNet, was designed. TextVizNet generated 3D bounding boxes for targets by means of a pre-trained monocular detector and deeply integrated visual and linguistic information via an information fusion module and an information alignment module, thereby realizing text-guided multi-target 3D detection. Experimental results show that on the Mmo3DRefer dataset, TextVizNet improves the F1-score, Precision, and Recall by 8.92%, 8.39%, and 9.57%, respectively, compared with the second-best of the existing advanced methods, significantly improving the accuracy of text-based multi-target localization in complex scenarios and providing effective support for practical applications such as autonomous driving and intelligent robotics.
Key words: 3D visual grounding, monocular image, multimodal technology, object detection, scene understanding
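The abstract does not give implementation details of the information fusion and information alignment modules, so the following is only a minimal illustrative sketch of the general idea it describes: per-box visual features from a pre-trained monocular detector are fused with language features by cross-attention, and an alignment head scores each box against the sentence so that multiple referred targets can be selected. All module names, dimensions, and the score threshold below are assumptions, not the authors' design.

```python
# Hypothetical sketch (not the TextVizNet code): cross-modal fusion and
# alignment between detected 3D boxes and a language description.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Fuse per-box visual features with text tokens via cross-attention (assumed design)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, box_feats, text_feats):
        # box_feats: (B, N_boxes, dim), text_feats: (B, N_tokens, dim)
        attn_out, _ = self.attn(query=box_feats, key=text_feats, value=text_feats)
        x = self.norm(box_feats + attn_out)
        return self.norm(x + self.ffn(x))

class AlignmentHead(nn.Module):
    """Score each fused box feature against a pooled sentence embedding (assumed design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.box_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)

    def forward(self, fused_box_feats, text_feats):
        boxes = nn.functional.normalize(self.box_proj(fused_box_feats), dim=-1)
        sent = nn.functional.normalize(self.txt_proj(text_feats.mean(dim=1)), dim=-1)
        # Cosine similarity per box; thresholding selects the referred targets.
        return (boxes * sent.unsqueeze(1)).sum(dim=-1)  # (B, N_boxes)

if __name__ == "__main__":
    B, N_boxes, N_tokens, dim = 2, 12, 20, 256
    box_feats = torch.randn(B, N_boxes, dim)    # stand-in for monocular detector box features
    text_feats = torch.randn(B, N_tokens, dim)  # stand-in for token embeddings of the description
    fused = FusionBlock(dim)(box_feats, text_feats)
    scores = AlignmentHead(dim)(fused, text_feats)
    selected = scores > 0.5                     # hypothetical threshold for multi-target selection
    print(scores.shape, selected.sum().item())
```

Because several boxes can exceed the matching threshold at once, a per-box score naturally supports the multi-target setting evaluated with Precision, Recall, and F1-score in the abstract.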
CLC Number: TP391.41
HUANG Shuwen, GUO Keyu, SONG Xiangyu, HAN Feng, SUN Shijie, SONG Huansheng. Multi-target 3D visual grounding method based on monocular images [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025010074.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025010074