Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (3): 901-908.DOI: 10.11772/j.issn.1001-9081.2023040412

• Multimedia computing and computer simulation •

Cross-view matching model based on attention mechanism and multi-granularity feature fusion

Meiyu CAI, Runzhe ZHU, Fei WU, Kaiyu ZHANG, Jiale LI

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2023-04-12 Revised: 2023-07-08 Accepted: 2023-07-13 Online: 2024-03-12 Published: 2024-03-10
  • Contact: Fei WU
  • About authors: CAI Meiyu, born in 1998 in Dezhou, Shandong, M. S. candidate. Her research interests include visual positioning, scene matching and positioning.
    ZHU Runzhe, born in 1998 in Jiaxing, Zhejiang, M. S. candidate. His research interests include visual geo-localization and cross-view matching.
    ZHANG Kaiyu, born in 1999 in Fuzhou, Fujian, M. S. candidate. His research interests include object detection, object tracking, semantic segmentation and image generation.
    LI Jiale, born in 1999 in Wuxi, Jiangsu, M. S. candidate. His research interests include object detection and document layout analysis.
  • Supported by:
    China University Industry-University-Research Innovation Fund of Ministry of Education(2021ZYA08008);Project of Shanghai Municipal Science and Technology Commission(N22DZ1100803)


Abstract:

Cross-view scene matching aims to identify images of the same geographic target captured by different platforms, such as drones and satellites. However, the large viewpoint and appearance differences between platforms lower the accuracy of UAV (Unmanned Aerial Vehicle) positioning and navigation tasks, and existing methods usually focus on only a single dimension of the image while ignoring its multi-dimensional features. To address these problems, a deep neural network named GAMF (Global Attention and Multi-granularity feature Fusion) was proposed to improve feature representation and feature discriminability. Firstly, images from the UAV view and the satellite view were combined, three branches were extended under a unified network architecture, and the spatial-position, channel and local features of the images were extracted from three dimensions. Secondly, the SGAM (Spatial Global relationship Attention Module) and the CGAM (Channel Global Attention Module) were established, introducing a spatial global relationship mechanism and a channel attention mechanism to capture global information for better attention learning. Thirdly, to fuse local perception features, a local partition strategy was introduced to strengthen the model's ability to extract fine-grained features. Finally, the features of the three dimensions were jointly used as the final features to train the model. Experimental results on the public dataset University-1652 show that the GAMF model reaches an AP (Average Precision) of 87.41% on the UAV visual positioning task and a Recall@1 (R@1) of 90.30% on the UAV visual navigation task, verifying that the GAMF model can effectively aggregate the multi-dimensional features of images and improve the accuracy of UAV positioning and navigation tasks.
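As a rough illustration of the multi-granularity idea described above (the paper's exact layer configurations are not given here, so the function names and weight shapes below are hypothetical), a channel-attention gate plus a local-partition branch whose outputs are concatenated into one descriptor can be sketched in NumPy as follows:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-excitation-style channel gate on a (C, H, W) feature map."""
    squeezed = feat.mean(axis=(1, 2))                    # (C,) global average pool
    gate = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))  # (C,) two-layer MLP + sigmoid
    return feat * gate[:, None, None]                    # reweight each channel

def part_pool(feat, n_parts):
    """Local partition: split the height axis into stripes and pool each."""
    stripes = np.array_split(feat, n_parts, axis=1)
    return np.concatenate([s.mean(axis=(1, 2)) for s in stripes])  # (n_parts * C,)

def gamf_descriptor(feat, w1, w2, n_parts=4):
    """Concatenate a channel-attended global vector with local stripe vectors."""
    global_vec = channel_attention(feat, w1, w2).mean(axis=(1, 2))  # (C,)
    local_vec = part_pool(feat, n_parts)                            # (n_parts * C,)
    v = np.concatenate([global_vec, local_vec])
    return v / np.linalg.norm(v)  # L2-normalise for cosine-similarity matching
```

In a cross-view setting, a UAV-view descriptor would be matched against a gallery of satellite-view descriptors by cosine similarity of the L2-normalised vectors.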

Key words: Unmanned Aerial Vehicle (UAV), scene matching and positioning, visual positioning, metric learning, global relationship attention, deep learning
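The two reported metrics, AP and Recall@1, can be computed for a generic image-retrieval setting as follows; this is a sketch of the standard definitions, not the paper's evaluation code:

```python
import numpy as np

def recall_at_1(sim, labels_q, labels_g):
    """Fraction of queries whose top-ranked gallery image has the correct label.
    sim: (Q, G) similarity matrix between query and gallery descriptors."""
    top1 = sim.argmax(axis=1)
    return float(np.mean(labels_g[top1] == labels_q))

def average_precision(sim_row, matches):
    """AP for one query: mean of the precision values at each relevant rank.
    matches: boolean array marking the true gallery images for this query."""
    order = np.argsort(-sim_row)          # gallery indices, best match first
    rel = matches[order].astype(float)
    precision = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision * rel).sum() / rel.sum())
```

Averaging `average_precision` over all queries gives the AP figure reported for the visual positioning task; `recall_at_1` corresponds to the R@1 figure for the navigation task.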

