Cross-view matching model based on attention mechanism and multi-granularity feature fusion

Journal of Computer Applications, 2024, Vol. 44, Issue (3): 901-908. DOI: 10.11772/j.issn.1001-9081.2023040412

Special Issue: Multimedia computing and computer simulation

Meiyu CAI, Runzhe ZHU, Fei WU, Kaiyu ZHANG, Jiale LI

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Received: 2023-04-12; Revised: 2023-07-08; Accepted: 2023-07-13; Online: 2024-03-12; Published: 2024-03-10
Contact: Fei WU
About the authors:
CAI Meiyu, born in 1998, from Dezhou, Shandong, M.S. candidate. Her research interests include visual positioning and scene matching and positioning.
ZHU Runzhe, born in 1998, from Jiaxing, Zhejiang, M.S. candidate. His research interests include visual geo-localization and cross-view matching.
ZHANG Kaiyu, born in 1999, from Fuzhou, Fujian, M.S. candidate. His research interests include target detection, target tracking, semantic segmentation, and image generation.
LI Jiale, born in 1999, from Wuxi, Jiangsu, M.S. candidate. His research interests include target detection and document layout analysis.
Supported by: China University Industry-University-Research Innovation Fund of the Ministry of Education (2021ZYA08008); Project of Shanghai Municipal Science and Technology Commission (N22DZ1100803)

Abstract

Cross-view scene matching refers to retrieving images of the same geographic target captured from different platforms (such as drones and satellites). However, differences between imaging platforms lead to low accuracy in Unmanned Aerial Vehicle (UAV) positioning and navigation tasks, and existing methods usually focus on a single dimension of the image while ignoring its multi-dimensional features. To solve these problems, a Global Attention and Multi-granularity feature Fusion (GAMF) deep neural network was proposed to improve feature representation and feature distinguishability. First, the GAMF model combined images from the UAV view and the satellite view and, within a unified network architecture, extended into three branches that extracted the spatial-location, channel, and local features of the images. Then, a Spatial Global relationship Attention Module (SGAM) and a Channel Global Attention Module (CGAM) were established, introducing a spatial global relationship mechanism and a channel attention mechanism to capture global information and thereby learn attention more effectively. Next, to fuse local perception features, a local division strategy was introduced to strengthen the model's ability to extract fine-grained features. Finally, the features from the three dimensions were combined as the final features for training the model. Experimental results on the public dataset University-1652 show that the GAMF model reaches an Average Precision (AP) of 87.41% on the UAV visual positioning task and a Recall@1 (R@1) of 90.30% on the UAV visual navigation task, which verifies that the GAMF model can effectively aggregate the multi-dimensional features of an image and improve the accuracy of UAV positioning and navigation.

Key words: Unmanned Aerial Vehicle (UAV), scene matching and positioning, visual positioning, metric learning, global relationship attention, deep learning
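The abstract states that the features of the three branches are combined into one final descriptor, which is then used for cross-view matching. The sketch below illustrates one way this could work; it is not the authors' implementation. Concatenation followed by L2 normalization, cosine-similarity ranking, and the names `fuse` and `rank_gallery` are all assumptions made for illustration, since the abstract does not specify how the features are combined.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(spatial, channel, local):
    """Assumed fusion: concatenate the three branch descriptors, then normalize."""
    return l2_normalize(list(spatial) + list(channel) + list(local))


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to the query.

    Inputs are expected to be unit-length, so the dot product is the cosine.
    """
    scores = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy descriptors (hypothetical values, two dimensions per branch).
query = fuse([1, 0], [0, 1], [1, 1])
gallery = [
    fuse([0, 1], [1, 0], [0, 0]),  # shares no active dimension with the query
    fuse([1, 0], [0, 1], [1, 1]),  # identical to the query
    fuse([1, 0], [0, 0], [1, 0]),  # partial overlap with the query
]
ranking = rank_gallery(query, gallery)
```

In this toy example the identical gallery item ranks first, the partially overlapping one second, and the disjoint one last.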

CLC Number: TP391.4

Meiyu CAI, Runzhe ZHU, Fei WU, Kaiyu ZHANG, Jiale LI. Cross-view matching model based on attention mechanism and multi-granularity feature fusion[J]. Journal of Computer Applications, 2024, 44(3): 901-908.

Dataset	Subset	Samples	Classes	Universities
Training set	-	43,253	701	33
Test set	Query_drone	37,855	701	39
Test set	Query_satellite	701	701	39
Test set	Gallery_drone	51,355	951	39
Test set	Gallery_satellite	951	951	39

Layer	Parameters
Layer 1	1×1, stride 1, padding 0
Layer 2	3×3, padding 1, groups 32
Layer 3	1×1, stride 1, padding 0
Layer 4	3×3, padding 1, groups 32
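The layer table alternates 1×1 convolutions with 3×3 convolutions that use 32 groups. As a worked illustration of what the `groups` setting implies, the sketch below counts convolution weights and checks that these kernel and padding choices preserve spatial size. The channel count of 256, the 16×16 input size, and a stride of 1 for the 3×3 layers are assumptions; the table does not state them.

```python
def conv2d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights (bias excluded) in a 2D convolution with square kernels."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size along one axis for a standard convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1


CHANNELS = 256  # hypothetical channel count, not given in the table

pointwise = conv2d_weight_count(CHANNELS, CHANNELS, kernel_size=1)
dense_3x3 = conv2d_weight_count(CHANNELS, CHANNELS, kernel_size=3)
grouped_3x3 = conv2d_weight_count(CHANNELS, CHANNELS, kernel_size=3, groups=32)

# Both layer types keep a 16x16 map at 16x16 (stride 1 assumed for the 3x3 case).
same_1x1 = conv_output_size(16, kernel_size=1, stride=1, padding=0)
same_3x3 = conv_output_size(16, kernel_size=3, stride=1, padding=1)
```

Under these assumptions, the grouped 3×3 layer holds 32 times fewer weights than an ungrouped 3×3 layer of the same width.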

Library	Version	Library	Version
time	1.7-25.1build1	torchvision	0.13.1
numpy	1.21.5	math	10.3.0
pandas	1.2.4	timm	0.6.7
torch	1.12.1+cu113	argparse	1.1
sys	1.5.12

Method	Drone→Satellite R@1 (%)	Drone→Satellite AP (%)	Satellite→Drone R@1 (%)	Satellite→Drone AP (%)
IL	58.23	62.91	74.47	59.45
LCM	66.65	70.82	79.89	65.38
SFPN	70.83	77.36	80.26	71.58
LPN	75.93	79.14	86.45	74.79
PFFNet	76.97	81.17	87.94	76.64
MMNet-distractors	81.15	84.92	-	-
MMNet	83.97	86.96	90.15	84.69
GAMF	85.33	87.41	90.30	84.52
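The comparison above reports R@1 and AP for each retrieval direction. As a reminder of what these metrics measure, here is a minimal sketch for a single query, assuming the gallery has already been ranked by similarity. The function names are illustrative, and the AP shown is the textbook definition; the benchmark's official evaluation script may use a slightly different interpolation.

```python
from typing import Sequence


def recall_at_k(ranked_labels: Sequence[int], query_label: int, k: int = 1) -> float:
    """1.0 if any of the top-k ranked gallery items matches the query label."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision(ranked_labels: Sequence[int], query_label: int) -> float:
    """Mean of the precision values at every rank where a true match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Gallery labels in ranked order for one query whose true label is 3.
ranked = [3, 7, 3, 1]
r1 = recall_at_k(ranked, query_label=3, k=1)
ap = average_precision(ranked, query_label=3)
```

Here the matches sit at ranks 1 and 3, so the precisions 1/1 and 2/3 average to an AP of 5/6, and R@1 is 1.0. Dataset-level scores are the mean of these per-query values.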

Method	Drone→Satellite R@1 (%)	Drone→Satellite AP (%)	Satellite→Drone R@1 (%)	Satellite→Drone AP (%)
Baseline	72.96	76.40	85.16	74.53
Baseline+LB	83.24	85.62	87.73	82.16
Baseline+LB+SGAM	85.39	87.45	90.01	84.26
Baseline+LB+SGAM+CGAM	85.33	87.41	90.30	84.52

Granularity level	Drone→Satellite R@1 (%)	Drone→Satellite AP (%)	Satellite→Drone R@1 (%)	Satellite→Drone AP (%)
1	84.94	87.08	89.44	84.41
2	85.33	87.41	90.30	84.52
3	85.19	87.28	89.59	84.33

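Granularity level	Partition method	Drone→Satellite R@1 (%)	Drone→Satellite AP (%)	Satellite→Drone R@1 (%)	Satellite→Drone AP (%)
2	Uniform partition	85.33	87.41	90.30	84.52
2	Overlapping-window partition	70.14	73.84	80.88	70.64
3	Uniform partition	85.19	87.28	89.59	84.33
3	Overlapping-window partition	66.23	70.28	78.32	65.87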
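The table above compares two partition schemes, uniform and overlapping-window. The sketch below shows the general difference between them on a one-dimensional index range. It is only an illustration: the geometry of the paper's local division strategy is not described on this page, and `n_parts`, `window`, and `stride` are hypothetical parameters.

```python
def uniform_parts(length, n_parts):
    """Split range(length) into n_parts equal, non-overlapping (start, end) spans."""
    if n_parts <= 0 or length % n_parts:
        raise ValueError("length must be a positive multiple of n_parts")
    size = length // n_parts
    return [(i * size, (i + 1) * size) for i in range(n_parts)]


def overlapping_windows(length, window, stride):
    """Slide a fixed-size window over range(length); spans overlap if stride < window."""
    if window <= 0 or stride <= 0 or window > length:
        raise ValueError("need 0 < window <= length and stride > 0")
    return [(s, s + window) for s in range(0, length - window + 1, stride)]


def pool_parts(values, parts):
    """Average-pool the values inside each (start, end) span."""
    return [sum(values[a:b]) / (b - a) for a, b in parts]


features = list(range(12))  # toy 1-D feature sequence
uniform = uniform_parts(len(features), n_parts=3)
overlap = overlapping_windows(len(features), window=6, stride=3)
pooled = pool_parts(features, uniform_parts(len(features), n_parts=2))
```

With uniform partitioning each position contributes to exactly one part, while overlapping windows let the same position feed into neighbouring parts.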
