Journal of Computer Applications (《计算机应用》): sole official website


Zero-shot composed image retrieval for remote sensing based on image edit proxy

Zhang Jie (张杰)1, Huang Zhiyong (黄智勇)1, Wang Ruijin (王瑞锦)1, Zhang Fengli (张凤荔)2

  1. University of Electronic Science and Technology of China
  2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731
  • Received: 2025-07-21 Revised: 2025-11-07 Online: 2025-12-22 Published: 2025-12-22
  • Corresponding author: Zhang Jie (张杰)
  • Supported by:
    National Natural Science Foundation of China; Sichuan Provincial Science and Technology Plan “Unveiling and Leading” Project

Abstract: With the rapid development of Composed Image Retrieval (CIR), researchers have begun applying it to Remote Sensing (RS) image retrieval to improve the accuracy of retrieving target images from RS image databases. However, existing algorithms fail to bridge the semantic gap between the image and text modalities effectively, and they are limited by the lack of high-quality annotated datasets for training CIR models in the RS domain. To address these challenges, a zero-shot algorithm named IEP4RS (Zero-Shot Composed Image Retrieval for Remote Sensing Based on Image Edit Proxy) is proposed. The algorithm uses image editing to generate a proxy image aligned with both the query image and the text description, thereby enhancing the query representation. Specifically, an image editing instruction is first generated from the query image and the text description of the target image. The instruction and the query image are then fed into an image editing model to produce a proxy image, and a composite query feature is constructed by fusing the features of the proxy image with those of the original query image. Because retrieval is performed by matching image features directly, the algorithm bridges the semantic gap between the image and text modalities; because it follows a zero-shot learning paradigm, it avoids the dependence on annotated datasets that traditional algorithms have. Experimental results on PatternCom, a public benchmark dataset for composed image retrieval in remote sensing, show that IEP4RS significantly improves retrieval performance. It improves on the baseline WEICOM (WEIghted COMposed Image Retrieval Method) by 9.74 percentage points, and on the mainstream zero-shot CIR algorithms Pic2Word (Mapping Pictures to Words for Zero-shot Composed Image Retrieval), SEARLE (zero-Shot composEd imAge Retrieval with textuaL invErsion), and FREEDOM (Composed Image Retrieval for Training-FREE DOMain Conversion) by 11.79, 7.81, and 3.99 percentage points, respectively.

Key words: Composed Image Retrieval (CIR), remote sensing image, information retrieval, image editing, Zero-Shot Composed Image Retrieval (ZS-CIR)

CLC number:
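
The fusion and matching steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, the image encoder, or the image editing model, so everything below is an assumption made for illustration: the feature vectors stand in for the outputs of some image encoder, the fusion is a simple convex combination controlled by a hypothetical weight `alpha`, and the function names `fuse_features` and `rank_gallery` are not taken from the paper.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse_features(query_feat: np.ndarray,
                  proxy_feat: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    """Blend query-image and proxy-image features into one composite query.

    ASSUMPTION: a convex combination with weight `alpha` on the proxy image.
    The paper's actual fusion rule is not given in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused = alpha * l2_normalize(proxy_feat) + (1.0 - alpha) * l2_normalize(query_feat)
    return l2_normalize(fused)


def rank_gallery(composite_feat: np.ndarray, gallery_feats: np.ndarray):
    """Rank gallery images by cosine similarity to the composite query.

    Returns (indices sorted from most to least similar, similarity scores).
    """
    sims = l2_normalize(gallery_feats) @ composite_feat
    order = np.argsort(-sims, kind="stable")
    return order, sims


if __name__ == "__main__":
    # Toy 3-D vectors stand in for encoder outputs (hypothetical data).
    query = np.array([1.0, 0.0, 0.0])   # feature of the query image
    proxy = np.array([0.0, 1.0, 0.0])   # feature of the edited proxy image
    gallery = np.array([
        [1.0, 0.0, 0.0],   # resembles the query only
        [1.0, 1.0, 0.0],   # resembles both query and proxy
        [0.0, 0.0, 1.0],   # unrelated
    ])
    composite = fuse_features(query, proxy, alpha=0.5)
    order, sims = rank_gallery(composite, gallery)
    print("ranking:", order.tolist())
    print("similarities:", np.round(sims, 3).tolist())
```

In this toy setup the gallery image that resembles both the query and the proxy ranks first, which is the intended effect of matching in image feature space rather than comparing image features against text features. Setting `alpha` to 0 reduces the sketch to plain query-image retrieval, and setting it to 1 retrieves with the proxy image alone.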