Journal of Computer Applications (《计算机应用》), sole official website

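Power image retrieval method based on improved Swin Transformer

BAI Xiang (白翔)1, LI Juchuan (李巨川)1,2, WANG Huimin (王慧民)1, JING Chao (景超)1,2*, NIU Jian (钮键)2, ZHANG Xingzhong (张兴忠)1,2, CHENG Yongqiang (程永强)1,3

  1. Shanxi Energy Internet Research Institute, Taiyuan 030000; 2. College of Software, Taiyuan University of Technology, Jinzhong, Shanxi 030600; 3. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600
  • Received: 2025-04-18 Revised: 2025-06-05 Accepted: 2025-06-09 Online: 2025-06-12 Published: 2025-06-12
  • Corresponding author: JING Chao (景超)
  • Supported by: Shanxi Province Key Research and Development Program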

Abstract: Existing image retrieval methods struggle to distinguish and extract the similar structural information and fine texture details of power equipment, which results in low retrieval accuracy and efficiency. To address this problem, a Power Image Retrieval method based on improved Swin Transformer (PIR-iSwinT) was proposed. Firstly, a Multi-Feature Structure Cross-Enhancement module (MFSCE) was proposed, which uses a cross-attention mechanism incorporating the gradient magnitude map to enhance the model's perception of the structural and edge features of equipment. Secondly, an Adaptive Inter-class Difference Center Loss module (AIDCL) was designed to strengthen the model's ability to discriminate between same-class and different-class samples. Finally, a Hierarchical Clustering Retrieval module (HCR) was constructed to optimize the sample matching strategy during retrieval and reduce computational complexity, further improving retrieval accuracy and efficiency. Experimental results on a self-built power scene dataset and the NUS-WIDE dataset show that, with a hash code length of 32 bits, PIR-iSwinT achieves highest mean retrieval precisions of 96.76% and 92.68%, respectively, which are 2.35% and 0.56% higher than those of HRMPA (Hash image Retrieval based on Mixed attention and Polarization Asymmetric loss), while the retrieval speed is improved by a factor of three. These results indicate that PIR-iSwinT effectively extracts and distinguishes the detailed structural features of power equipment, improves retrieval efficiency, and generalizes well, which verifies the effectiveness of the proposed method.

Key words: image retrieval, Swin Transformer, cross-attention mechanism, center loss, hierarchical clustering
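
The retrieval stage described in the abstract combines fixed-length hash codes with a clustering step so that a query is not compared against every database sample. The abstract does not give the internals of HCR, so the following is only a minimal sketch of the general coarse-to-fine idea, not the authors' implementation. It assumes that hash codes are stored as Python integers and that cluster assignments were computed offline by some clustering method. The function names (`hamming`, `majority_code`, `build_index`, `two_stage_search`), the majority-vote cluster representative, and the toy 32-bit codes are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def majority_code(codes, n_bits):
    """Bitwise majority vote over a group of codes (assumed cluster representative)."""
    rep = 0
    for bit in range(n_bits):
        ones = sum((c >> bit) & 1 for c in codes)
        if 2 * ones >= len(codes):
            rep |= 1 << bit
    return rep


def build_index(codes, cluster_ids, n_bits=32):
    """Group database positions by cluster id and attach a representative code."""
    members = defaultdict(list)
    for pos, cid in enumerate(cluster_ids):
        members[cid].append(pos)
    return {
        cid: (majority_code([codes[p] for p in positions], n_bits), positions)
        for cid, positions in members.items()
    }


def two_stage_search(query, codes, index, n_probe=1, top_k=3):
    """Stage 1: pick the n_probe closest clusters. Stage 2: rank only their members."""
    nearest = sorted(index, key=lambda cid: hamming(query, index[cid][0]))[:n_probe]
    candidates = [p for cid in nearest for p in index[cid][1]]
    ranked = sorted(candidates, key=lambda p: hamming(query, codes[p]))
    return [(p, hamming(query, codes[p])) for p in ranked[:top_k]]


# Toy database: six 32-bit codes in two well-separated groups (made-up values).
codes = [0x0000000F, 0x0000001F, 0x0000000E, 0xFFFF0000, 0xFFFE0000, 0x7FFF0000]
cluster_ids = [0, 0, 0, 1, 1, 1]  # assumed to come from an offline clustering step
index = build_index(codes, cluster_ids)

# Only the three members of the nearest cluster are compared with the query.
print(two_stage_search(0x0000000D, codes, index, n_probe=1, top_k=2))
# → [(0, 1), (1, 2)]
```

In this sketch the cost per query drops from one distance computation per database sample to one per cluster plus one per member of the probed clusters. Raising `n_probe` trades speed for recall; probing every cluster reduces to an exhaustive Hamming ranking.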
