Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (4): 1334-1343. DOI: 10.11772/j.issn.1001-9081.2025040416

• Frontier and comprehensive applications •

基于改进Swin Transformer的电力图像检索方法

白翔1, 李巨川1,2, 王慧民1, 景超1,2, 钮键2, 张兴忠1,2, 程永强1,3

    1.山西省能源互联网研究院,太原 030000
    2.太原理工大学 软件学院,山西 晋中 030600
    3.太原理工大学 计算机科学与技术学院,山西 晋中 030600
  • 收稿日期:2025-04-18 修回日期:2025-06-05 接受日期:2025-06-09 发布日期:2025-06-12 出版日期:2026-04-10
  • 通讯作者: 景超
  • 作者简介:白翔(1992—),男,山西柳林人,工程师,硕士,主要研究方向:人工智能、能源互联网
    李巨川(2001—),男,山西运城人,硕士研究生,主要研究方向:多模态图像检索
    王慧民(1995—),男,山西古交人,硕士,主要研究方向:人工智能、图像识别
    钮键(1986—),女,河北保定人,博士研究生,主要研究方向:选煤智能化
    张兴忠(1964—),男,山西汾阳人,教授,硕士,主要研究方向:计算机视觉、人工智能
    程永强(1969—),男,山西祁县人,教授,博士,主要研究方向:图像重建、信息处理。
  • 基金资助:
    山西省重点研发计划项目(202202130501008)

Power image retrieval method based on improved Swin Transformer

Xiang BAI1, Juchuan LI1,2, Huimin WANG1, Chao JING1,2, Jian NIU2, Xingzhong ZHANG1,2, Yongqiang CHENG1,3

    1.Shanxi Energy Internet Research Institute,Taiyuan Shanxi 030000,China
    2.School of Software,Taiyuan University of Technology,Jinzhong Shanxi 030600,China
    3.College of Computer Science and Technology,Taiyuan University of Technology,Jinzhong Shanxi 030600,China
  • Received:2025-04-18 Revised:2025-06-05 Accepted:2025-06-09 Online:2025-06-12 Published:2026-04-10
  • Contact: Chao JING
  • About author:BAI Xiang, born in 1992, M. S., engineer. His research interests include artificial intelligence, energy internet.
    LI Juchuan, born in 2001, M. S. candidate. His research interests include multimodal image retrieval.
    WANG Huimin, born in 1995, M. S. His research interests include artificial intelligence, image recognition.
    NIU Jian, born in 1986, Ph. D. candidate. Her research interests include intelligent coal preparation.
    ZHANG Xingzhong, born in 1964, M. S., professor. His research interests include computer vision, artificial intelligence.
    CHENG Yongqiang, born in 1969, Ph. D., professor. His research interests include image reconstruction, information processing.
  • Supported by:
    Key Research and Development Program of Shanxi Province(202202130501008)

摘要:

针对现有的图像检索方法难以有效辨别和提取电力设备的相似结构信息和纹理细节特征,导致检索精度和效率低的问题,提出基于改进Swin Transformer的电力图像检索方法(PIR-iSwinT)。首先,提出多特征结构交叉增强模块(MFSCE),通过结合梯度幅值图的交叉注意力机制增强模型对设备结构和边缘特征的感知能力;其次,设计自适应类间差异中心损失模块(AIDCL)加强模型对同类样本和异类样本的辨别能力;最后,构建层次聚类检索模块(HCR)优化检索过程中的样本匹配策略并降低计算复杂度,进一步提升检索精度和效率。在自建电力场景数据集和NUS-WIDE数据集上的实验结果表明,当哈希码长度为32 bit时,PIR-iSwinT的平均精度均值(mAP)分别达到96.76%和92.68%,与HRMPA(Hash image Retrieval based on Mixed attention and Polarization Asymmetric loss)相比分别提升了2.35%和0.56%。可见,PIR-iSwinT能有效提取和辨别电力设备的细节结构特征,提升检索效率,同时展现出良好的泛化能力,验证了所提方法的有效性。

关键词: 图像检索, Swin Transformer, 交叉注意力机制, 中心损失, 层次聚类

Abstract:

Existing image retrieval methods struggle to distinguish and extract the similar structural information and fine texture details of power equipment, which results in low retrieval accuracy and efficiency. To address this problem, a Power Image Retrieval method based on improved Swin Transformer (PIR-iSwinT) was proposed. Firstly, a Multi-Feature Structure Cross-Enhancement module (MFSCE) was proposed, which uses a cross-attention mechanism combined with the gradient magnitude map to enhance the model's perception of equipment structure and edge features. Secondly, an Adaptive Inter-class Difference Center Loss module (AIDCL) was designed to strengthen the model's ability to discriminate between samples of the same class and samples of different classes. Finally, a Hierarchical Clustering Retrieval module (HCR) was constructed to optimize the sample matching strategy during retrieval and reduce computational complexity, thereby further improving retrieval accuracy and efficiency. Experimental results on a self-built power scenario dataset and the NUS-WIDE dataset show that, at a hash code length of 32 bit, PIR-iSwinT achieves a mean Average Precision (mAP) of 96.76% and 92.68%, respectively, outperforming HRMPA (Hash image Retrieval based on Mixed attention and Polarization Asymmetric loss) by 2.35% and 0.56%, respectively. These results indicate that PIR-iSwinT effectively extracts and distinguishes the detailed structural features of power equipment, improves retrieval efficiency, and generalizes well, which verifies the effectiveness of the proposed method.

Key words: image retrieval, Swin Transformer, cross-attention mechanism, center loss, hierarchical clustering
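
The abstract describes retrieval over fixed-length hash codes, with a clustering step that reduces the number of comparisons, and reports quality as mAP. The sketch below illustrates that general idea only and is not the authors' HCR module. The use of Hamming distance, the two-level search over precomputed cluster centers, the parameter names (`n_probe`, `top_k`), the 8-bit toy codes, and the item names are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Code = int  # a binary hash code packed into an integer (the paper reports 32 bit codes)


def hamming(a: Code, b: Code) -> int:
    """Number of differing bits between two packed hash codes."""
    return bin(a ^ b).count("1")


def coarse_to_fine_search(
    query: Code,
    centers: Sequence[Code],
    clusters: Dict[int, List[Tuple[str, Code]]],
    n_probe: int = 1,
    top_k: int = 3,
) -> List[str]:
    """Rank only the items stored in the n_probe clusters nearest to the query."""
    # Coarse step: compare the query with each cluster center (assumed precomputed).
    nearest = sorted(range(len(centers)), key=lambda i: hamming(query, centers[i]))[:n_probe]
    # Fine step: rank the members of the selected clusters by Hamming distance.
    candidates = [item for i in nearest for item in clusters.get(i, [])]
    candidates.sort(key=lambda item: hamming(query, item[1]))
    return [name for name, _ in candidates[:top_k]]


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP of one ranked list, normalized by the number of relevant items retrieved.

    mAP is the mean of this value over all queries.
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy database with 8-bit codes (hypothetical items, for illustration only).
centers = [0b00001111, 0b11110000]
clusters = {
    0: [("insulator_a", 0b00001111), ("insulator_b", 0b00001101)],
    1: [("transformer_a", 0b11110000), ("transformer_b", 0b11010000)],
}
results = coarse_to_fine_search(0b00001110, centers, clusters)
print(results)
```

With `n_probe=1` the query is compared against two centers and two cluster members instead of all four items. On a large database this pruning is where a clustering-based retrieval stage would save computation, at the cost of possibly missing matches that fall in clusters that were not probed.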

CLC Number: