Journal of Computer Applications: 229-233. DOI: 10.11772/j.issn.1001-9081.2024040524

• Multimedia computing and computer simulation •

Pixel-level registration technology for multi-modal weld seam images

Zhenrong HUANG1, Yao HUANG2, Wang TU2, Fei WANG2, Bin CHEN1,3

  1. International Institute of Artificial Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
    2. Shanghai Spaceflight Precision Machinery Institute, Shanghai Academy of Spaceflight Technology, Shanghai 201600, China
    3. Chongqing Research Institute, Harbin Institute of Technology, Chongqing 401151, China
  • Received: 2024-04-26 Revised: 2024-07-01 Accepted: 2024-07-04 Online: 2025-01-24 Published: 2024-12-31
  • Contact: Bin CHEN

多模态焊缝图像像素级配准技术

黄振荣1, 黄瑶2, 涂旺2, 王飞2, 陈斌1,3

  1. 哈尔滨工业大学(深圳) 国际人工智能研究院,广东 深圳 518055
    2. 上海航天技术研究院 上海航天精密机械研究所,上海 201600
    3. 哈尔滨工业大学 重庆研究院,重庆 401151
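  • Corresponding author: 陈斌 (Bin CHEN)
  • About the authors: 黄振荣 (Zhenrong HUANG), born 2000, male, from Meizhou, Guangdong; M.S. candidate. Research interests: multi-modal image registration, image generation.
    黄瑶 (Yao HUANG), born 1996, female, from Nantong, Jiangsu; engineer, M.S. Research interests: automated and intelligent inspection.
    涂旺 (Wang TU), born 1998, male, from Yichun, Jiangxi; assistant engineer, M.S. Research interests: radiographic testing.
    王飞 (Fei WANG), born 1986, male, from Nantong, Jiangsu; research fellow, M.S. Research interests: non-destructive testing.
    陈斌 (Bin CHEN), born 1970, male, from Guanghan, Sichuan; research fellow, Ph.D. Research interests: industrial vision.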
  • Funding:
    Shenzhen 2022 Stable Support Program for Higher Education Institutions (GXWD-20220811170603002)

Abstract:

To exploit the visual features of multi-modal industrial weld seam images and to improve registration through modal translation, a modal translation-based network for pixel-level registration of multi-modal weld seam images was proposed. Firstly, a cross-modal translation module was designed to enable the network to capture features shared across the modalities of industrial images. Then, these shared features were used to perform multi-modal image registration, while adversarial loss and multi-level contrastive loss were used to improve the quality of modal translation. Additionally, the cross-modal translation module was integrated with a unimodal image registration module, and reconstruction loss was employed to improve pixel-level registration performance. Finally, a multi-modal industrial weld seam image dataset was constructed, and comparative experiments were conducted on it. Experimental results show that the proposed network outperforms existing advanced multi-modal image registration models such as DFMIR (Discriminator-Free-Medical-Image-Registration) and IMSE (Indescribable Multi-modal Spatial Evaluator): relative to these two models, it improves mean Intersection over Union (mIoU) by 3.9 and 3.2 percentage points, respectively, and improves registration accuracy measured by average Euclidean distance (aEd) by about 16 and 11 pixels, respectively, thereby obtaining good results in pixel-level registration.

Key words: industrial weld seam imaging, multi-modal image registration, modal translation, shared feature, multi-level contrastive learning

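Abstract (translated from the Chinese):

To mine the visual features of multi-modal industrial weld seam images and to further improve the registration of multi-modal industrial images through modal translation, a modal translation-based pixel-level registration network for multi-modal weld seam images is proposed. First, a cross-modal translation module is designed to give the network the ability to perceive features shared by industrial images of different modalities. Second, the shared features are captured to perform multi-modal image registration, and adversarial loss and multi-level contrastive loss are used to improve the modal translation. Meanwhile, the cross-modal translation module is combined with a unimodal image registration module, and reconstruction loss is used to improve pixel-level registration performance. Finally, a multi-modal industrial weld seam image dataset is constructed, and comparative experiments are carried out on it. Experimental results show that, compared with existing advanced multi-modal image registration models such as DFMIR (Discriminator-Free-Medical-Image-Registration) and IMSE (Indescribable Multi-modal Spatial Evaluator), the proposed network improves mean Intersection over Union (mIoU) by 3.9 and 3.2 percentage points, respectively, and improves registration accuracy in average Euclidean distance (aEd) by about 16 and 11 pixels, respectively, achieving good results in pixel-level registration.

Key words (translated from the Chinese): industrial weld seam imaging, multi-modal image registration, modal translation, shared feature, multi-level contrastive learning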
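
The abstract reports results in mean Intersection over Union (mIoU) and average Euclidean distance (aEd) but does not define how either is computed. The following is a minimal sketch of one common way to compute such metrics. It assumes that IoU is taken between binary masks of the registered and reference images and that aEd is taken between corresponding landmark points, in pixels. The function names and the example data are hypothetical and are not taken from the paper.

```python
import math


def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(mask_pairs):
    """Mean IoU over (registered_mask, reference_mask) pairs."""
    scores = [iou(a, b) for a, b in mask_pairs]
    return sum(scores) / len(scores)


def average_euclidean_distance(points_a, points_b):
    """Mean pixel distance between corresponding (x, y) landmarks."""
    dists = [math.dist(p, q) for p, q in zip(points_a, points_b)]
    return sum(dists) / len(dists)


# Hypothetical toy data: one mask pair and two landmark pairs.
registered = [[0, 1, 1],
              [0, 1, 1]]
reference = [[0, 0, 1],
             [0, 1, 1]]
print(mean_iou([(registered, reference)]))  # 0.75
print(average_euclidean_distance([(0, 0), (3, 4)], [(3, 4), (3, 4)]))  # 2.5
```

Under these assumptions a higher mIoU and a lower aEd both indicate better alignment, which matches the abstract's description of the aEd change as an improvement in registration accuracy.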
