To exploit the visual features of multi-modal industrial weld seam images and further improve registration through modal translation, a modal-translation-based network for pixel-level registration of multi-modal weld seam images was proposed. First, a cross-modal translation module was designed to enable the network to capture features shared across different modalities of industrial images, and these shared features were then used to perform multi-modal image registration. Meanwhile, adversarial loss and multi-level contrastive loss were employed to improve the quality of modal translation. In addition, the cross-modal translation module was integrated with a unimodal image registration module, and reconstruction loss was used to improve pixel-level registration performance. Finally, a multi-modal industrial weld seam image dataset was constructed, and comparative experiments were conducted on it. Experimental results show that the proposed network significantly outperforms existing advanced multi-modal image registration models such as DFMIR (Discriminator-Free Medical Image Registration) and IMSE (Indescribable Multi-modal Spatial Evaluator): it improves mean Intersection over Union (mIoU) by 3.9 and 3.2 percentage points, respectively, and improves registration accuracy in average Euclidean distance (aEd) by 16 and 11 pixels, thereby achieving good pixel-level registration results.
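To make the training objective named above more concrete, the following is a minimal PyTorch-style sketch of how the adversarial, multi-level contrastive, and reconstruction terms could be combined for the cross-modal translation module. It is an illustrative assumption rather than the authors' implementation: the module definitions, the single L1 reconstruction term, and the loss weights `w_adv`, `w_con`, `w_rec` are placeholders, and the abstract does not specify the actual architectures or how the reconstruction loss is coupled to the registration module.

```python
# Illustrative sketch (not the authors' code): combining adversarial, multi-level
# contrastive, and reconstruction losses for a cross-modal translation network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTranslator(nn.Module):
    """Placeholder cross-modal translator (modality A -> modality B)."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x, return_feats=False):
        f1 = self.enc1(x)          # shallow feature level
        f2 = self.enc2(f1)         # deeper feature level
        out = self.dec(f2)
        return (out, [f1, f2]) if return_feats else out


class TinyDiscriminator(nn.Module):
    """Placeholder patch discriminator for the adversarial term."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.net(x)


def multilevel_contrastive(feats_fake, feats_real, temperature=0.07):
    """InfoNCE-style patch contrast, averaged over feature levels."""
    total = 0.0
    for fa, fb in zip(feats_fake, feats_real):
        a = F.normalize(fa.flatten(2).transpose(1, 2), dim=-1)   # (B, N, C)
        b = F.normalize(fb.flatten(2).transpose(1, 2), dim=-1)
        logits = torch.bmm(a, b.transpose(1, 2)) / temperature    # (B, N, N)
        labels = torch.arange(logits.size(1), device=logits.device).expand(logits.size(0), -1)
        total = total + F.cross_entropy(logits.flatten(0, 1), labels.flatten())
    return total / len(feats_fake)


def generator_objective(translator, discriminator, img_a, img_b,
                        w_adv=1.0, w_con=1.0, w_rec=10.0):
    """Assumed weighted sum of the three losses mentioned in the abstract."""
    fake_b, feats_fake = translator(img_a, return_feats=True)
    _, feats_real = translator(img_b, return_feats=True)
    d_out = discriminator(fake_b)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    con = multilevel_contrastive(feats_fake, feats_real)
    rec = F.l1_loss(fake_b, img_b)   # stand-in reconstruction term
    return w_adv * adv + w_con * con + w_rec * rec


if __name__ == "__main__":
    g, d = TinyTranslator(), TinyDiscriminator()
    a = torch.randn(2, 1, 32, 32)    # modality A weld seam patches
    b = torch.randn(2, 1, 32, 32)    # modality B weld seam patches
    print(generator_objective(g, d, a, b).item())
```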