Multi-attention contrastive learning for infrared small target detection

doi:10.11772/j.issn.1001-9081.2024101554

Journal of Computer Applications, 2025, Vol. 45, Issue (11): 3707-3712. DOI: 10.11772/j.issn.1001-9081.2024101554

• Multimedia computing and computer simulation •

Xiaoyong BIAN¹,²,³, Qiren HU¹

1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
2. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
3. Key Laboratory of Hubei Province for Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan, Hubei 430065, China

Received: 2024-11-01; Revised: 2025-01-09; Accepted: 2025-01-09; Online: 2025-01-13; Published: 2025-11-10
Contact: Xiaoyong BIAN
About author: HU Qiren, born in 1995, from Xiantao, Hubei, M.S. candidate. His research interests include small target detection.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (62372343).

Abstract

Infrared Small Target Detection (IRSTD) is both a research hotspot and a hard problem in target detection. Infrared small targets occupy few pixels, have low contrast and lack texture, so it is difficult to learn accurate feature representations from their limited and distorted information, and IRSTD methods still face many challenges. To address these issues, a multi-attention contrastive learning based IRSTD method was proposed. Firstly, with U-Net adopted as the basic framework, a Context Mixer Block (CMB) that integrates Frequency Attention (FA) and spatial attention was proposed in the encoding phase to produce a preliminary attention feature map. Then, in the decoding phase, a Multi-Kernel Central Difference Convolution (MKCDC) was designed to extract the core information of small targets that remains stably represented across different scales. Finally, the small target detection network was trained with a combination of binary cross-entropy loss and contrastive loss, so that the feature representation ability for small targets was enhanced and a discriminative small target detection model was obtained. Experimental results show that the Probability of detection (Pd) of the proposed method reaches 96.63% on the IRSTD-1k dataset and 100.00% on the NUAA-SIRST dataset, which is 4.71 and 1.90 percentage points higher, respectively, than that of Dense Nested Attention Network (DNA-Net). These results indicate that the proposed method effectively improves IRSTD performance.
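To make the central difference idea behind MKCDC concrete, the following is a minimal single-channel sketch in plain Python. It uses the commonly cited CDC formulation, out(p) = Σ w_n·x(p+n) − θ·x(p)·Σ w_n, and is not the authors' implementation: the fixed `ring_kernel` weights, the kernel sizes (3 and 5), the averaging used to fuse the kernels and the function name `multi_kernel_cdc` are all assumptions made for illustration, since the abstract does not specify them.

```python
def ring_kernel(size):
    """Hypothetical fixed kernel: 0 at the centre, -1/(size*size - 1) elsewhere."""
    weight = -1.0 / (size * size - 1)
    centre = size // 2
    return [[0.0 if (r == centre and c == centre) else weight
             for c in range(size)] for r in range(size)]


def central_difference_conv(image, kernel, theta=1.0):
    """Single-channel central difference convolution with zero padding.

    out(p) = sum_n w_n * x(p + n) - theta * x(p) * sum_n w_n
    theta = 0 is an ordinary convolution (cross-correlation form);
    theta = 1 aggregates only the differences x(p + n) - x(p).
    """
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    weight_sum = sum(sum(row) for row in kernel)
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        acc += kernel[di + radius][dj + radius] * image[y][x]
            out[i][j] = acc - theta * image[i][j] * weight_sum
    return out


def multi_kernel_cdc(image, sizes=(3, 5), theta=1.0):
    """Assumed fusion rule: average the CDC responses of several kernel sizes."""
    responses = [central_difference_conv(image, ring_kernel(s), theta)
                 for s in sizes]
    height, width = len(image), len(image[0])
    return [[sum(r[i][j] for r in responses) / len(responses)
             for j in range(width)] for i in range(height)]


# Toy data: a flat background and the same background with one bright pixel.
flat = [[10.0] * 9 for _ in range(9)]
scene = [row[:] for row in flat]
scene[4][4] = 50.0  # single bright pixel standing in for a small target

flat_response = multi_kernel_cdc(flat)[4][4]
target_response = multi_kernel_cdc(scene)[4][4]
plain_response = multi_kernel_cdc(flat, theta=0.0)[4][4]
print(flat_response, target_response, plain_response)
```

With θ = 1 the flat background produces a response of approximately 0 at the centre pixel, while the bright pixel produces approximately 40 (its contrast against the surrounding level of 10) for both kernel sizes. With θ = 0 the flat background alone already gives about −10, which illustrates why a difference-based operator suits low-contrast targets better than a response tied to absolute intensity.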

Key words: deep learning, small target detection, attention U-Net, Central Difference Convolution (CDC), contrastive learning

CLC Number: TP391.4

Xiaoyong BIAN, Qiren HU. Multi-attention contrastive learning for infrared small target detection[J]. Journal of Computer Applications, 2025, 45(11): 3707-3712.

References

[1]	BAI X, ZHOU F. Analysis of new top-hat transformation and the application for infrared dim small target detection[J]. Pattern Recognition, 2010, 43(6): 2145-2156.
[2]	GAO C, MENG D, YANG Y, et al. Infrared patch-image model for small target detection in a single image[J]. IEEE Transactions on Image Processing, 2013, 22(12): 4996-5009.
[3]	DAI Y, WU Y. Reweighted infrared patch-tensor model with both nonlocal and local priors for single-frame small target detection[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2017, 10(8): 3752-3767.
[4]	WANG H, ZHOU L, WANG L. Miss detection vs. false alarm: adversarial learning for small object segmentation in infrared images[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8508-8517.
[5]	DAI Y, WU Y, ZHOU F, et al. Attentional local contrast networks for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(11): 9813-9824.
[6]	LI B, XIAO C, WANG L, et al. Dense nested attention network for infrared small target detection[J]. IEEE Transactions on Image Processing, 2023, 32: 1745-1758.
[7]	WU X, HONG D, CHANUSSOT J. UIU-Net: U-Net in U-Net for infrared small object detection[J]. IEEE Transactions on Image Processing, 2023, 32: 364-376.
[8]	CHUNG W Y, LEE I H, PARK C G. Lightweight infrared small target detection network using full-scale skip connection U-net[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: No.7000705.
[9]	ZHANG T, LI L, CAO S, et al. Attention-guided pyramid context networks for detecting infrared small target under complex background[J]. IEEE Transactions on Aerospace and Electronic Systems, 2023, 59(4): 4250-4261.
[10]	ZHANG M, ZHANG R, ZHANG J. Dim2Clear network for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: No.5001714.
[11]	ZHANG M, ZHANG R, YANG Y. ISNet: shape matters for infrared small target detection[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 867-876.
[12]	LIU K, TANG H P, SU B Y. Gated convolution and high-frequency feature fusion for infrared small target detection[J]. Computer Engineering and Applications, 2025, 61(7): 306-314. (in Chinese)
[13]	ZHANG M, YANG H, GUO J, et al. IRPruneDet: efficient infrared small target detection via wavelet structure-regularized soft channel pruning[C]// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 7224-7232.
[14]	HUANG Y, ZHI X, HU J, et al. FDDBA-NET: frequency domain decoupling bidirectional interactive attention network for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: No.5004416.
[15]	DAI S S, LIU K S, HUANG L, et al. Infrared small target detection method with vision Transformer and dual decoder[J]. Infrared Technology, 2024, 46(9): 1070-1080. (in Chinese)
[16]	WANG L, LIU J L, WANG W W. Small target detection method in UAV images based on fusion of dilated convolution and Transformer[J]. Journal of Computer Applications, 2024, 44(11): 3595-3602. (in Chinese)
[17]	XIE S, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5987-5995.
[18]	LI Y, YAO T, PAN Y, et al. Contextual transformer networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 1489-1500.
[19]	CHEN G, WANG Z, WANG W, et al. Holistic modularization of local contrast in the end-to-end network for infrared small target detection[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: No.7001305.
[20]	LIU Q, LIU R, ZHENG B, et al. Infrared small target detection with scale and location sensitivity[C]// Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 17490-17499.

Method	IRSTD-1k IoU/%	IRSTD-1k Pd/%	IRSTD-1k Fa/10^-6	NUAA-SIRST IoU/%	NUAA-SIRST Pd/%	NUAA-SIRST Fa/10^-6
IPI [2]	27.92	81.37	16.18	25.67	85.55	11.47
RIPT [3]	14.11	77.55	28.31	11.05	79.08	22.61
MDvsFA [4]	49.50	82.11	80.33	60.30	89.35	56.35
ALCNet [5]	62.05	92.19	31.56	74.31	97.34	20.21
DNA-Net [6]	67.54	91.92	8.81	77.54	98.10	2.51
AMFU-net [8]	67.85	89.90	15.98	75.86	100.00	5.86
AGPCNet [9]	62.39	90.91	21.29	69.09	93.94	50.00
Dim2Clear [10]	66.30	93.70	20.90	77.20	99.10	6.70
HoLoCoNet [19]	70.15	94.28	23.97	67.59	100.00	22.46
MSHNet [20]	68.57	91.91	17.63	76.93	99.10	5.86
Proposed method	70.07	96.63	10.80	76.53	100.00	9.20
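The Pd gains quoted in the abstract can be reproduced from the DNA-Net and proposed-method rows of the comparison table above. The short check below copies those four Pd values and subtracts them; the helper name `percentage_point_gain` and the dictionary layout are illustrative choices, and `Decimal` is used only to keep the two-decimal arithmetic exact.

```python
from decimal import Decimal

# Pd/% values copied from the DNA-Net and proposed-method rows of the table.
pd_percent = {
    "IRSTD-1k": {"DNA-Net": Decimal("91.92"), "proposed": Decimal("96.63")},
    "NUAA-SIRST": {"DNA-Net": Decimal("98.10"), "proposed": Decimal("100.00")},
}


def percentage_point_gain(scores, baseline="DNA-Net", method="proposed"):
    """Difference of two percentages, expressed in percentage points."""
    return scores[method] - scores[baseline]


gains = {name: percentage_point_gain(scores)
         for name, scores in pd_percent.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

The differences are 4.71 percentage points on IRSTD-1k and 1.90 on NUAA-SIRST, matching the figures stated in the abstract.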

Method	Parameters/MB	Computation/GFLOPs
MDvsFA [4]	3.59	230.14
ALCNet [5]	8.56	14.52
DNA-Net [6]	4.69	56.34
AMFU-net [8]	2.17	23.80
AGPCNet [9]	11.79	40.22
HoLoCoNet [19]	0.70	57.80
MSHNet [20]	4.07	24.43
Proposed method	4.15	34.31

Method	IRSTD-1k IoU/%	IRSTD-1k nIoU/%	IRSTD-1k Pd/%	IRSTD-1k Fa/10^-6	NUAA-SIRST IoU/%	NUAA-SIRST nIoU/%	NUAA-SIRST Pd/%	NUAA-SIRST Fa/10^-6
Proposed method	70.07	64.59	96.63	10.80	76.53	74.62	100.00	9.20
w/o CMB	68.18	63.82	95.62	12.79	74.24	72.55	100.00	12.38
w/o MKCDC	66.99	61.20	95.29	14.92	74.01	70.92	99.08	10.90
w/o CMB+MKCDC	65.18	62.69	94.28	10.08	72.81	70.88	99.08	26.00
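One way to read the ablation table above is as the drop in IoU and nIoU when a module is removed from the full model. The sketch below copies the IoU and nIoU columns and computes those drops; the dictionary layout and the helper name `drops` are illustrative only, and no values beyond those in the table are used.

```python
from decimal import Decimal

# (IoU/%, nIoU/%) pairs copied from the ablation table above.
ablation = {
    "IRSTD-1k": {
        "full": ("70.07", "64.59"),
        "w/o CMB": ("68.18", "63.82"),
        "w/o MKCDC": ("66.99", "61.20"),
        "w/o CMB+MKCDC": ("65.18", "62.69"),
    },
    "NUAA-SIRST": {
        "full": ("76.53", "74.62"),
        "w/o CMB": ("74.24", "72.55"),
        "w/o MKCDC": ("74.01", "70.92"),
        "w/o CMB+MKCDC": ("72.81", "70.88"),
    },
}


def drops(rows):
    """Return {variant: (IoU drop, nIoU drop)} relative to the full model."""
    full_iou, full_niou = (Decimal(v) for v in rows["full"])
    return {
        variant: (full_iou - Decimal(iou), full_niou - Decimal(niou))
        for variant, (iou, niou) in rows.items()
        if variant != "full"
    }


for dataset, rows in ablation.items():
    for variant, (d_iou, d_niou) in drops(rows).items():
        print(f"{dataset} {variant}: IoU -{d_iou}, nIoU -{d_niou}")
```

On both datasets, removing MKCDC alone lowers IoU more than removing CMB alone (3.08 versus 1.89 points on IRSTD-1k, 2.52 versus 2.29 on NUAA-SIRST), and removing both lowers it the most (4.89 and 3.72 points).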
