Vision foundation model-driven pixel-level image anomaly detection method

doi:10.11772/j.issn.1001-9081.2024091398

Journal of Computer Applications, 2025, Vol. 45, Issue (3): 823-831.

Frontier research and typical applications of large models

Zhenhua XUE¹, Qiang LI¹, Chao HUANG²

¹ China Energy Institute of Transportation Technology Research Company Limited, Beijing 100080, China
² School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen Guangdong 518107, China

Received: 2024-10-07; Revised: 2024-12-01; Accepted: 2024-12-03; Online: 2025-01-14; Published: 2025-03-10
Contact: Chao HUANG
About the authors: XUE Zhenhua, born in 1983, native of Datong, Shanxi, M. S., economist. His research interests include defect detection and efficient heavy-duty transportation.
LI Qiang, born in 1996, native of Shenchi, Shanxi, M. S., engineer. His research interests include defect detection and intelligent equipment.
Supported by: National Natural Science Foundation of China (62301621); Shenzhen Science and Technology Program (20231121172359002)

Abstract

Existing anomaly detection methods achieve high-precision detection in specific scenarios, but their limited generalizability and automation restrict their applicability. To achieve more accurate industrial defect detection, a Vision Foundation Model (VFM)-driven pixel-level image anomaly detection method, SSMOD-Net (State Space Model driven-Omni Dimensional Net), was proposed. Unlike existing methods, SSMOD-Net prompts SAM (Segment Anything Model) automatically without fine-tuning SAM. This makes it particularly suitable for scenarios that require processing large-scale industrial visual data. The core of SSMOD-Net is a novel prompt encoder driven by a state space model, which generates prompts dynamically from SAM's input image. With this design, the model can introduce additional guidance through the prompt encoder while SAM's architecture stays unchanged, which improves detection accuracy. The prompt encoder integrates a residual multi-scale module. This module is built on the state space model and combines multi-scale and global information. Through iterative search, the module finds optimal prompts in the prompt space and supplies them to SAM as high-dimensional tensors, which strengthens the model's ability to recognize industrial anomalies. Moreover, the proposed method requires no modification to SAM, so it avoids complex fine-tuning of training schedules. Experimental results on several datasets show that the proposed method performs well. Compared with methods such as AutoSAM and SAM-EG (SAM with Edge Guidance framework for efficient polyp segmentation), it achieves better overall mE (mean E-measure), Mean Absolute Error (MAE), Dice, and Intersection over Union (IoU) results.
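The abstract names a state space model as the engine of the prompt encoder but gives no equations at this point. For orientation only, the following minimal pure-Python sketch shows the generic discretized linear state space recurrence that Mamba-style layers [23] build on. A diagonal continuous system h'(t) = a·h(t) + b·x(t), y(t) = c·h(t) is discretized with a zero-order hold, ā = exp(Δa) and b̄ = (exp(Δa) − 1)/a · b. The discretized system is then scanned over a 1-D sequence, such as a flattened feature map. This is not the SSMOD-Net module. The function names and parameter values are hypothetical, and the sketch omits Mamba's selection mechanism, which makes Δ, b and c depend on the current input.

```python
import math
from typing import List, Sequence, Tuple


def discretize_zoh(a: float, b: float, delta: float) -> Tuple[float, float]:
    """Zero-order-hold discretization of the scalar system h'(t) = a*h(t) + b*x(t).

    Returns (a_bar, b_bar) with a_bar = exp(delta*a) and
    b_bar = (exp(delta*a) - 1) / a * b, which requires a != 0.
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(x: Sequence[float], a: Sequence[float], b: Sequence[float],
             c: Sequence[float], delta: float) -> List[float]:
    """Scan a diagonal linear SSM over a 1-D input sequence x.

    Each state dimension n evolves independently:
        h_n[t] = a_bar_n * h_n[t-1] + b_bar_n * x[t]
    and the scalar output is y[t] = sum_n c_n * h_n[t].
    """
    coeffs = [discretize_zoh(a_n, b_n, delta) for a_n, b_n in zip(a, b)]
    h = [0.0] * len(coeffs)  # hidden state starts at zero
    y = []
    for x_t in x:
        h = [a_bar * h_n + b_bar * x_t for (a_bar, b_bar), h_n in zip(coeffs, h)]
        y.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))
    return y


# Impulse response of a single stable state (a = -1) with b = c = 1.
print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=[-1.0], b=[1.0], c=[1.0], delta=1.0))
```

With a negative a, ā lies in (0, 1), so the response to a single impulse decays geometrically, yet every output still depends on the whole preceding sequence. The scan costs time linear in the sequence length. This is why state space layers are attractive for gathering global context over long flattened images.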

Key words: deep learning, pixel-level anomaly detection, Vision Foundation Model (VFM), SAM (Segment Anything Model), automated prompting

CLC Number: TP391.41

Zhenhua XUE, Qiang LI, Chao HUANG. Vision foundation model-driven pixel-level image anomaly detection method[J]. Journal of Computer Applications, 2025, 45(3): 823-831.

References

1	HUANG C, WEN J, XU Y, et al. Self-supervised attentive generative adversarial networks for video anomaly detection[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(11): 9389-9403.
2	KIM T, LEE H, KIM D. UACANet: uncertainty augmented context attention for polyp segmentation[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 2167-2175.
3	HUANG C, LIU C, ZHANG Z, et al. Pixel-level anomaly detection via uncertainty-aware prototypical Transformer[C]// Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 521-530.
4	MAMONOV A V, FIGUEIREDO I N, FIGUEIREDO P N, et al. Automated polyp detection in colon capsule endoscopy[J]. IEEE Transactions on Medical Imaging, 2014, 33(7): 1488-1502.
5	TAJBAKHSH N, GURUDU S R, LIANG J. Automated polyp detection in colonoscopy videos using shape and context information[J]. IEEE Transactions on Medical Imaging, 2016, 35(2): 630-644.
6	KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything[C]// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 3992-4003.
7	LI J, CHEN T, WANG X, et al. Adapting the segment anything model for multi-modal retinal anomaly detection and localization[J]. Information Fusion, 2025, 113: No.102631.
8	LIU J, WU K, NIE Q, et al. Unsupervised continual anomaly detection with contrastively-learned prompt[C]// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 3639-3647.
9	CAI W, HUANG W, TIAN L, et al. Multiscale global attention for abnormal geological hazard segmentation[J]. IEEE Sensors Journal, 2024, 24(10): 16961-16971.
10	XIE B, TANG H, DUAN B, et al. MaskSAM: towards auto-prompt SAM with mask classification for medical image segmentation[EB/OL]. [2024-05-14]. .
11	SHAHARABANY T, DAHAN A, GIRYES R, et al. AutoSAM: adapting SAM to medical images by overloading the prompt encoder[C]// Proceedings of the 2023 British Machine Vision Conference. Durham: BMVA Press, 2023: No.530.
12	BAE S H, YOON K J. Polyp detection via imbalanced learning and discriminative feature learning[J]. IEEE Transactions on Medical Imaging, 2015, 34(11): 2379-2393.
13	REISS T, COHEN N, BERGMAN L, et al. PANDA: adapting pretrained features for anomaly detection and segmentation[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2805-2813.
14	CHEN Z, LI J, LUO Y, et al. CANZSL: cycle-consistent adversarial networks for zero-shot learning from natural language[C]// Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2020: 863-872.
15	CHEN Z, LUO Y, QIU R, et al. Semantics disentangling for generalized zero-shot learning[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 8692-8700.
16	LIU A A, TIAN H, XU N, et al. Toward region-aware attention learning for scene graph generation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 7655-7666.
17	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 9351. Cham: Springer, 2015: 234-241.
18	ZHOU Z, RAHMAN SIDDIQUEE M M, TAJBAKHSH N, et al. UNet++: a nested U-Net architecture for medical image segmentation[C]// Proceedings of the 4th International Workshop on Deep Learning in Medical Image Analysis and 8th International Workshop on Multimodal Learning for Clinical Decision Support, LNCS 11045. Cham: Springer, 2018: 3-11.
19	FANG Y, CHEN C, YUAN Y, et al. Selective feature aggregation network with area-boundary constraints for polyp segmentation[C]// Proceedings of the 2019 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 11764. Cham: Springer, 2019: 302-310.
20	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2024-08-10]. .
21	WEI X, CAO J, JIN Y, et al. I-MedSAM: implicit medical image segmentation with segment anything[C]// Proceedings of the 2024 European Conference on Computer Vision, LNCS 15068. Cham: Springer, 2025: 90-107.
22	XIE Z, GUAN B, JIANG W, et al. PA-SAM: prompt adapter SAM for high-quality image segmentation[EB/OL]. [2024-08-19]. .
23	GU A, DAO T. Mamba: linear-time sequence modeling with selective state spaces[EB/OL]. [2024-04-03]. .
24	ZHAO S, CHEN H, ZHANG X, et al. RS-Mamba for large remote sensing image dense prediction[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: No.5633314.
25	LI C, ZHOU A, YAO A. Omni-dimensional dynamic convolution[EB/OL]. [2024-09-23]. .
26	BERGMANN P, FAUSER M, SATTLEGGER D, et al. Uninformed students: student-teacher anomaly detection with discriminative latent embeddings[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 4182-4191.
27	KUMAR N, VERMA R, SHARMA S, et al. A dataset and a technique for generalized nuclear segmentation for computational pathology[J]. IEEE Transactions on Medical Imaging, 2017, 36(7): 1550-1560.
28	SIRINUKUNWATTANA K, PLUIM J P W, CHEN H, et al. Gland segmentation in colon histology images: the GlaS challenge contest[J]. Medical Image Analysis, 2017, 35: 489-502.
29	JHA D, SMEDSRUD P H, RIEGLER M A, et al. Kvasir-SEG: a segmented polyp dataset[C]// Proceedings of the 2020 International Conference on MultiMedia Modeling, LNCS 11962. Cham: Springer, 2020: 451-462.
30	BERNAL J, SÁNCHEZ F J, FERNÁNDEZ-ESPARRACH G, et al. WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians[J]. Computerized Medical Imaging and Graphics, 2015, 43: 99-111.
31	SILVA J, HISTACE A, ROMAIN O, et al. Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer[J]. International Journal of Computer Assisted Radiology and Surgery, 2014, 9(2): 283-293.
32	FAN D P, JI G P, ZHOU T, et al. PraNet: parallel reverse attention network for polyp segmentation[C]// Proceedings of the 2020 International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2020: 263-273.
33	KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. [2024-08-10]. .
34	FAN D P, JI G P, QIN X B, et al. Cognitive vision inspired object segmentation metric and loss function[J]. SCIENTIA SINICA Informationis, 2021, 51(9): 1475-1489. (in Chinese)
35	LI Z, LI Y, LI Q, et al. LViT: language meets vision Transformer in medical image segmentation[J]. IEEE Transactions on Medical Imaging, 2024, 43(1): 96-107.
36	ZHANG R, LI G, LI Z, et al. Adaptive context selection for polyp segmentation[C]// Proceedings of the 2020 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 12266. Cham: Springer, 2020: 253-262.
37	TRINH Q H, NGUYEN H D, NGOC B T N, et al. SAM-EG: segment anything model with edge guidance framework for efficient polyp segmentation[C]// Proceedings of the 2024 British Machine Vision Conference. Durham: BMVA Press, 2024: No.472.
38	HUANG C H, WU H Y, LIN Y L. HarDNet-MSEG: a simple encoder-decoder polyp segmentation neural network that achieves over 0.9 mean Dice and 86 FPS[EB/OL]. [2024-03-04]. .
39	YIN Z, LIANG K, MA Z, et al. Duplex contextual relation network for polyp segmentation[C]// Proceedings of the IEEE 19th International Symposium on Biomedical Imaging. Piscataway: IEEE, 2022: 1-5.
40	PATEL K, BUR A M, WANG G. Enhanced U-Net: a feature enhancement network for polyp segmentation[C]// Proceedings of the 18th Conference on Robots and Vision. Piscataway: IEEE, 2021: 181-188.
41	WEI J, HU Y, ZHANG R, et al. Shallow attention network for polyp segmentation[C]// Proceedings of the 2021 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 12901. Cham: Springer, 2021: 699-708.
42	SHIN W, LEE M S, HAN S W. COMMA: propagating complementary multi-level aggregation network for polyp segmentation[J]. Applied Sciences, 2022, 12(4): No.2114.
43	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
44	WANG H, ZHU Y, GREEN B, et al. Axial-DeepLab: stand-alone axial-attention for panoptic segmentation[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12349. Cham: Springer, 2020: 108-126.
45	VALANARASU J M J, OZA P, HACIHALILOGLU I, et al. Medical Transformer: gated axial-attention for medical image segmentation[C]// Proceedings of the 2021 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 12901. Cham: Springer, 2021: 36-46.
46	WANG H, CAO P, WANG J, et al. UCTransNet: rethinking the skip connections in U-Net from a channel-wise perspective with Transformer[C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 2441-2449.
47	SHAHARABANY T, WOLF L. End-to-end segmentation of medical images via patch-wise polygons prediction[C]// Proceedings of the 2022 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 13435. Cham: Springer, 2022: 308-318.
48	WU J, JI W, LIU Y, et al. Medical SAM Adapter: adapting segment anything model for medical image segmentation[EB/OL]. [2024-09-03]. .
49	HUANG C, CAI W, JIANG Q, et al. Multimodal representation distribution learning for medical image segmentation[C]// Proceedings of the 33rd International Joint Conference on Artificial Intelligence. California: IJCAI.org, 2024: 4156-4164.

Category	SFA[19] MAE	SFA[19] mE	ACSNet[36] MAE	ACSNet[36] mE	PraNet[32] MAE	PraNet[32] mE	Method of Ref. [3] MAE	Method of Ref. [3] mE	AutoSAM[11] MAE	AutoSAM[11] mE	I-MedSAM[21] MAE	I-MedSAM[21] mE	Proposed method MAE	Proposed method mE
Average	0.041	0.739	0.036	0.840	0.015	0.844	0.011	0.860	0.014	0.820	0.016	0.875	0.009	0.904
Pill	0.031	0.735	0.022	0.772	0.010	0.826	0.006	0.850	0.004	0.900	0.006	0.878	0.004	0.947
Cable	0.083	0.726	0.037	0.825	0.024	0.828	0.018	0.881	0.022	0.791	0.024	0.831	0.020	0.833
Capsule	0.025	0.552	0.006	0.788	0.008	0.808	0.004	0.765	0.017	0.429	0.010	0.841	0.007	0.850
Tile	0.062	0.768	0.016	0.961	0.024	0.907	0.012	0.924	0.014	0.954	0.012	0.954	0.011	0.968
Transistor	0.133	0.596	0.758	0.189	0.034	0.616	0.032	0.838	0.063	0.390	0.105	0.590	0.023	0.913
Carpet	0.031	0.690	0.011	0.848	0.011	0.885	0.008	0.855	0.007	0.900	0.007	0.912	0.010	0.893
Wood	0.031	0.832	0.013	0.938	0.018	0.899	0.013	0.900	0.016	0.860	0.017	0.880	0.012	0.892
Hazelnut	0.077	0.583	0.009	0.960	0.015	0.902	0.008	0.939	0.006	0.906	0.007	0.930	0.006	0.908
Leather	0.015	0.737	0.003	0.940	0.005	0.886	0.004	0.878	0.004	0.915	0.004	0.904	0.004	0.914
Screw	0.007	0.750	0.003	0.833	0.006	0.729	0.003	0.780	0.005	0.789	0.002	0.903	0.002	0.905
Metal nut	0.095	0.747	0.022	0.809	0.021	0.885	0.013	0.930	0.012	0.939	0.011	0.947	0.010	0.950
Toothbrush	0.045	0.738	0.033	0.786	0.042	0.692	0.027	0.729	0.007	0.806	0.007	0.839	0.003	0.876
Zipper	0.009	0.941	0.008	0.923	0.011	0.930	0.009	0.920	0.008	0.918	0.007	0.930	0.007	0.914
Bottle	0.037	0.883	0.029	0.898	0.037	0.850	0.017	0.940	0.018	0.948	0.019	0.947	0.017	0.936
Grid	0.023	0.718	0.009	0.775	0.011	0.818	0.010	0.806	0.007	0.859	0.006	0.842	0.005	0.865
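The table reports MAE and mE for each category. For reference, the sketch below shows how such scores are commonly computed. It implements MAE between a prediction map in [0, 1] and a binary mask, the enhanced-alignment measure (E-measure) of Fan et al. for one binarized map, and the mean of that measure over binarization thresholds. This section does not state the evaluation protocol, so several details are assumptions. Masks are flattened 1-D lists, and the E-measure is normalized by the pixel count. The two degenerate cases (all-background or all-foreground ground truth) are scored by the fraction of correctly labeled pixels. mE averages over 256 evenly spaced thresholds. All function names are hypothetical.

```python
from typing import Sequence

EPS = 1e-8  # guards the alignment ratio against 0/0


def mae(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a [0, 1] prediction map and a 0/1 mask."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def e_measure(fm: Sequence[int], gt: Sequence[int]) -> float:
    """Enhanced-alignment measure of one binary foreground map against a 0/1 mask."""
    n = len(gt)
    if sum(gt) == 0:  # all-background ground truth: score correct background pixels
        return sum(1 - f for f in fm) / n
    if sum(gt) == n:  # all-foreground ground truth: score correct foreground pixels
        return sum(fm) / n
    mu_fm, mu_gt = sum(fm) / n, sum(gt) / n
    total = 0.0
    for f, g in zip(fm, gt):
        d_fm, d_gt = f - mu_fm, g - mu_gt                      # bias-removed values
        xi = 2.0 * d_gt * d_fm / (d_gt ** 2 + d_fm ** 2 + EPS)  # alignment in [-1, 1]
        total += (1.0 + xi) ** 2 / 4.0                          # enhanced term in [0, 1]
    return total / n


def mean_e_measure(pred: Sequence[float], gt: Sequence[int],
                   num_thresholds: int = 256) -> float:
    """Average E-measure over evenly spaced thresholds in [0, 1] (assumed protocol)."""
    scores = []
    for k in range(num_thresholds):
        t = k / (num_thresholds - 1)
        fm = [1 if p >= t else 0 for p in pred]  # binarize the prediction map
        scores.append(e_measure(fm, gt))
    return sum(scores) / len(scores)


gt = [0, 1, 1, 0]
pred = [0.1, 0.8, 0.6, 0.3]
print(mae(pred, gt), mean_e_measure(pred, gt))
```

Lower MAE and higher mE are better. A map that matches the mask everywhere yields an E-measure of 1, and its exact inverse yields 0.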

Method	Kvasir Dice/%	Kvasir IoU/%	Clinic Dice/%	Clinic IoU/%	Colon Dice/%	Colon IoU/%	ETIS Dice/%	ETIS IoU/%
U-Net	81.80	74.60	82.30	75.50	51.20	44.40	39.80	33.50
U-Net++	82.10	74.30	79.40	72.90	48.30	41.00	40.10	34.40
SFA	72.30	61.10	70.00	60.70	46.90	34.70	29.70	21.70
MSEG	89.70	83.90	90.90	86.40	73.50	66.60	70.00	63.00
DCRNet	88.60	82.50	89.60	84.40	70.40	63.10	55.60	49.60
ACSNet	89.80	83.80	88.20	82.60	71.60	64.90	57.80	50.90
PraNet	89.80	84.00	89.90	84.90	71.20	64.00	62.80	56.70
EU-Net	90.80	85.40	90.20	84.60	75.60	68.10	68.70	60.90
SANet	90.40	84.70	91.60	85.90	75.30	67.00	75.00	65.40
COMMA	90.40	86.00	91.60	87.10	75.40	68.90	71.10	64.80
SAM-EG	91.50	86.20	93.10	87.90	77.40	68.90	75.70	68.10
Proposed method	92.10	87.60	93.00	87.70	79.30	71.30	78.60	71.50
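Dice and IoU compare a binarized prediction with the ground-truth mask. The sketch below computes both for one image from flattened 0/1 masks. `dice_iou` is a hypothetical helper, and scoring two empty masks as 1.0 is an assumed convention. For a single image the two metrics are tied by Dice = 2·IoU/(1 + IoU).

```python
from typing import Sequence, Tuple


def dice_iou(pred: Sequence[int], gt: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)  # |P ∩ G|
    p_area, g_area = sum(pred), sum(gt)                  # |P|, |G|
    union = p_area + g_area - inter                      # |P ∪ G|
    if union == 0:
        return 1.0, 1.0  # both masks empty: assumed perfect score
    dice = 2 * inter / (p_area + g_area)                 # 2|P∩G| / (|P| + |G|)
    iou = inter / union                                  # |P∩G| / |P∪G|
    return dice, iou


print(dice_iou([1, 1, 0, 0], [1, 0, 1, 0]))  # one overlapping pixel out of three covered
```

The map J ↦ 2J/(1 + J) is concave. Suppose both columns are per-image averages, which is an assumption because the protocol is not stated here. Then a dataset's mean Dice cannot exceed 2·mIoU/(1 + mIoU). For the proposed method on Kvasir, 2 × 0.876 / 1.876 ≈ 0.934, which is consistent with the reported 92.10%.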

Method	MoNuSeg Dice/%	MoNuSeg IoU/%	GlaS Dice/%	GlaS IoU/%
FCN	28.84	28.71	-	-
U-Net	79.43	65.99	75.12	75.12
U-Net++	79.49	66.04	79.03	79.03
Axial Attention	76.83	62.49	-	-
MedT	79.55	66.17	88.85	78.93
PraNet	79.62	66.14	89.69	82.19
UCTransNet	79.87	66.68	89.84	82.24
Method of Ref. [47]	80.13	67.09	91.19	84.34
Med-SA	80.34	67.33	92.02	85.88
LViT	80.15	67.00	90.02	82.68
Method of Ref. [49]	80.96	68.12	91.08	84.00
Proposed method	84.02	72.52	92.74	87.01

SAM	OD-SSM	OD Conv	MoNuSeg Dice/%	MoNuSeg IoU/%	GlaS Dice/%	GlaS IoU/%
√			82.43	70.17	92.10	86.02
√	√		82.99	71.01	92.58	86.62
√		√	83.36	71.59	92.60	86.81
√	√	√	84.02	72.52	92.74	87.01

Number of OD Conv	MoNuSeg Dice/%	MoNuSeg IoU/%	GlaS Dice/%	GlaS IoU/%
1	83.63	71.97	92.37	86.38
2	84.02	72.52	92.74	87.01
3	82.52	70.32	92.63	86.93

Number of OD-SSM	MoNuSeg Dice/%	MoNuSeg IoU/%	GlaS Dice/%	GlaS IoU/%
1	84.02	72.52	92.74	87.01
2	83.21	71.35	92.61	86.77
3	83.14	71.22	92.53	86.66

Convolution type	MoNuSeg Dice/%	MoNuSeg IoU/%	GlaS Dice/%	GlaS IoU/%
DW Conv	84.02	72.52	92.74	87.01
Conv	83.78	72.00	92.56	86.69
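The table shows the module scoring slightly higher with depthwise (DW) convolution than with standard convolution. One practical difference between the two is cost. A standard k × k convolution has k·k·C_in·C_out weights. A depthwise convolution, which is a grouped convolution with one group per channel (C_in = C_out = C), has only k·k·C. The sketch below counts weights with the grouped-convolution formula k·k·(C_in/groups)·C_out, which matches the weight shape used by PyTorch's `nn.Conv2d`. The 3 × 3 kernel and 256 channels are hypothetical values, not the paper's configuration, and biases are excluded by default.

```python
def conv2d_weight_count(k: int, c_in: int, c_out: int,
                        groups: int = 1, bias: bool = False) -> int:
    """Number of learnable parameters in a k x k 2-D convolution.

    Weights form a tensor of shape (c_out, c_in // groups, k, k);
    groups=1 is a standard convolution, groups=c_in=c_out is depthwise.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


k, channels = 3, 256  # hypothetical layer size, for illustration only
standard = conv2d_weight_count(k, channels, channels)
depthwise = conv2d_weight_count(k, channels, channels, groups=channels)
print(standard, depthwise, standard // depthwise)  # 589824 2304 256
```

The ratio between the two counts equals C, so the saving grows with channel width. A depthwise layer does not mix information across channels, which is why it is usually paired with a pointwise (1 × 1) convolution or another channel-mixing step.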

Method	Computation/GFLOPs	Parameters/10⁶
AutoSAM	80.314	88.569
I-MedSAM	648.060	92.520
Proposed method	53.902	53.687
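A few lines of arithmetic recover the relative savings implied by the complexity table. The sketch below uses only the numbers in the table. It reports, in percent, how much lower the proposed method's computation and parameter count are than each baseline.

```python
# Values copied from the table: (computation in GFLOPs, parameters in 10^6)
baselines = {
    "AutoSAM": (80.314, 88.569),
    "I-MedSAM": (648.060, 92.520),
}
proposed = (53.902, 53.687)


def percent_lower(ours: float, other: float) -> float:
    """How much smaller `ours` is than `other`, as a percentage of `other`."""
    return 100.0 * (1.0 - ours / other)


savings = {
    name: (percent_lower(proposed[0], gflops), percent_lower(proposed[1], params))
    for name, (gflops, params) in baselines.items()
}
for name, (d_flops, d_params) in savings.items():
    print(f"vs {name}: {d_flops:.1f}% fewer GFLOPs, {d_params:.1f}% fewer parameters")
# vs AutoSAM: 32.9% fewer GFLOPs, 39.4% fewer parameters
# vs I-MedSAM: 91.7% fewer GFLOPs, 42.0% fewer parameters
```

FLOPs and parameter counts depend on measurement settings, such as input resolution and which modules are counted. These percentages are therefore only as comparable as the settings behind the table.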
