《计算机应用》唯一官方网站 ›› 2025, Vol. 45 ›› Issue (3): 823-831.DOI: 10.11772/j.issn.1001-9081.2024091398

• 大模型前沿研究与典型应用 •

视觉基础模型驱动的像素级图像异常检测方法

薛振华1, 李强1, 黄超2

  1. 国能运输技术研究院有限责任公司,北京 100080
    2.中山大学深圳校区 网络空间安全学院,广东 深圳 518107
  • 收稿日期:2024-10-07 修回日期:2024-12-01 接受日期:2024-12-03 发布日期:2025-01-14 出版日期:2025-03-10
  • 通讯作者: 黄超
  • 作者简介:薛振华(1983—),男,山西大同人,经济师,硕士,主要研究方向:缺陷检测、高效重载运输;
    李强(1996—),男,山西神池人,工程师,硕士,主要研究方向:缺陷检测、智能装备。
  • 基金资助:
    国家自然科学基金资助项目(62301621);深圳市科技计划项目(20231121172359002)

Vision foundation model-driven pixel-level image anomaly detection method

Zhenhua XUE1, Qiang LI1, Chao HUANG2

  1. China Energy Institute of Transportation Technology Research Company Limited, Beijing 100080, China
    2. School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
  • Received:2024-10-07 Revised:2024-12-01 Accepted:2024-12-03 Online:2025-01-14 Published:2025-03-10
  • Contact: Chao HUANG
  • About author:XUE Zhenhua, born in 1983, M. S., economist. His research interests include defect detection, efficient heavy-duty transportation.
    LI Qiang, born in 1996, M. S., engineer. His research interests include defect detection, intelligent equipment.
  • Supported by:
    National Natural Science Foundation of China(62301621);Shenzhen Science and Technology Program(20231121172359002)

摘要:

现有的异常检测方法能在特定应用场景下实现高精度检测,然而这些方法难以适用于其他应用场景,且自动化程度有限。因此,提出一种视觉基础模型(VFM)驱动的像素级图像异常检测方法SSMOD-Net(State Space Model driven-Omni Dimensional Net),旨在实现更精确的工业缺陷检测。与现有方法不同,SSMOD-Net实现SAM(Segment Anything Model)的自动化提示且不需要微调SAM,因此特别适用于需要处理大规模工业视觉数据的场景。SSMOD-Net的核心是一个新颖的提示编码器,该编码器由状态空间模型驱动,能够根据SAM的输入图像动态地生成提示。这一设计允许模型在保持SAM架构不变的同时,通过提示编码器引入额外的指导信息,从而提高检测精度。提示编码器内部集成一个残差多尺度模块,该模块基于状态空间模型构建,能够综合利用多尺度信息和全局信息。这一模块通过迭代搜索,在提示空间中寻找最优的提示,并将这些提示以高维张量的形式提供给SAM,从而增强模型对工业异常的识别能力。而且所提方法不需要对SAM进行任何修改,从而避免对训练流程进行复杂微调的需求。在多个数据集上的实验结果表明,所提方法展现出了卓越的性能:与AutoSAM和SAM-EG(SAM with Edge Guidance framework for efficient polyp segmentation)等方法相比,所提方法在mE(mean E-measure)、平均绝对误差(MAE)、Dice系数和交并比(IoU)等指标上都取得了较好的结果。

关键词: 深度学习, 像素级异常检测, 视觉基础模型, SAM, 自动提示

Abstract:

While previous anomaly detection methods have achieved high-precision detection in specific scenarios, their applicability is constrained by their lack of generalizability and automation. Thus, a Vision Foundation Model (VFM)-driven pixel-level image anomaly detection method, namely SSMOD-Net (State Space Model driven-Omni Dimensional Net), was proposed with the aim of achieving more accurate industrial defect detection. Unlike the existing methods, SSMOD-Net achieved automated prompting of SAM (Segment Anything Model) without the need for fine-tuning SAM, making it particularly suitable for scenarios that require processing large-scale industrial visual data. The core of SSMOD-Net is a novel prompt encoder driven by a state space model, which was able to generate prompts dynamically based on the input image of SAM. With this design, the model was allowed to introduce additional guidance information through the prompt encoder while preserving SAM's architecture, thereby enhancing detection accuracy. A residual multi-scale module was integrated into the prompt encoder; this module was constructed based on the state space model and was able to use multi-scale and global information comprehensively. Through iterative search, the module found optimal prompts in the prompt space and provided these prompts to SAM as high-dimensional tensors, thereby strengthening the model's ability to recognize industrial anomalies. Moreover, the proposed method did not require any modifications to SAM, thereby avoiding complex fine-tuning of training schedules. Experimental results on several datasets show that the proposed method has excellent performance, and achieves better results in mE (mean E-measure), Mean Absolute Error (MAE), Dice, and Intersection over Union (IoU) compared to methods such as AutoSAM and SAM-EG (SAM with Edge Guidance framework for efficient polyp segmentation).
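The abstract describes an architecture in which a trainable prompt encoder, containing a residual multi-scale module, maps the input image to dense prompt tensors that are fed to a frozen SAM. The sketch below illustrates only this overall wiring; every module name is a hypothetical stand-in (the paper's real multi-scale module is driven by a state space model, which is replaced here by simple multi-scale convolutions, and SAM's mask decoder is replaced by a placeholder layer), so this is not the authors' implementation.

```python
# Illustrative sketch of the auto-prompting idea in SSMOD-Net: train only a
# prompt encoder; the segmentation backbone stays frozen. All names are
# hypothetical stand-ins, not the paper's code.
import torch
import torch.nn as nn

class ResidualMultiScaleBlock(nn.Module):
    """Stand-in for the residual multi-scale module (the real one is built on
    a state space model; depthwise multi-scale convolutions are used here)."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch) for k in (3, 5, 7)
        )
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(y)  # residual connection

class PromptEncoder(nn.Module):
    """Maps the input image to a dense prompt tensor for the frozen decoder."""
    def __init__(self, ch=32, prompt_dim=256):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, stride=4, padding=1)
        self.block = ResidualMultiScaleBlock(ch)
        self.head = nn.Conv2d(ch, prompt_dim, 1)

    def forward(self, img):
        return self.head(self.block(self.stem(img)))

# Frozen "SAM-like" decoder stand-in: its weights are never updated, so only
# the prompt encoder would receive gradients during training.
decoder = nn.Conv2d(256, 1, 1)
for p in decoder.parameters():
    p.requires_grad = False

prompt_encoder = PromptEncoder()
img = torch.randn(2, 3, 64, 64)
prompt = prompt_encoder(img)   # dense prompt tensor, shape (2, 256, 16, 16)
mask_logits = decoder(prompt)  # pixel-level anomaly logits, (2, 1, 16, 16)
```

The key design point mirrored here is that the backbone's parameters carry `requires_grad = False`, so "fine-tuning SAM" is avoided and only the prompt encoder is optimized.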

Key words: deep learning, pixel-level anomaly detection, Vision Foundation Model (VFM), SAM (Segment Anything Model), automated prompting
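The evaluation relies on standard pixel-level segmentation metrics. For reference, minimal NumPy implementations of three of them are sketched below (Dice, IoU, and MAE; mean E-measure is omitted because it requires the full enhanced-alignment formulation); these are generic textbook definitions, not the paper's evaluation code.

```python
# Reference definitions of three pixel-level metrics used in the paper's
# comparison: Dice coefficient, Intersection over Union, Mean Absolute Error.
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice = 2|P ∩ G| / (|P| + |G|) on binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred, gt, eps=1e-8):
    """IoU = |P ∩ G| / |P ∪ G| on binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

def mae(prob, gt):
    """Mean absolute error between a probability map and the binary mask."""
    return np.abs(prob.astype(float) - gt.astype(float)).mean()

# Tiny 2x2 example: one true positive, one false positive.
pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt   = np.array([[1, 0], [0, 0]], dtype=bool)
print(round(dice(pred, gt), 3))  # 0.667
print(round(iou(pred, gt), 3))   # 0.5
print(round(mae(pred, gt), 3))   # 0.25
```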
