Multimodal-fusion-based Asymmetric Teacher-student Network for Image Anomaly Detection

Fund Project:

Fundamental Research Program of Shanxi Province (20210302123216); Shanxi Province Industry-Education Integration Postgraduate Joint Training Demonstration Base Project (2022JD11); Lüliang City Key R&D Project for Introducing High-level Scientific and Technological Talent (2022RC08)

    Abstract:

    Existing multimodal image anomaly detection methods have several limitations: anomalies tend to be smoothed out during anomalous-region extraction, and defect detection suffers from insufficient fine-grained perception and low discrimination efficiency, which together degrade overall detection performance. To address these issues, this study proposes a multimodal image anomaly detection model with an asymmetric teacher-student network (MATS), comprising three components: a cross-modal anomaly amplifier (CAA), a multi-dilated local attention (MDLA) module, and a FastKAN feed-forward network. First, the CAA expands and compresses auxiliary features and fuses them with the target features, amplifying anomalous regions while reducing noise and thereby alleviating anomaly smoothing in subsequent detection. Next, the MDLA module combines convolutions at different dilation rates with local attention to extract multi-scale features, improving fine-grained perception of anomalous regions, and is coupled with a normalizing flow (NF) to generate the conditional probability distribution of normal samples. The FastKAN module performs lightweight feature processing for efficient anomaly discrimination, producing feature maps consistent with the teacher's output that are used in a pixel-wise distance calculation to assess the degree of anomaly. During testing, regions where the teacher and student network outputs differ substantially are judged anomalous. Experimental results on the public industrial image datasets MVTec AD and MVTec 3D-AD show that the proposed method achieves state-of-the-art performance in multimodal anomaly detection and localization.
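The test-time rule described in the abstract (pixel-wise distance between teacher and student feature maps, with large discrepancies flagged as anomalous) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance, so the channel-averaged squared difference used here is an assumption, and the names `anomaly_map`, `image_score`, `anomalous_pixels`, and the `threshold` parameter are hypothetical. Feature maps are plain nested lists indexed as `[channel][row][column]`.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def anomaly_map(teacher: FeatureMap, student: FeatureMap) -> List[List[float]]:
    """Per-pixel squared difference between teacher and student features,
    averaged over channels (the choice of distance is an assumption)."""
    channels = len(teacher)
    height = len(teacher[0])
    width = len(teacher[0][0])
    scores = [[0.0] * width for _ in range(height)]
    for c in range(channels):
        for y in range(height):
            for x in range(width):
                diff = teacher[c][y][x] - student[c][y][x]
                scores[y][x] += diff * diff / channels
    return scores


def image_score(scores: List[List[float]]) -> float:
    """Image-level score: the largest per-pixel discrepancy."""
    return max(max(row) for row in scores)


def anomalous_pixels(scores: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """(row, column) positions whose score exceeds a hypothetical threshold."""
    return [
        (y, x)
        for y, row in enumerate(scores)
        for x, value in enumerate(row)
        if value > threshold
    ]


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature maps; the outputs disagree only at pixel (1, 1).
    teacher = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    student = [[[1.0, 2.0], [3.0, 1.0]], [[0.0, 0.0], [0.0, 4.0]]]
    scores = anomaly_map(teacher, student)
    print(image_score(scores))
    print(anomalous_pixels(scores, threshold=1.0))
```

In practice the threshold would be chosen on validation data, and the per-pixel map would typically be upsampled to the input resolution before localization; neither step is described in the abstract.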

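Cite this article

Li Hao, Zhang Yingjun, Xie Binhong, Zhang Rui. Multimodal-fusion-based Asymmetric Teacher-student Network for Image Anomaly Detection. Computer Systems & Applications, 2026, 35(2): 53-64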

History
  • Received: 2025-07-02
  • Last revised: 2025-07-23
  • Published online: 2025-11-04