Semantic-aware Interactive Diffusion for Image Super-resolution Reconstruction

Author: Wang Jun, Yang Liwen
Affiliation:
Fund Project: National Natural Science Foundation of China (41975183)

    Abstract:

    Diverse degradation types and the difficulty of detail recovery make real-world image super-resolution challenging, and existing methods still struggle with structural preservation and semantic consistency. This study proposes a semantic-aware interactive diffusion method for image super-resolution reconstruction (SISRM). Semantic segmentation information is introduced as prior knowledge, enhancing structural understanding and providing semantic guidance during reconstruction. Specifically, a segmentation-aware prompt extractor is designed and trained to efficiently obtain segmentation mask embeddings and semantic labels from degraded low-resolution images using a segmentation mask encoder and a label text generator. An interactive text-to-image controller is then introduced, integrating a segmentation-guided cross-attention module with a trainable image encoder; the diffusion process is guided under multi-modal semantic conditions to enhance awareness of local detail and global structure. Finally, a mask feature fusion mechanism is proposed to mitigate the mismatch between local conditional control and the global latent distribution, improving the consistency and visual quality of the generated images. Experimental results on the DIV2K-Val and RealSR datasets show that the proposed method achieves the highest scores of 0.6121 and 0.7274 in no-reference and cross-modal image quality assessment, respectively, demonstrating notable improvements in detail restoration, semantic consistency, and overall perceptual quality.
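The two conditioning ideas in the abstract — image features attending to segmentation/text condition tokens, and a mask-gated blend of locally conditioned features into the global latent — can be loosely sketched as follows. This is an illustrative NumPy toy under stated assumptions, not the paper's implementation: the single-head, weight-free attention and all function names, shapes, and the additive gating form are assumptions of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def seg_cross_attention(img_feats, cond_embeds):
    """Cross-attention toy: image features (queries) attend to condition
    tokens (keys/values), e.g. segmentation mask embeddings plus label-text
    embeddings, pulling semantic context into each spatial position.

    img_feats:   (N, d) flattened spatial features
    cond_embeds: (M, d) condition tokens
    returns:     (N, d) semantically conditioned features
    """
    d = img_feats.shape[-1]
    scores = img_feats @ cond_embeds.T / np.sqrt(d)  # (N, M)
    attn = softmax(scores, axis=-1)                  # rows sum to 1
    return attn @ cond_embeds

def mask_feature_fusion(local_feats, global_latent, mask):
    """Mask-gated fusion toy: the segmentation mask acts as a spatial gate
    that blends locally conditioned features into the global latent,
    easing the local-control vs. global-distribution mismatch.

    local_feats, global_latent: (H, W, C); mask: (H, W) in [0, 1]
    """
    m = mask[..., None]  # broadcast the gate over channels
    return m * local_feats + (1.0 - m) * global_latent
```

With `mask` all ones the fused output reduces to the locally conditioned features, and with all zeros to the untouched global latent; real mechanisms of this kind typically use learned projections and multi-head attention rather than the raw embeddings used here.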

Cite this article:

Wang Jun, Yang Liwen. Semantic-aware Interactive Diffusion for Image Super-resolution Reconstruction. 计算机系统应用 (Computer Systems & Applications), (): 1-11

History
  • Received: 2025-08-01
  • Revised: 2025-08-22
  • Accepted:
  • Published online: 2025-12-26
  • Published:
Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190
Tel: 010-62661041  Email: csa@iscas.ac.cn