Journal of Computer Applications (《计算机应用》), official website


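No-reference image quality assessment algorithm based on cross-attention mechanism and saliency features

Deng Yang (邓旸)1,2, Zhao Tao (赵涛)3, Sun Kai (孙凯)3, Tong Tong (童同)4, Gao Qinquan (高钦泉)1

  1. College of Physics and Information Engineering, Fuzhou University (福州大学物理与信息工程学院)
  2. China Telecom Corporation Limited, Fuzhou Branch (中国电信股份有限公司福州分公司)
  3. Beijing Radio and Television Station (北京广播电视台)
  4. Fujian Dishi Technology Information Co., Ltd. (福建帝视科技信息有限公司)
  • Received: 2025-01-03 Revised: 2025-03-18 Online: 2025-04-03 Published: 2025-04-03
  • Corresponding author: Gao Qinquan (高钦泉)
  • Funding:
    Funded project of the Fujian Province Artificial Intelligence Science, Technology and Economy Integration Service Platform (福建省人工智能科技经济融合服务平台资助项目)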

Abstract: Image data in real-world business scenarios typically feature rich content and complex distortions, which poses a major challenge to the generalization of objective image quality assessment algorithms. To address this problem, a no-reference image quality assessment algorithm is proposed, consisting of three parts: a feature extraction network, a cross-attention fusion network, and an adaptive prediction network. First, the global view, local patches, and saliency view of a sample are fed together into the feature extraction network, where a Swin Transformer extracts global distortion, local distortion, and saliency features. Second, cascaded Transformer encoders fuse the global and local distortion features while mining the latent correlation patterns between them. Inspired by the human visual attention mechanism, the fusion network uses the saliency features to stimulate the attention module so that it pays additional attention to visually salient regions, improving the semantic parsing ability of the algorithm. Finally, a dynamically constructed MLP regression network computes the predicted score. Experiments were conducted on mainstream synthetic-distortion and authentic-distortion datasets. The results show that, compared with the DSMix algorithm, the proposed algorithm improves the Spearman Rank Correlation Coefficient (SRCC) on the TID2013 dataset by 4 percentage points and demonstrates excellent generalization ability and interpretability; it can effectively handle the complex distortions found in business scenarios and make adaptive predictions according to the individual characteristics of each sample.
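The abstract does not specify how the saliency features "stimulate" the attention module, so the following is only a minimal sketch of one plausible reading: global tokens act as queries over local-patch tokens, and a per-patch saliency score is added to the attention logits before the softmax. The function name `saliency_biased_cross_attention`, the additive bias, and the strength parameter `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def saliency_biased_cross_attention(queries, keys, values, saliency, alpha=1.0):
    """Single-head cross-attention with an additive saliency bias (hypothetical).

    queries  : global-view tokens, list of Nq vectors of length d
    keys     : local-patch tokens, list of Nk vectors of length d
    values   : local-patch tokens, list of Nk vectors
    saliency : one score per local patch (length Nk), e.g. in [0, 1]
    alpha    : assumed bias strength; alpha=0 gives plain cross-attention
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product logit plus the saliency bias for each patch.
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + alpha * s
            for k, s in zip(keys, saliency)
        ]
        weights = softmax(logits)
        # Weighted sum of the value vectors.
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out


# Two patches with identical keys: only the saliency bias separates them.
fused = saliency_biased_cross_attention(
    queries=[[1.0, 0.0]],
    keys=[[0.0, 0.0], [0.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    saliency=[0.0, 1.0],
)
print(fused)  # the second (more salient) patch receives the larger weight
```

With `alpha=0` the two patches are weighted equally, which makes the effect of the saliency term easy to isolate in this toy setting.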

Key words: Image Quality Assessment (IQA), Human Visual System (HVS), visual attention, Salient Object Detection (SOD), attention mechanism
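For readers unfamiliar with the SRCC metric reported in the abstract, the sketch below computes it in plain Python as the Pearson correlation of the rank vectors, with ties given their average rank. The scores used are made-up toy numbers, not data from the paper. If SRCC is expressed as a fraction in [-1, 1], a gain of 4 percentage points presumably corresponds to an increase of 0.04.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def srcc(predicted, subjective):
    """Spearman rank correlation between predicted and subjective scores."""
    return pearson(average_ranks(predicted), average_ranks(subjective))


# Toy example: five images, one pair of predictions in the wrong order.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative subjective scores
pred = [0.1, 0.3, 0.2, 0.8, 0.9]   # illustrative model outputs
print(round(srcc(pred, mos), 4))   # 0.9
```

Because SRCC depends only on rank order, any monotonic rescaling of the predicted scores leaves it unchanged, which is why it is commonly used to compare quality models whose outputs are on different scales.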
