Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 3995-4003. DOI: 10.11772/j.issn.1001-9081.2024121866

• Multimedia computing and computer simulation •

No-reference image quality assessment algorithm based on saliency features and cross-attention mechanism

Yang DENG1,2, Tao ZHAO3, Kai SUN3, Tong TONG1,4, Qinquan GAO1,4

  1. School of Physics and Information Engineering, Fuzhou University, Fuzhou Fujian 350108, China
    2. Fuzhou Branch, China Telecom Corporation Limited, Fuzhou Fujian 350005, China
    3. Beijing Radio and Television Station, Beijing 100022, China
    4. Fujian Imperial Vision Technology Group Company Limited, Fuzhou Fujian 350002, China
  • Received: 2025-01-03 Revised: 2025-03-18 Accepted: 2025-03-20 Online: 2025-04-03 Published: 2025-12-10
  • Contact: Qinquan GAO
  • About author:DENG Yang, born in 1998, M. S. candidate. Her research interests include computer vision, image processing.
    ZHAO Tao, born in 1982. His research interests include integrated media technology, artificial intelligence, radio and television technology.
    SUN Kai, born in 1984, engineer. His research interests include ultra-high definition video technology, secure broadcast control technology.
    TONG Tong, born in 1986, Ph. D., professor. His research interests include computer vision, medical image processing, computer-aided diagnosis for brain disease.
    GAO Qinquan, born in 1986, Ph. D., associate professor. His research interests include model compression, medical image processing, computer vision.
  • Supported by:
    Project of Artificial Intelligence and Economy Integration Platform of Fujian Province([2022]15)

基于显著性特征与交叉注意力的无参考图像质量评价算法

邓旸1,2, 赵涛3, 孙凯3, 童同1,4, 高钦泉1,4

  1. 福州大学 物理与信息工程学院,福州 350108
    2.中国电信股份有限公司 福州分公司,福州 350005
    3.北京广播电视台,北京 100022
    4.福建帝视科技集团有限公司,福州 350002
  • 通讯作者: 高钦泉
  • 作者简介:邓旸(1998—),女,福建三明人,硕士研究生,主要研究方向:计算机视觉、图像处理
    赵涛(1982—),男,北京人,主要研究方向:融媒体技术、人工智能、广电技术
    孙凯(1984—),男,北京人,工程师,主要研究方向:超高清视频技术、安全播控技术
    童同(1986—),男,安徽安庆人,教授,博士,主要研究方向:计算机视觉、医学影像处理、脑疾病辅助诊断
    高钦泉(1986—),男,福建福州人,副教授,博士,主要研究方向:模型压缩、医学影像处理、计算机视觉。
  • 基金资助:
    福建省人工智能科技经济融合服务平台项目([2022]15)

Abstract:

Image data in real business scenarios typically has rich content and complex distortions, which poses a great challenge to the generalization of objective Image Quality Assessment (IQA) algorithms. To address this problem, a No-Reference IQA (NR-IQA) algorithm was proposed, composed mainly of three sub-networks: a Feature Extraction Network (FEN), a Feature Fusion Network (FFN), and an Adaptive Prediction Network (APN). Firstly, the global view, local patches, and saliency view of a sample were input into the FEN together, and global distortion, local distortion, and saliency features were extracted by Swin Transformer. Then, a cascaded Transformer encoder was used to fuse the global and local distortion features and to explore the latent correlation patterns between them. Inspired by the human visual attention mechanism, the saliency features were used in the FFN to activate the attention module, so that the module paid additional attention to visually salient regions, thereby improving the semantic parsing ability of the algorithm. Finally, the prediction score was calculated by a dynamically constructed MultiLayer Perceptron (MLP) regression network. Experimental results on mainstream synthetic and real-world distortion datasets show that, compared with the DSMix (Distortion-induced Sensitivity map-guided Mixed augmentation) algorithm, the proposed algorithm improves the Spearman Rank-order Correlation Coefficient (SRCC) by 4.3% on the TID2013 dataset and the Pearson Linear Correlation Coefficient (PLCC) by 1.4% on the KonIQ dataset. The proposed algorithm also demonstrates excellent generalization ability and interpretability: it handles the complex distortions found in business scenarios effectively and makes adaptive predictions according to the individual characteristics of each sample.
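As a rough illustration of how saliency features could steer an attention module, the following is a minimal sketch of single-head scaled dot-product cross-attention in plain Python. It is not the authors' implementation: the abstract does not specify which features act as queries, so the assignment used here (saliency tokens as queries, distortion tokens as keys and values) is an assumption, as are the names `cross_attention`, `saliency_tokens`, and `distortion_tokens`. Learned projection matrices and multiple heads are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: list of d-dimensional vectors (assumed here: saliency tokens)
    keys, values: lists of vectors from the other stream
                  (assumed here: fused distortion tokens)
    Returns (outputs, weights), one output vector and one weight row per query.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        row = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[c] for w, v in zip(row, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(row)
    return outputs, weights


# Hypothetical toy data: one saliency token and two distortion tokens.
saliency_tokens = [[1.0, 0.0]]
distortion_tokens = [[4.0, 0.0], [0.0, 4.0]]
distortion_values = [[1.0, 0.0], [0.0, 1.0]]

outputs, weights = cross_attention(saliency_tokens, distortion_tokens, distortion_values)
print(weights[0])  # the first distortion token receives most of the attention
```

In this toy case the saliency token aligns with the first distortion token, so that token dominates the weighted sum, which mirrors the idea of directing extra attention to salient regions.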

Key words: Image Quality Assessment (IQA), Human Visual System (HVS), visual attention, Salient Object Detection (SOD), attention mechanism
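The two evaluation metrics named in the abstract can be computed as in the sketch below, written in plain Python under simple assumptions. PLCC is the Pearson correlation between predicted scores and subjective scores, and SRCC is the Pearson correlation of their ranks, with tied values given average ranks. The function names and the sample scores are hypothetical and do not come from the paper. Many IQA studies fit a nonlinear logistic mapping to the predictions before computing PLCC; that step is omitted here.

```python
import math


def plcc(x, y):
    """Pearson Linear Correlation Coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman Rank-order Correlation Coefficient: Pearson on the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Hypothetical subjective scores and model predictions for five images.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.5, 3.4, 4.8]

print("PLCC:", plcc(subjective, predicted))
print("SRCC:", srcc(subjective, predicted))
```

SRCC depends only on the ordering of the scores, so it measures prediction monotonicity, whereas PLCC is sensitive to how linearly the predictions track the subjective scores.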

摘要:

实际业务场景中的图像数据通常呈现内容丰富和失真表现复杂的特点,对客观图像质量评价(IQA)算法的泛化是一个巨大挑战。针对这一问题,提出一种无参考IQA(NR-IQA)算法。该算法主要由特征提取网络(FEN)、特征融合网络(FFN)和自适应预测网络(APN)这3个子网络组成。首先,将样本的全局视图、局部patch和显著性视图一并输入FEN,并通过Swin Transformer提取全局失真、局部失真和显著性特征;其次,采用级联的Transformer编码器融合全局失真特征和局部失真特征,并挖掘二者的潜在关联模式;受人类视觉关注机制的启发,在FFN中使用显著性特征激发注意力模块,使该模块对视觉显著性区域施加额外关注,从而提升算法的语义解析能力;最后,通过动态构建的多层感知机(MLP)回归网络计算出预测分数。在主流的合成失真和真实失真数据集上的实验结果表明,与DSMix(Distortion-induced Sensitivity map-guided Mixed augmentation)算法相比,所提算法在TID2013数据集上的斯皮尔曼秩序相关系数(SRCC)提升了4.3%,在KonIQ数据集上的皮尔森线性相关系数(PLCC)提升了1.4%,并展现出了出色的泛化能力和可解释性,能够有效应对业务场景中失真表现复杂的情况,且可以根据样本个体特征做出适应性预测。

关键词: 图像质量评价, 人类视觉系统, 视觉关注, 显著目标检测, 注意力机制

CLC Number: