Image aesthetic quality evaluation method based on self-supervised vision Transformer
Rong HUANG, Junjie SONG, Shubo ZHOU, Hao LIU
Journal of Computer Applications    2024, 44 (4): 1269-1276.   DOI: 10.11772/j.issn.1001-9081.2023040540
Abstract

Existing image aesthetic quality evaluation methods widely use Convolutional Neural Networks (CNNs) to extract image features. Limited by the local receptive field mechanism, a CNN has difficulty extracting global features from a given image, resulting in the absence of aesthetic attributes such as global composition relations and global color matching. To solve this problem, an image aesthetic quality evaluation method based on the SSViT (Self-Supervised Vision Transformer) model was proposed. The self-attention mechanism was utilized to establish long-distance dependencies among the local patches of an image and to adaptively learn their correlations, extracting global features to characterize the aesthetic attributes. Meanwhile, three aesthetic-quality perception tasks, namely classifying image degradation, ranking image aesthetic quality, and reconstructing image semantics, were designed to pre-train the vision Transformer in a self-supervised manner on unlabeled image data, so as to enhance the representation of global features. Experimental results on the AVA (Aesthetic Visual Assessment) dataset show that the SSViT model achieves 83.28%, 0.763 4 and 0.746 2 in evaluation accuracy, Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank-order Correlation Coefficient (SRCC), respectively. These results demonstrate that the SSViT model achieves higher accuracy in image aesthetic quality evaluation.
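The global receptive field claimed for self-attention can be illustrated with a minimal sketch: every image patch attends to every other patch, so each output token mixes information from the whole image. This is not the authors' implementation; the patch size, embedding dimension, and random projections below are illustrative assumptions, since the abstract gives no architectural details.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_patches(image, patch):
    """Split an HxWxC image into flattened non-overlapping patch tokens."""
    H, W, C = image.shape
    rows = [image[i:i + patch, j:j + patch].reshape(-1)
            for i in range(0, H, patch)
            for j in range(0, W, patch)]
    return np.stack(rows)  # (num_patches, patch*patch*C)

def self_attention(x, d):
    """Single-head scaled dot-product self-attention over patch tokens.

    Each patch attends to all patches, giving a global receptive field in
    one layer, unlike a convolution's local window. Projection weights are
    random stand-ins for learned parameters.
    """
    Wq, Wk, Wv = (rng.normal(0, 0.02, (x.shape[1], d)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                  # (N, N) pairwise affinities
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over all patches
    return attn @ v, attn

image = rng.random((32, 32, 3))   # toy image
tokens = to_patches(image, 8)     # 16 patches, each 8*8*3 = 192 dims
out, attn = self_attention(tokens, d=64)
print(out.shape, attn.shape)      # (16, 64) (16, 16)
```

Each row of `attn` is a probability distribution over all 16 patches, which is what lets the model learn long-distance correlations such as global composition.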
