Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (4): 1269-1276. DOI: 10.11772/j.issn.1001-9081.2023040540
Special Issue: Multimedia computing and computer simulation
Rong HUANG1,2, Junjie SONG1, Shubo ZHOU1,2, Hao LIU1,2
Received: 2023-05-08
Revised: 2023-06-29
Accepted: 2023-07-13
Online: 2023-12-04
Published: 2024-04-10
Contact: Shubo ZHOU
About author: HUANG Rong, born in 1985 in Shaoxing, Zhejiang, Ph. D., associate professor. His research interests include deep learning and image analysis.
Rong HUANG, Junjie SONG, Shubo ZHOU, Hao LIU. Image aesthetic quality evaluation method based on self-supervised vision Transformer[J]. Journal of Computer Applications, 2024, 44(4): 1269-1276.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023040540
| Type | Method | SRCC | PLCC | Acc/% |
|---|---|---|---|---|
| Classical CNN methods | Ref. [ ] | — | — | 66.70 |
| | Ref. [ ] | 0.5580 | — | 77.33 |
| | Ref. [ ] | — | — | 77.40 |
| | Ref. [ ] | 0.6120 | 0.6360 | 81.50 |
| | Ref. [ ] | 0.7560 | 0.7570 | 81.72 |
| | Ref. [ ] | 0.7190 | 0.7200 | 80.81 |
| | Ref. [ ] | 0.6489 | 0.6711 | |
| CNN global feature extraction methods | Ref. [ ] | — | — | 71.20 |
| | Ref. [ ] | — | — | 74.46 |
| | Ref. [ ] | — | — | 82.50 |
| | Ref. [ ] | 0.6900 | 0.7042 | 81.81 |
| | Ref. [ ] | — | — | 83.03 |
| | Ref. [ ] | | | 82.35 |
| | Proposed method | 0.7462 | 0.7634 | 83.28 |
Tab. 1 Comparison of quantitative results among different methods
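The three columns of Tab. 1 are standard evaluation metrics. The sketch below shows one common way to compute them with NumPy; it is an illustration, not the authors' evaluation code. The function names, the sample scores, and the threshold of 5 used to split images into high and low aesthetic quality are all assumptions that this page does not state.

```python
import numpy as np


def rankdata(x):
    """Return 1-based ranks; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks


def plcc(pred, true):
    """Pearson linear correlation coefficient between two score lists."""
    return float(np.corrcoef(pred, true)[0, 1])


def srcc(pred, true):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(rankdata(pred), rankdata(true))


def binary_accuracy(pred, true, threshold=5.0):
    """Percentage of images placed on the same side of the threshold.

    threshold=5.0 is an assumed cut-off between "low" and "high" quality.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean((pred > threshold) == (true > threshold)) * 100.0)


if __name__ == "__main__":
    # Hypothetical predicted and ground-truth mean aesthetic scores.
    predicted = [4.2, 5.6, 6.1, 3.9, 5.1]
    ground_truth = [4.0, 5.9, 6.5, 4.4, 4.8]
    print("SRCC:", srcc(predicted, ground_truth))
    print("PLCC:", plcc(predicted, ground_truth))
    print("Acc/%:", binary_accuracy(predicted, ground_truth))
```

SRCC depends only on the ordering of the scores, while PLCC also reflects how linearly the predictions track the ground truth, which is why a method can rank differently on the two columns.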
| Proportion of training data/% | PT-CLS (baseline) | One task added | | | Two tasks added | | | Three aesthetic perception tasks |
|---|---|---|---|---|---|---|---|---|
| | | +rect | +pred | +rank | +rect +pred | +pred +rank | +rect +rank | +rect +pred +rank |
| Average | 80.88 | 0.22 | 0.26 | 0.37 | 1.23 | 1.52 | | |
| 10 | 79.68 | 0.02 | 0.04 | 0.03 | 0.63 | 0.83 | | |
| 20 | 80.16 | 0.12 | 0.04 | 0.08 | 0.43 | 1.28 | | |
| 30 | 80.44 | 0.15 | 0.25 | 0.29 | 1.00 | 1.61 | | |
| 40 | 80.77 | 0.20 | 0.23 | 0.39 | 1.28 | 1.55 | | |
| 50 | 80.86 | 0.22 | 0.22 | 0.45 | 1.38 | 1.58 | | |
| 60 | 80.98 | 0.25 | 0.24 | 0.40 | 1.56 | 1.79 | | |
| 70 | 81.44 | 0.34 | 0.39 | 0.39 | 1.29 | 1.36 | | |
| 80 | 81.49 | 0.34 | 0.34 | 0.48 | 1.25 | 1.69 | | |
| 90 | 81.44 | 0.29 | 0.44 | 0.56 | 1.36 | 1.75 | | |
| 100 | 81.58 | 0.31 | 0.43 | 0.58 | 1.51 | 1.80 | | |
Tab. 2 Ablation studies for self-supervised tasks of aesthetic quality perception (Acc)
1 | TALEBI H, MILANFAR P. NIMA: neural image assessment [J]. IEEE Transactions on Image Processing, 2018, 27(8): 3998-4011. 10.1109/tip.2018.2831899 |
2 | MURRAY N, MARCHESOTTI L, PERRONNIN F. AVA: a large-scale database for aesthetic visual analysis [C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 2408-2415. 10.1109/cvpr.2012.6247954 |
3 | LU X, LIN Z, JIN H L, et al. Rating image aesthetics using deep learning [J]. IEEE Transactions on Multimedia, 2015, 17(11): 2021-2034. 10.1109/tmm.2015.2477040 |
4 | FISCHER M, KOBS K, HOTHO A. NICER: aesthetic image enhancement with humans in the loop [EB/OL]. [2023-04-01]. https://arxiv.org/pdf/2012.01778.pdf. |
5 | AYDIN T O, SMOLIC A, GROSS M. Automated aesthetic analysis of photographic images [J]. IEEE Transactions on Visualization and Computer Graphics, 2015, 21(1): 31-42. 10.1109/tvcg.2014.2325047 |
6 | PEDRO J S, YEH T, OLIVER N. Leveraging user comments for aesthetic aware image search reranking [C] // Proceedings of the 21st Annual Conference on World Wide Web. New York: ACM, 2012: 439-448. 10.1145/2187836.2187896 |
7 | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [J]. Communications of the ACM, 2017, 60(6): 84-90. 10.1145/3065386 |
8 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C] // Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
9 | RAWAT W, WANG Z. Deep convolutional neural networks for image classification: a comprehensive review [J]. Neural Computation, 2017, 29(9): 2352-2449. 10.1162/neco_a_00990 |
10 | JI C Q, GAO Z Y, QIN J, et al. Review of image classification algorithms based on convolutional neural network [J]. Journal of Computer Applications, 2022, 42(4): 1044-1049. 10.11772/j.issn.1001-9081.2021071273 |
11 | ZHAO Z-Q, ZHENG P, XU S-T, et al. Object detection with deep learning: a review [J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(11): 3212-3232. 10.1109/tnnls.2018.2876865 |
12 | JIANG H Y, WANG Y J, KANG J Y. A survey of object detection models and its optimization models [J]. Acta Automatica Sinica, 2021, 47(6): 1232-1255. 10.16383/j.aas.c190756 |
13 | SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. 10.1109/tpami.2016.2572683 |
14 | QING C, YU J, XIAO C B, et al. Deep convolutional neural network for semantic image segmentation [J]. Journal of Image and Graphics, 2020, 25(6): 1069-1090. 10.11834/jig.190355 |
15 | CHEN W, WANG W, LIU L, et al. New ideas and trends in deep multimodal content understanding: a review [J]. Neurocomputing, 2021, 426: 195-215. 10.1016/j.neucom.2020.10.042 |
16 | GU T T, GUO Y W, YIN K Y. Image quality assessment combining low DoF and composition [J]. Journal of Image and Graphics, 2013, 18(5): 574-582. 10.11834/jig.20130512 |
17 | ZHAO L, SHANG M, GAO F, et al. Representation learning of image composition for aesthetic prediction [J]. Computer Vision and Image Understanding, 2020, 199: 103024. 10.1016/j.cviu.2020.103024 |
18 | KONG S, SHEN X, LIN Z, et al. Photo aesthetics ranking network with attributes and content adaptation [C] // Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 662-679. 10.1007/978-3-319-46448-0_40 |
19 | MAI L, JIN H, LIU F. Composition-preserving deep photo aesthetics assessment [C] // Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 497-506. 10.1109/cvpr.2016.60 |
20 | HOSU V, GOLDLÜCKE B, SAUPE D. Effective aesthetics prediction with multi-level spatially pooled features [C] // Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9367-9375. 10.1109/cvpr.2019.00960 |
21 | ZENG H, CAO Z, ZHANG L. A unified probabilistic formulation of image aesthetic assessment [J]. IEEE Transactions on Image Processing, 2020, 29: 1548-1561. 10.1109/tip.2019.2941778 |
22 | CHEN Q, ZHANG W, ZHOU N, et al. Adaptive fractional dilated convolution network for image aesthetics assessment [C] // Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 14102-14111. 10.1109/cvpr42600.2020.01412 |
23 | LU X, LIN Z, SHEN X, et al. Deep multi-patch aggregation network for image style, aesthetics, and quality estimation [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 990-998. 10.1109/iccv.2015.119 |
24 | MA S, LIU J, CHEN C W. A-Lamp: adaptive layout-aware multi-patch deep convolutional neural network for photo aesthetic assessment [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 722-731. 10.1109/cvpr.2017.84 |
25 | ZHANG X, GAO X, LU W, et al. A gated peripheral-foveal convolutional neural network for unified image aesthetic prediction [J]. IEEE Transactions on Multimedia, 2019, 21(11): 2815-2826. 10.1109/tmm.2019.2911428 |
26 | SHENG K, DONG W, MA C, et al. Attention-based multi-patch aggregation for image aesthetic assessment [C]// Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM, 2018: 879-886. 10.1145/3240508.3240554 |
27 | WEN K Z, WEI Y K, DONG X H. Survey of application of deep convolution neural network in image aesthetic evaluation [J]. Computer Engineering and Applications, 2019, 55(15): 13-23,58. 10.3778/j.issn.1002-8331.1901-0185 |
28 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale [EB/OL]. (2020-10-22) [2023-04-01]. |
29 | DATTA R, JOSHI D, LI J, et al. Studying aesthetics in photographic images using a computational approach [C]// Proceedings of the 2006 European Conference on Computer Vision. Berlin: Springer, 2006: 288-301. 10.1007/11744078_23 |
30 | BHATTACHARYA S, SUKTHANKAR R, SHAH M. A framework for photo-quality assessment and enhancement based on visual aesthetics [C]// Proceedings of the 18th ACM International Conference on Multimedia. New York: ACM, 2010: 271-280. 10.1145/1873951.1873990 |
31 | KE Y, TANG X, JING F. The design of high-level features for photo quality assessment [C]// Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2006: 419-426. 10.1109/cvpr.2006.3 |
32 | TONG H, LI M, ZHANG H, et al. Classification of digital photos taken by photographers or home users [C]// Proceedings of the 2004 Pacific-Rim Conference on Multimedia. Berlin: Springer, 2004: 198-205. 10.1007/978-3-540-30541-5_25 |
33 | MARCHESOTTI L, PERRONNIN F, LARLUS D, et al. Assessing the aesthetic quality of photographs using generic image descriptors [C]// Proceedings of the 2011 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2011: 1784-1791. 10.1109/iccv.2011.6126444 |
34 | TIAN Y L, WANG Y T, WANG J G, et al. Key problems and progress of vision Transformer: the state of the art and prospects [J]. Acta Automatica Sinica, 2022, 48(4): 957-979. 10.16383/j.aas.c220027 |
35 | BA J L, KIROS J R, HINTON G E. Layer normalization [EB/OL]. (2016-07-21) [2023-04-01]. |
36 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010. |
37 | CORDONNIER J-B, LOUKAS A, JAGGI M. On the relationship between self-attention and convolutional layers [EB/OL]. (2019-11-08) [2023-04-01]. 10.48550/arXiv.1911.03584 |
38 | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115(3): 211-252. 10.1007/s11263-015-0816-y |
39 | ALEEM H, CORREA-HERRAN I, GRZYWACZ N M. A theoretical framework for how we learn aesthetic values [J]. Frontiers in Human Neuroscience, 2020, 14: No. 345. 10.3389/fnhum.2020.00345 |
40 | YANG W Y, SONG G L, CUI C R, et al. Image aesthetic quality assessment method based on semantic perception [J]. Journal of Computer Applications, 2018, 38(11): 3216-3220. 10.11772/j.issn.1001-9081.2018041221 |
41 | VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders [C]// Proceedings of the 25th International Conference on Machine Learning. New York: ACM, 2008: 1096-1103. 10.1145/1390156.1390294 |
42 | KINGMA D P, BA J. Adam: a method for stochastic optimization [EB/OL]. (2014-12-22) [2023-04-01]. |
43 | LOSHCHILOV I, HUTTER F. SGDR: stochastic gradient descent with warm restarts [EB/OL]. (2016-08-13) [2023-04-01]. |
44 | SUTSKEVER I, MARTENS J, DAHL G, et al. On the importance of initialization and momentum in deep learning [C]// Proceedings of the 30th International Conference on Machine Learning. New York: JMLR.org, 2013: 1139-1147. |
45 | SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization [J]. International Journal of Computer Vision, 2020, 128: 336-359. 10.1007/s11263-019-01228-7 |
46 | VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE [J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605. |