Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (4): 1269-1276. DOI: 10.11772/j.issn.1001-9081.2023040540

• Multimedia Computing and Computer Simulation •


Image aesthetic quality evaluation method based on self-supervised vision Transformer

Rong HUANG1,2, Junjie SONG1, Shubo ZHOU1,2(), Hao LIU1,2   

  1. College of Information Science and Technology, Donghua University, Shanghai 201620, China
    2. Engineering Research Center of Digitalized Textile & Fashion Technology, Ministry of Education (Donghua University), Shanghai 201620, China
  • Received: 2023-05-08 Revised: 2023-06-29 Accepted: 2023-07-13 Online: 2023-12-04 Published: 2024-04-10
  • Contact: Shubo ZHOU (zhoushubo@dhu.edu.cn)
  • About author: HUANG Rong, born in 1985 in Shaoxing, Zhejiang, Ph. D., associate professor. His research interests include deep learning and image analysis.
    SONG Junjie, born in 1998 in Zibo, Shandong, M. S. candidate. His research interests include deep learning and image analysis.
    ZHOU Shubo, born in 1988 in Shaoxing, Zhejiang, Ph. D., lecturer. His research interests include deep learning and image analysis.
    LIU Hao, born in 1977 in Dazhou, Sichuan, Ph. D., associate professor, CCF member. His research interests include deep learning and machine vision.
  • Supported by:
    National Natural Science Foundation of China (62001099); Fundamental Research Funds for the Central Universities (2232023D-30)


Abstract:

Existing image aesthetic quality evaluation methods widely use Convolutional Neural Networks (CNNs) to extract image features. Limited by the local receptive field mechanism, CNNs struggle to extract global features from a given image, resulting in the absence of aesthetic attributes such as global composition relations and global color matching. To solve this problem, an image aesthetic quality evaluation method based on a Self-Supervised Vision Transformer (SSViT) model was proposed. The self-attention mechanism was utilized to establish long-distance dependencies among local patches of an image and to adaptively learn the correlations between different patches, thereby extracting global features that characterize the image's aesthetic attributes. Meanwhile, three aesthetic quality perception tasks, namely image degradation classification, image aesthetic quality ranking, and image semantic reconstruction, were designed to pre-train the Vision Transformer (ViT) on unlabeled image data in a self-supervised manner, so as to enhance the representation ability of the global features. Experimental results on the AVA (Aesthetic Visual Assessment) dataset show that the SSViT model achieves 83.28%, 0.763 4 and 0.746 2 in terms of aesthetic quality classification accuracy, Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank-order Correlation Coefficient (SRCC), respectively. These results demonstrate the high accuracy of the SSViT model in image aesthetic quality evaluation.
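As a hedged illustration of the two correlation metrics reported above, the sketch below computes PLCC and SRCC between predicted and ground-truth aesthetic scores. The score arrays are hypothetical examples, not data from the AVA dataset, and the simple rank-based SRCC form assumes no tied scores:

```python
import numpy as np

def plcc(x, y):
    """Pearson Linear Correlation Coefficient between two score arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def srcc(x, y):
    """Spearman Rank-order Correlation Coefficient: Pearson correlation
    of the ranks (this simple form assumes no tied scores)."""
    rx = np.argsort(np.argsort(x))  # rank of each element
    ry = np.argsort(np.argsort(y))
    return plcc(rx, ry)

# Hypothetical mean aesthetic scores for six images (ground truth vs. model).
y_true = np.array([5.2, 6.8, 4.1, 7.5, 5.9, 3.3])
y_pred = np.array([5.0, 6.5, 4.4, 7.1, 6.2, 3.6])

print(f"PLCC = {plcc(y_true, y_pred):.4f}")
print(f"SRCC = {srcc(y_true, y_pred):.4f}")
```

PLCC measures how linearly the predicted scores track the ground truth, while SRCC only depends on the ordering of the images, which is why both are commonly reported together in aesthetic quality evaluation.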

Key words: image aesthetic quality evaluation, Vision Transformer (ViT), self-supervised learning, global feature, self-attention mechanism
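The self-attention mechanism that the abstract credits with modeling long-distance dependencies between image patches can be sketched minimally as follows. The patch count, embedding dimension, random projections, and single attention head are illustrative assumptions; a real ViT uses learned multi-head projections, positional encodings, and stacked Transformer blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim = 16, 32  # e.g. a 4x4 grid of patch embeddings

X = rng.normal(size=(num_patches, dim))  # patch embeddings
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(dim)  # pairwise patch affinities

# Softmax over each row: every patch attends to every other patch,
# so global relations (composition, color) can influence each feature.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V  # globally mixed patch features, shape (16, 32)
```

Because each output row is a weighted mixture over all patches, the receptive field is global from the first layer, in contrast with the local receptive field of a convolution.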

CLC number: