Ancient mural dynasty identification based on attention mechanism and transfer learning

doi:10.11772/j.issn.1001-9081.2022071008

Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1826-1832. DOI: 10.11772/j.issn.1001-9081.2022071008

Huibin ZHANG¹,², Liping FENG¹, Yaojun HAO¹, Yining WANG¹

1. Department of Computer, Xinzhou Normal University, Xinzhou Shanxi 034000, China
2. School of Information Science and Engineering, Yanshan University, Qinhuangdao Hebei 066004, China

Received: 2022-07-11 Revised: 2022-11-18 Accepted: 2022-11-30 Online: 2023-01-04 Published: 2023-06-10
Contact: Huibin ZHANG
About author: ZHANG Huibin, born in 1971, Ph. D. candidate, associate professor. His research interests include deep learning, applied mathematics. Email: 927433441@qq.com
FENG Liping, born in 1976, Ph. D., professor. Her research interests include distributed optimization, deep learning.
HAO Yaojun, born in 1979, Ph. D., professor. His research interests include deep learning, information security of recommendation system.
WANG Yining, born in 1992, M. S., teaching assistant. Her research interests include deep learning, artificial intelligence.
Supported by:
Youth Foundation of Humanities and Social Sciences Research of Ministry of Education (20YJC630034); Natural Science Foundation of Shanxi Province (20210302124330); Research Project Supported by Shanxi Scholarship Council of China (2020-139)

Abstract

Convolutional Neural Networks (CNNs) have been successfully used to classify the dynasties of ancient murals from Dunhuang. Because the amount of Dunhuang mural data is limited, expanding the training set with certain data augmentation methods can actually reduce prediction accuracy. To address this problem, a Residual Network (ResNet) model based on attention mechanism and transfer learning was proposed. Firstly, the residual connection method of the residual network was improved. Then, the POlarized Self-Attention (POSA) module was used to help the network model extract both the local edge detail features and the global contour features of the images, enhancing the learning ability of the network model in a small-sample environment. Finally, the classifier algorithm was improved to raise the classification performance of the network model. Experimental results show that the proposed model achieves a dynasty classification accuracy of 98.05% on the DH1926 small-sample dataset of Dunhuang murals, which is 5.21 percentage points higher than that of the standard ResNet20 network model.

Key words: Convolutional Neural Network (CNN), attention mechanism, transfer learning, Residual Network (ResNet), ancient mural dynasty identification

CLC Number: TP183

Huibin ZHANG, Liping FENG, Yaojun HAO, Yining WANG. Ancient mural dynasty identification based on attention mechanism and transfer learning[J]. Journal of Computer Applications, 2023, 43(6): 1826-1832.


References (27)

1	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. (2015-04-10) [2022-05-10].
2	CAO J F, YAN M M, JIA Y M, et al. Application of Inception-v3 model integrated with transfer learning in dynasty identification of ancient murals [J]. Journal of Computer Applications, 2021, 41(11): 3219-3227. DOI: 10.11772/j.issn.1001-9081.2020121924
3	BALAKRISHNAN T, ROSSTON S, TANG E. Using CNN to classify and understand artists from the Rijksmuseum [R/OL]. [2022-05-10].
4	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
5	LI Q Q, ZOU Q, MA D, et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes [J]. Science China Information Sciences, 2018, 61(9): No.092105. DOI: 10.1007/s11432-017-9308-x
6	CAO J F, YAN M M, TIAN X D, et al. A dynasty classification algorithm of ancient murals based on adaptively enhanced capsule network [J]. Journal of Graphics, 2021, 42(5): 744-754.
7	LI X Y, ZENG Y, GONG Y. Chronological classification of ancient paintings of Mogao Grottoes using convolutional neural networks [C]// Proceedings of the IEEE 4th International Conference on Signal and Image Processing. Piscataway: IEEE, 2019: 51-55. DOI: 10.1109/siprocess.2019.8868392
8	ZHU Z D, LIN K X, JAIN A K, et al. Transfer learning in deep reinforcement learning: a survey [EB/OL]. (2022-05-16) [2022-06-10].
9	KRIZHEVSKY A. Learning multiple layers of features from tiny images [R/OL]. (2009-04-08) [2022-05-10]. DOI: 10.1016/j.tics.2007.09.004
10	YOSINSKI J, CLUNE J, BENGIO Y, et al. How transferable are features in deep neural networks? [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 3320-3328.
11	DONAHUE J, JIA Y Q, VINYALS O, et al. DeCAF: a deep convolutional activation feature for generic visual recognition [C]// Proceedings of the 31st International Conference on Machine Learning. New York: JMLR.org, 2014: 647-655.
12	LONG M S, CAO Y, WANG J M, et al. Learning transferable features with deep adaptation networks [C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 97-105.
13	GANIN Y, LEMPITSKY V. Unsupervised domain adaptation by backpropagation [C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 1180-1189.
14	GUO M H, XU T X, LIU J J, et al. Attention mechanisms in computer vision: a survey [J]. Computational Visual Media, 2022, 8(3): 331-368. DOI: 10.1007/s41095-022-0271-y
15	BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate [EB/OL]. (2016-05-19) [2022-05-10]. DOI: 10.1017/9781108608480.003
16	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
17	WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803. DOI: 10.1109/cvpr.2018.00813
18	MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module [C]// Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3138-3147. DOI: 10.1109/wacv48630.2021.00318
19	QIN Z, SUN W X, DENG H, et al. cosFormer: rethinking softmax in attention [EB/OL]. (2022-02-17) [2022-05-10].
20	LIU H J, LIU F Q, FAN X Y, et al. Polarized self-attention: towards high-quality pixel-wise regression [EB/OL]. (2021-07-08) [2022-05-10]. DOI: 10.1016/j.neucom.2022.07.054
21	CHEN W Y, LIU Y H, KIRA Z, et al. A closer look at few-shot classification [EB/OL]. (2020-01-12) [2022-05-10].
22	HE T, ZHANG Z, ZHANG H, et al. Bag of tricks for image classification with convolutional neural networks [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 558-567. DOI: 10.1109/cvpr.2019.00065
23	HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1026-1034. DOI: 10.1109/iccv.2015.123
24	ZHU C, NI R K, XU Z, et al. GradInit: learning to initialize neural networks for stable and efficient training [C/OL]// Proceedings of the 35th Conference on Neural Information Processing Systems [2022-05-10].
25	DE S, SMITH S L. Batch normalization biases residual blocks towards the identity function in deep networks [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 19964-19975.
26	ZHANG H B, FENG L P, ZHANG X H, et al. Necessary conditions for convergence of CNNs and initialization of convolution kernels [J]. Digital Signal Processing, 2022, 123: No.103397. DOI: 10.1016/j.dsp.2022.103397
27	KINGMA D P, BA J L. Adam: a method for stochastic optimization [EB/OL]. (2017-01-30) [2022-05-10].

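Layer name	Output map size	Output channels	Convolution operation	Number of convolutions
Conv1.X	112×112	16	3×3, S=2	1
Conv2.X	112×112	16	3×3, S=1; 3 residual blocks	6
Attention		16
Residual connection	56×56	32	Improved residual connection method (Fig. 2)
Conv2.X	56×56	32	First convolution 3×3, S=2; other convolutions 3×3, S=1; 3 residual blocks	6
Residual connection	28×28	64	Improved residual connection method (Fig. 2)
Conv3.X	28×28	64	First convolution 3×3, S=2; other convolutions 3×3, S=1; 3 residual blocks	6
Linear	64×6 average pool, 64-6 fc + Softmax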
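
The output sizes in the network structure table can be traced with the standard convolution size formula. The sketch below is only an illustration under stated assumptions: the 224×224 input resolution and the padding of 1 are not given in the table and are assumed here, and the stage labels in `STAGES` are hypothetical names chosen to tell the two rows labelled Conv2.X apart.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial output size of a square convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed input resolution; the table lists output sizes only.
INPUT_SIZE = 224

# (stage label, stride of the first convolution, output channels, number of convolutions)
# Strides, channels and convolution counts are copied from the table.
STAGES = [
    ("Conv1.X", 2, 16, 1),
    ("Conv2.X at 112x112", 1, 16, 6),
    ("Conv2.X at 56x56", 2, 32, 6),
    ("Conv3.X", 2, 64, 6),
]


def trace(input_size: int = INPUT_SIZE):
    """Return (label, output size, channels, conv count) for every stage."""
    size = input_size
    rows = []
    for label, first_stride, channels, n_convs in STAGES:
        # Only the first convolution of a stage may downsample;
        # the remaining 3x3, S=1 convolutions keep the size unchanged.
        size = conv_out(size, stride=first_stride)
        rows.append((label, size, channels, n_convs))
    return rows


rows = trace()
# One fully connected layer follows the convolution stages.
total_weight_layers = sum(n_convs for _, _, _, n_convs in rows) + 1

for label, size, channels, n_convs in rows:
    print(f"{label}: {size}x{size}, {channels} channels, {n_convs} convolutions")
print("weight layers:", total_weight_layers)
```

Under these assumptions the traced sizes agree with the table, and the 19 convolutions plus one fully connected layer give the 20 weight layers that the name ResNet20 suggests.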


Dynasty	Total samples	Training samples	Test samples
Northern Wei	303	175	128
Northern Zhou	276	148	128
Sui	271	143	128
Tang	341	213	128
Five Dynasties	270	142	128
Western Wei	465	337	128
Total	1 926	1 158	768
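
The dataset table holds out the same number of test images (128) for every dynasty and trains on the rest. A minimal sketch of such a fixed-size per-class split follows; the random sampling, the seed, and the helper name `split_indices` are assumptions for illustration, since the page does not say how the images were actually selected.

```python
import random

# Per-dynasty image counts copied from the table.
TOTALS = {
    "Northern Wei": 303,
    "Northern Zhou": 276,
    "Sui": 271,
    "Tang": 341,
    "Five Dynasties": 270,
    "Western Wei": 465,
}
TEST_PER_CLASS = 128  # every dynasty contributes the same number of test images


def split_indices(n_total: int, n_test: int, seed: int = 0):
    """Randomly split range(n_total) into (train, test) index lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test = sorted(rng.sample(range(n_total), n_test))
    test_set = set(test)
    train = [i for i in range(n_total) if i not in test_set]
    return train, test


splits = {name: split_indices(n, TEST_PER_CLASS) for name, n in TOTALS.items()}
train_counts = {name: len(train) for name, (train, _) in splits.items()}
test_counts = {name: len(test) for name, (_, test) in splits.items()}

print(train_counts)
print("training total:", sum(train_counts.values()))
print("test total:", sum(test_counts.values()))
```

The resulting training counts reproduce the table. The test set is balanced across dynasties, while the training set is not: Western Wei has more than twice as many training images as Five Dynasties.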


Model	Total samples	Training samples	Training share/%	Test samples	Test share/%	Accuracy/%
DunNet [5]	3 860	3 000	77.7	700	18.1	71.64
Model of Ref. [6]	9 630	8 430	87.5	1 200	12.5	84.44
Model of Ref. [7]	2 538	2 030	80.0	254	10.0	88.46
Model of Ref. [2]	9 700	7 760	80.0	970	10.0	88.70
Proposed model	1 926	1 158	60.1	768	39.9	98.05
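
The share columns of the comparison table follow directly from the sample counts. The short check below recomputes them; the `unassigned` field is a label introduced here for samples that the table places in neither split, and what those samples were used for is not stated on this page.

```python
# (model, total, training, test, reported accuracy in %) copied from the table
ROWS = [
    ("DunNet [5]", 3860, 3000, 700, 71.64),
    ("Model of Ref. [6]", 9630, 8430, 1200, 84.44),
    ("Model of Ref. [7]", 2538, 2030, 254, 88.46),
    ("Model of Ref. [2]", 9700, 7760, 970, 88.70),
    ("Proposed model", 1926, 1158, 768, 98.05),
]


def share(part: int, total: int) -> float:
    """Percentage of `total` made up by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)


def summarize(rows):
    """Recompute the share columns and the count left out of both splits."""
    summary = {}
    for name, total, n_train, n_test, accuracy in rows:
        summary[name] = {
            "train_share": share(n_train, total),
            "test_share": share(n_test, total),
            "unassigned": total - n_train - n_test,
            "accuracy": accuracy,
        }
    return summary


summary = summarize(ROWS)
for name, stats in summary.items():
    print(name, stats)
```

The recomputed shares match the table. They also show that the proposed model was trained on the smallest share of its dataset (60.1%) and evaluated on the largest (39.9%).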


Total samples	Training samples	Training share/%	Test samples	Test share/%	Test accuracy/%
1 926	964	50.1	962	49.9	97.56
1 926	1 158	60.1	768	39.9	98.05
1 926	1 542	80.1	384	19.9	98.70

Network model	Training samples	Test samples	Test accuracy/%
ResNet20 without POSA module	1 158	768	92.84
ResNet20 with POSA module	1 158	768	96.00
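
The ablation figures can be related to the headline result in the abstract with simple percentage-point arithmetic. The sketch below uses exact decimal arithmetic; the dictionary keys are labels chosen here, and attributing the remaining gain to the other modifications named in the abstract is an interpretation, not a statement made by the table.

```python
from decimal import Decimal

# Test accuracies in percent: the first two from the ablation table,
# the third from the abstract (full proposed model).
ACCURACY = {
    "ResNet20 without POSA": Decimal("92.84"),
    "ResNet20 with POSA": Decimal("96.00"),
    "Full proposed model": Decimal("98.05"),
}


def gain(after: str, before: str) -> Decimal:
    """Difference between two accuracies, in percentage points."""
    return ACCURACY[after] - ACCURACY[before]


posa_gain = gain("ResNet20 with POSA", "ResNet20 without POSA")
total_gain = gain("Full proposed model", "ResNet20 without POSA")
remaining_gain = gain("Full proposed model", "ResNet20 with POSA")

print("POSA module alone:", posa_gain)        # 3.16
print("Full model vs. baseline:", total_gain)  # 5.21
print("Remaining gain:", remaining_gain)       # 2.05
```

The total of 5.21 percentage points matches the improvement over standard ResNet20 reported in the abstract. Of that, 3.16 points appear with the POSA module alone, and the remaining 2.05 points presumably come from the improved residual connection and classifier.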


Classifier	Test accuracy/%
Baseline	97.00
Baseline++	96.61
Proposed classifier	98.05
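
For orientation, Baseline and Baseline++ are the two classifier heads described in reference [21]: a linear layer, and a head that scores each class by the cosine similarity between the feature vector and a class weight vector. The sketch below illustrates only that difference on toy numbers. The feature and weight values and the scale factor of 10 are assumptions, and the paper's own improved classifier is not described on this page, so it is not reproduced here.

```python
import math


def linear_scores(feature, weights, biases):
    """Baseline-style head: one dot product plus bias per class."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, biases)]


def cosine_scores(feature, weights, scale=10.0):
    """Baseline++-style head: scaled cosine similarity to each class vector."""
    f_norm = math.sqrt(sum(x * x for x in feature))
    scores = []
    for row in weights:
        w_norm = math.sqrt(sum(w * w for w in row))
        dot = sum(w * x for w, x in zip(row, feature))
        scores.append(scale * dot / (f_norm * w_norm + 1e-12))
    return scores


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: a 3-dimensional feature and two class weight vectors.
feature = [1.0, 2.0, 0.5]
weights = [[1.0, 2.0, 0.5], [-1.0, 0.0, 1.0]]
biases = [0.0, 0.0]

probs_linear = softmax(linear_scores(feature, weights, biases))
probs_cosine = softmax(cosine_scores(feature, weights))
scaled_feature = [5.0 * x for x in feature]
probs_cosine_scaled = softmax(cosine_scores(scaled_feature, weights))

print(probs_linear)
print(probs_cosine)
print(probs_cosine_scaled)
```

The cosine head gives the same probabilities when the feature vector is rescaled, whereas the linear head does not. This normalization is what distinguishes Baseline++ from Baseline in reference [21].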
