Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (6): 1826-1832. DOI: 10.11772/j.issn.1001-9081.2022071008
Special Issue: Artificial Intelligence
Ancient mural dynasty identification based on attention mechanism and transfer learning
Huibin ZHANG1,2, Liping FENG1, Yaojun HAO1, Yining WANG1
Received: 2022-07-11
Revised: 2022-11-18
Accepted: 2022-11-30
Online: 2023-01-04
Published: 2023-06-10
Contact: Huibin ZHANG
About author:
ZHANG Huibin, born in 1971, associate professor, Ph. D. candidate. His research interests include deep learning, applied mathematics. E-mail: 927433441@qq.com
FENG Liping, born in 1976, Ph. D., professor. Her research interests include distributed optimization, deep learning.
Huibin ZHANG, Liping FENG, Yaojun HAO, Yining WANG. Ancient mural dynasty identification based on attention mechanism and transfer learning[J]. Journal of Computer Applications, 2023, 43(6): 1826-1832.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022071008
Tab. 1 ResNet20 structure based on attention mechanism

| Layer name | Output map size | Output channels | Convolution operation | Number of convolutions |
| --- | --- | --- | --- | --- |
| Linear | 64×6 average pool, 64-6 fc + Softmax | | | |
| Conv1.X | 112×112 | 16 | 3×3, S=2 | 1 |
| Conv2.X | 112×112 | 16 | 3×3, S=1; 3 residual blocks | 6 |
| Attention | | 16 | | |
| Residual connection | 56×56 | 32 | Improved residual connection method | |
| Conv2.X | 56×56 | 32 | First conv 3×3, S=2; other convs 3×3, S=1; 3 residual blocks | 6 |
| Residual connection | 28×28 | 64 | Improved residual connection method | |
| Conv3.X | 28×28 | 64 | First conv 3×3, S=2; other convs 3×3, S=1; 3 residual blocks | 6 |
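To make the Tab. 1 layout concrete, the sketch below builds an equivalent backbone in PyTorch. It is only an approximation of the proposed network: standard two-convolution residual blocks with a plain 1×1-convolution shortcut stand in for the improved residual connection, and the attention stage is left as a pluggable placeholder rather than the POSA module of the paper.

```python
# Minimal PyTorch sketch of the Tab. 1 layout (assumptions: standard basic
# residual blocks, a plain 1x1-conv shortcut standing in for the improved
# residual connection, and a placeholder slot for the attention module).
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a shortcut; the first conv may downsample."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_ch != out_ch:
            # Stand-in for the "improved residual connection" rows of Tab. 1.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))


class AttentionResNet20(nn.Module):
    """Stem -> 3 blocks (16 ch) -> attention -> 3 blocks (32 ch) -> 3 blocks (64 ch) -> pool + fc."""
    def __init__(self, num_classes=6, attention=None):
        super().__init__()
        self.stem = nn.Sequential(                                              # Conv1.X: 224 -> 112, 16 ch
            nn.Conv2d(3, 16, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(16), nn.ReLU(inplace=True))
        self.stage1 = nn.Sequential(*[BasicBlock(16, 16) for _ in range(3)])    # 112x112, 16 ch
        self.attention = attention if attention is not None else nn.Identity()  # attention slot
        self.stage2 = nn.Sequential(BasicBlock(16, 32, stride=2),               # 56x56, 32 ch
                                    BasicBlock(32, 32), BasicBlock(32, 32))
        self.stage3 = nn.Sequential(BasicBlock(32, 64, stride=2),               # 28x28, 64 ch
                                    BasicBlock(64, 64), BasicBlock(64, 64))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, num_classes))                   # 64-6 fc; softmax in the loss

    def forward(self, x):
        x = self.stem(x)
        x = self.attention(self.stage1(x))
        x = self.stage3(self.stage2(x))
        return self.head(x)


if __name__ == "__main__":
    model = AttentionResNet20()
    print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 6])
```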
Tab. 2 Numbers of images in different dynasties in DH1926 dataset

| Dynasty | Total samples | Training samples | Test samples |
| --- | --- | --- | --- |
| Total | 1 926 | 1 158 | 768 |
| Northern Wei | 303 | 175 | 128 |
| Northern Zhou | 276 | 148 | 128 |
| Sui | 271 | 143 | 128 |
| Tang | 341 | 213 | 128 |
| Five Dynasties | 270 | 142 | 128 |
| Western Wei | 465 | 337 | 128 |
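A hypothetical data-loading sketch that reproduces the Tab. 2 split is shown below: 128 test images are held out per dynasty and the remainder is used for training. The "DH1926" directory name, the folder-per-dynasty layout, and the 224×224 resize are assumptions made only for illustration.

```python
# Hypothetical split matching Tab. 2: hold out 128 test images per dynasty.
import random
from collections import defaultdict

from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

TEST_PER_CLASS = 128

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

full = datasets.ImageFolder("DH1926", transform=transform)  # assumed: 6 dynasty sub-folders

# Group sample indices by dynasty label.
by_class = defaultdict(list)
for idx, (_, label) in enumerate(full.samples):
    by_class[label].append(idx)

# Shuffle each class and reserve 128 images for the test set.
rng = random.Random(0)
train_idx, test_idx = [], []
for idxs in by_class.values():
    rng.shuffle(idxs)
    test_idx += idxs[:TEST_PER_CLASS]
    train_idx += idxs[TEST_PER_CLASS:]

train_loader = DataLoader(Subset(full, train_idx), batch_size=32, shuffle=True)
test_loader = DataLoader(Subset(full, test_idx), batch_size=32)
```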
Tab. 3 Comparison of experimental results of different network models

| Model | Total samples | Training samples | Training proportion/% | Test samples | Test proportion/% | Accuracy/% |
| --- | --- | --- | --- | --- | --- | --- |
| DunNet[ | 3 860 | 3 000 | 77.7 | 700 | 18.1 | 71.64 |
| Ref. [ | 9 630 | 8 430 | 87.5 | 1 200 | 12.5 | 84.44 |
| Ref. [ | 2 538 | 2 030 | 80.0 | 254 | 10.0 | 88.46 |
| Ref. [ | 9 700 | 7 760 | 80.0 | 970 | 10.0 | 88.70 |
| Proposed model | 1 926 | 1 158 | 60.1 | 768 | 39.9 | 98.05 |
Tab. 4 Comparative analysis of classifier performance

| Classifier | Test accuracy/% |
| --- | --- |
| Baseline | 97.00 |
| Baseline++ | 96.61 |
| Proposed classifier | 98.05 |
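For reference, the two baseline heads compared in Tab. 4 can be sketched as follows, assuming "Baseline" is an ordinary linear layer and "Baseline++" the cosine-similarity head of Chen et al. [21]; the paper's own classifier is not reproduced here.

```python
# Sketch of the two reference classifier heads of Tab. 4 on top of a 64-d
# feature vector (assumption: Baseline = plain fc layer, Baseline++ = cosine
# similarity head as in Chen et al. [21]).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearHead(nn.Module):
    """Baseline: a plain fully connected layer producing class logits."""
    def __init__(self, feat_dim=64, num_classes=6):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        return self.fc(feats)


class CosineHead(nn.Module):
    """Baseline++: cosine similarity between features and learned class weights."""
    def __init__(self, feat_dim=64, num_classes=6, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # temperature that sharpens the softmax over cosine logits

    def forward(self, feats):
        feats = F.normalize(feats, dim=-1)
        weight = F.normalize(self.weight, dim=-1)
        return self.scale * feats @ weight.t()
```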
Tab. 5 Comparison of test accuracy for training and testing sets with different sample sizes

| Total samples | Training samples | Training proportion/% | Test samples | Test proportion/% | Test accuracy/% |
| --- | --- | --- | --- | --- | --- |
| 1 926 | 964 | 50.1 | 962 | 49.9 | 97.56 |
| 1 926 | 1 158 | 60.1 | 768 | 39.9 | 98.05 |
| 1 926 | 1 542 | 80.1 | 384 | 19.9 | 98.70 |
Tab. 6 Performance analysis of POSA module

| Network model | Training samples | Test samples | Test accuracy/% |
| --- | --- | --- | --- |
| ResNet20 without POSA module | 1 158 | 768 | 92.84 |
| ResNet20 with POSA module | 1 158 | 768 | 96.00 |
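The Tab. 6 ablation can be expressed by toggling the attention slot of the backbone sketched after Tab. 1. The POSA (polarized self-attention [20]) module itself is not reproduced here; a squeeze-and-excitation style channel gate serves purely as a hypothetical stand-in to show how the with/without variants are built.

```python
# Hypothetical with/without-attention variants for the Tab. 6 ablation, reusing
# the AttentionResNet20 sketch given after Tab. 1. ChannelGate is only a
# stand-in attention module; it is not the POSA module of the paper.
import torch.nn as nn


class ChannelGate(nn.Module):
    """Channel gating: global average pool -> bottleneck 1x1 convs -> sigmoid scale."""
    def __init__(self, channels=16, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(self.pool(x))


# Same backbone and training setup, attention slot toggled:
model_without_attention = AttentionResNet20(attention=nn.Identity())
model_with_attention = AttentionResNet20(attention=ChannelGate(16))
```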
References
[1] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10) [2022-05-10].
[2] CAO J F, YAN M M, JIA Y M, et al. Application of Inception-v3 model integrated with transfer learning in dynasty identification of ancient murals[J]. Journal of Computer Applications, 2021, 41(11): 3219-3227. 10.11772/j.issn.1001-9081.2020121924
[3] BALAKRISHNAN T, ROSSTON S, TANG E. Using CNN to classify and understand artists from the Rijksmuseum[R/OL]. [2022-05-10].
[4] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
[5] LI Q Q, ZOU Q, MA D, et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes[J]. Science China Information Sciences, 2018, 61(9): No.092105. 10.1007/s11432-017-9308-x
[6] CAO J F, YAN M M, TIAN X D, et al. A dynasty classification algorithm of ancient murals based on adaptively enhanced capsule network[J]. Journal of Graphics, 2021, 42(5): 744-754.
[7] LI X Y, ZENG Y, GONG Y. Chronological classification of ancient paintings of Mogao Grottoes using convolutional neural networks[C]// Proceedings of the IEEE 4th International Conference on Signal and Image Processing. Piscataway: IEEE, 2019: 51-55. 10.1109/siprocess.2019.8868392
[8] ZHU Z D, LIN K X, JAIN A K, et al. Transfer learning in deep reinforcement learning: a survey[EB/OL]. (2022-05-16) [2022-06-10].
[9] KRIZHEVSKY A. Learning multiple layers of features from tiny images[R/OL]. (2009-04-08) [2022-05-10].
[10] YOSINSKI J, CLUNE J, BENGIO Y, et al. How transferable are features in deep neural networks?[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 3320-3328.
[11] DONAHUE J, JIA Y Q, VINYALS O, et al. DeCAF: a deep convolutional activation feature for generic visual recognition[C]// Proceedings of the 31st International Conference on Machine Learning. New York: JMLR.org, 2014: 647-655.
[12] LONG M S, CAO Y, WANG J M, et al. Learning transferable features with deep adaptation networks[C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 97-105.
[13] GANIN Y, LEMPITSKY V. Unsupervised domain adaptation by backpropagation[C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 1180-1189.
[14] GUO M H, XU T X, LIU J J, et al. Attention mechanisms in computer vision: a survey[J]. Computational Visual Media, 2022, 8(3): 331-368. 10.1007/s41095-022-0271-y
[15] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[EB/OL]. (2016-05-19) [2022-05-10].
[16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[17] WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803. 10.1109/cvpr.2018.00813
[18] MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module[C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3138-3147. 10.1109/wacv48630.2021.00318
[19] QIN Z, SUN W X, DENG H, et al. cosFormer: rethinking softmax in attention[EB/OL]. (2022-02-17) [2022-05-10].
[20] LIU H J, LIU F Q, FAN X Y, et al. Polarized self-attention: towards high-quality pixel-wise regression[EB/OL]. (2021-07-08) [2022-05-10]. 10.1016/j.neucom.2022.07.054
[21] CHEN W Y, LIU Y H, KIRA Z, et al. A closer look at few-shot classification[EB/OL]. (2020-01-12) [2022-05-10].
[22] HE T, ZHANG Z, ZHANG H, et al. Bag of tricks for image classification with convolutional neural networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 558-567. 10.1109/cvpr.2019.00065
[23] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1026-1034. 10.1109/iccv.2015.123
[24] ZHU C, NI R K, XU Z, et al. GradInit: learning to initialize neural networks for stable and efficient training[C/OL]// Proceedings of the 35th Conference on Neural Information Processing Systems [2022-05-10].
[25] DE S, SMITH S L. Batch normalization biases residual blocks towards the identity function in deep networks[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 19964-19975.
[26] ZHANG H B, FENG L P, ZHANG X H, et al. Necessary conditions for convergence of CNNs and initialization of convolution kernels[J]. Digital Signal Processing, 2022, 123: No.103397. 10.1016/j.dsp.2022.103397
[27] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2022-05-10].