Transfer learning model based on improved domain separation network

doi:10.11772/j.issn.1001-9081.2022071103

Journal of Computer Applications, 2023, Vol. 43, Issue (8): 2382-2389. DOI: 10.11772/j.issn.1001-9081.2022071103

Special Issue: Artificial intelligence

Zexi JIN¹, Lei LI¹,², Ji LIU¹,²

1. Institute of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi, Xinjiang 830012, China
2. Xinjiang Social and Economic Statistics and Big Data Application Research Center (Xinjiang University of Finance and Economics), Urumqi, Xinjiang 830012, China

Received: 2022-07-29 Revised: 2022-11-21 Accepted: 2022-11-30 Online: 2023-01-15 Published: 2023-08-10
Contact: Lei LI
About authors: JIN Zexi, born in 1998, from Yancheng, Jiangsu, M. S. candidate. His research interests include machine learning and big data analysis.
LIU Ji, born in 1974, from Dazhou, Sichuan, Ph. D., professor. His research interests include intelligent data analysis.
Supported by: National Natural Science Foundation of China (71762028)

Abstract

To further improve the efficiency of feature recognition and extraction in transfer learning, reduce negative transfer, and enhance the learning performance of the model, a transfer learning model based on an improved Domain Separation Network (DSN), named AMCN-DSN (Attention Mechanism Capsule Network-DSN), was proposed. Firstly, Multi-Head Attention CapsNet (MHAC) was used to extract and reconstruct the feature information of the source and target domains: the attention mechanism filtered the feature information effectively, and the capsule network improved the extraction quality of deep information. Secondly, a dynamic adversarial factor was introduced to optimize the reconstruction loss function, so that the reconstructor could dynamically measure the relative importance of source-domain and target-domain information, improving the robustness and convergence speed of transfer learning. Finally, a multi-head self-attention mechanism was incorporated into the classifier to enhance the semantic understanding of the shared features and improve classification performance. In the sentiment analysis experiments, compared with other transfer learning models, the proposed model transferred the learned knowledge to tasks with less data but high similarity with the smallest degradation in classification performance, showing good transfer performance. In the intent recognition experiments, the proposed model improved precision, recall and F1 score by 4.5%, 4.3% and 4.4% respectively over the model with the second-best classification performance, the Capsule Network improved Domain Adversarial Neural Network (DANN+CapsNet) model, which indicates that the proposed model has certain advantages in dealing with small-data problems and personalization problems. Compared with DSN, AMCN-DSN improved the F1 scores on the target domain in the two types of experiments by 6.0% and 12.4% respectively, further validating the effectiveness of the improved model.
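
The multi-head self-attention that the abstract places in both the feature extractor and the classifier can be illustrated with a minimal, dependency-free sketch. This is standard scaled dot-product attention, not the authors' MHAC implementation: the helper names, the toy dimensions and the random projection matrices are assumptions made for illustration, and the output projection that usually follows the concatenation of heads is omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention for a single head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
              for qi in q]
    weights = [softmax(r) for r in scores]  # one distribution per query token
    return matmul(weights, v), weights


def multi_head_self_attention(x, heads):
    """Run every head on the same input and concatenate the outputs per token."""
    outputs = [attention_head(x, *h)[0] for h in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned parameters or embeddings."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_head, n_heads, seq_len = 8, 4, 2, 3  # toy sizes, not from the paper
tokens = random_matrix(seq_len, d_model, rng)
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
         for _ in range(n_heads)]
context = multi_head_self_attention(tokens, heads)
_, first_head_weights = attention_head(tokens, *heads[0])
```

Each row of `first_head_weights` is a probability distribution over the input tokens, which is the sense in which attention "filters" feature information: tokens with larger weights contribute more to the corresponding output vector.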

Key words: transfer learning, Domain Separation Network (DSN), capsule network, attention mechanism, Natural Language Processing (NLP)

CLC Number:

TP391.1

Zexi JIN, Lei LI, Ji LIU. Transfer learning model based on improved domain separation network[J]. Journal of Computer Applications, 2023, 43(8): 2382-2389.

Figures/Tables 10

Fig. 1 Architecture of DSN

Fig. 2 Model framework and training process of AMCN-DSN

Fig. 3 Structure of MHAC

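Tab. 1 Parameter settings of AMCN-DSN model

Parameter type	Tunable parameter	Value
Capsule network parameters	Initial capsule dimension	16
	Primary capsule dimension	32
	Number of dynamic routing iterations (Routing)	3
	Capsule network optimizer	Adam
Model training parameters	Learning rate (lr)	0.001
	Number of epochs (epoch)	20
	Batch size (batch_size)	32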
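
To make the capsule-network settings in Tab. 1 concrete, the following sketch stores them in a small configuration object and runs routing by agreement for the listed number of iterations. It follows the generic dynamic routing procedure for capsule networks rather than the authors' code: the class name, field names, the toy capsule counts and the use of the primary capsule dimension as the vote dimension are all assumptions for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AMCNDSNConfig:
    """Hyperparameters copied from Tab. 1; the field names are illustrative."""
    initial_capsule_dim: int = 16
    primary_capsule_dim: int = 32
    routing_iterations: int = 3
    capsule_optimizer: str = "Adam"
    learning_rate: float = 0.001
    epochs: int = 20
    batch_size: int = 32


def squash(s):
    """Scale vector s to a length in [0, 1) without changing its direction."""
    sq_norm = sum(v * v for v in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * v for v in s]


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations):
    """Routing by agreement; u_hat[i][j] is input capsule i's vote for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(iterations):
        # Each input capsule distributes its vote across the output capsules.
        coupling = [softmax(row) for row in logits]
        outputs = [
            squash([sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                    for d in range(dim)])
            for j in range(n_out)
        ]
        # Votes that agree with an output capsule raise the matching logit.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return outputs, coupling


cfg = AMCNDSNConfig()
rng = random.Random(0)
# Toy votes: 6 input capsules and 2 output capsules (both counts are arbitrary).
votes = [[[rng.gauss(0.0, 1.0) for _ in range(cfg.primary_capsule_dim)]
          for _ in range(2)] for _ in range(6)]
capsules, coupling = dynamic_routing(votes, cfg.routing_iterations)
lengths = [math.sqrt(sum(v * v for v in c)) for c in capsules]
```

The `squash` step keeps every output capsule's length below 1, so the length can be read as a confidence score, while the coupling coefficients of each input capsule always sum to 1.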

Tab. 2 Performance comparison of transfer learning models in sentiment analysis

Model	Source domain F1	Target domain P	Target domain R	Target domain F1
DCGAN [23]	0.8553	0.7739	0.7841	0.7790
DANN [12]	0.8954	0.8104	0.8126	0.8115
DANN+CapsNet [4]	0.9155	0.8242	0.8315	0.8278
ADDA [14]	0.9021	0.8192	0.8165	0.8178
Res-CapsNet [24]	0.9174	0.8322	0.8307	0.8314
DAAN [15]	0.9102	0.8295	0.8322	0.8308
ASDA [17]	0.8896	0.8079	0.7970	0.8024
AMCN-DSN	0.9239	0.8577	0.8429	0.8502
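
As a reading aid for the P, R and F1 columns, the sketch below recomputes F1 as the harmonic mean of precision and recall from the target-domain values in Tab. 2 and compares it with the reported F1. The page does not state how the scores were averaged, so treating each reported F1 as the harmonic mean of the reported P and R is an assumption; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) on the target domain, copied from Tab. 2.
target_rows = {
    "DCGAN": (0.7739, 0.7841, 0.7790),
    "DANN": (0.8104, 0.8126, 0.8115),
    "DANN+CapsNet": (0.8242, 0.8315, 0.8278),
    "ADDA": (0.8192, 0.8165, 0.8178),
    "Res-CapsNet": (0.8322, 0.8307, 0.8314),
    "DAAN": (0.8295, 0.8322, 0.8308),
    "ASDA": (0.8079, 0.7970, 0.8024),
    "AMCN-DSN": (0.8577, 0.8429, 0.8502),
}

# Absolute gap between the recomputed and the reported F1 for every model.
deviations = {
    name: abs(f1_score(p, r) - reported)
    for name, (p, r, reported) in target_rows.items()
}
for name, dev in deviations.items():
    print(f"{name}: |recomputed F1 - reported F1| = {dev:.5f}")
```

Under that assumption, every gap stays below 0.0001, which is the size of discrepancy expected from rounding the table values to four decimals.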

Tab. 3 Performance comparison of transfer learning models in intent recognition

Model	Source domain F1	Target domain P	Target domain R	Target domain F1
DCGAN [23]	0.7463	0.6587	0.6514	0.6550
DANN [12]	0.7910	0.7027	0.7069	0.7048
DANN+CapsNet [4]	0.8231	0.7426	0.7411	0.7418
ADDA [14]	0.8025	0.7122	0.7016	0.7069
Res-CapsNet [24]	0.8158	0.7409	0.7403	0.7406
DAAN [15]	0.8214	0.7359	0.7332	0.7345
ASDA [17]	0.7719	0.6733	0.6914	0.6822
AMCN-DSN	0.8439	0.7758	0.7733	0.7745

Tab. 4 Ablation experimental results on sentiment analysis and intent recognition tasks

Model	Sentiment analysis task				Intent recognition task
	Source domain	Target domain			Source domain	Target domain
	F1	P	R	F1	F1	P	R	F1
DSN [16]	0.8924	0.8092	0.7955	0.8022	0.7742	0.6861	0.6918	0.6889
AMCN-DSN (-MSHA, $-\hat{\omega}$)	0.9102	0.8166	0.8195	0.8227	0.8147	0.7323	0.7398	0.7360
AMCN-DSN ($-\hat{\omega}$)	0.9037	0.8350	0.8416	0.8383	0.8268	0.7475	0.7466	0.7470
AMCN-DSN (-MSHA)	0.9145	0.8442	0.8369	0.8405	0.8335	0.7563	0.7535	0.7549
AMCN-DSN	0.9239	0.8577	0.8429	0.8502	0.8439	0.7758	0.7733	0.7745
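
The percentage gains quoted in the abstract can be related to the tables with a short calculation. The sketch below assumes the percentages are relative improvements, that is, (new - baseline) / baseline, computed on the target-domain scores; this reading is an assumption, and the function and variable names are illustrative.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Target-domain scores copied from Tab. 3 (intent recognition).
amcn_dsn = {"P": 0.7758, "R": 0.7733, "F1": 0.7745}
dann_capsnet = {"P": 0.7426, "R": 0.7411, "F1": 0.7418}
vs_dann_capsnet = {
    metric: round(relative_gain(amcn_dsn[metric], dann_capsnet[metric]), 1)
    for metric in amcn_dsn
}

# Target-domain F1 of AMCN-DSN and DSN copied from Tab. 4: (AMCN-DSN, DSN).
f1_pairs = {
    "sentiment analysis": (0.8502, 0.8022),
    "intent recognition": (0.7745, 0.6889),
}
vs_dsn = {
    task: round(relative_gain(new, base), 1)
    for task, (new, base) in f1_pairs.items()
}

print(vs_dann_capsnet)  # {'P': 4.5, 'R': 4.3, 'F1': 4.4}
print(vs_dsn)           # {'sentiment analysis': 6.0, 'intent recognition': 12.4}
```

Read this way, the table values reproduce the 4.5%, 4.3% and 4.4% gains over DANN+CapsNet and the 6.0% and 12.4% gains over DSN; the corresponding absolute differences are smaller (for example, about 3.3 points of F1 in intent recognition against DANN+CapsNet).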

Fig. 4 Effect of improving feature extractor

Fig. 5 Loss-epoch curves based on different reconstruction loss functions

Fig. 6 Heatmaps of sentiment analysis for different models

References 24

1	SCHMIDHUBER J. Deep learning in neural networks: an overview[J]. Neural Networks, 2015, 61: 85-117. DOI: 10.1016/j.neunet.2014.09.003
2	ZHUANG F Z, QI Z Y, DUAN K Y, et al. A comprehensive survey on transfer learning[J]. Proceedings of the IEEE, 2021, 109(1): 43-76. DOI: 10.1109/jproc.2020.3004555
3	WU D Y, GUI L, CHEN Z, et al. Sentiment analysis based on deep representation learning and Gaussian processes transfer learning[J]. Journal of Chinese Information Processing, 2017, 31(1): 169-176. (in Chinese)
4	ZHAO P F, LI Y L, LIN M. Intent detection of domain adaptation combined with capsule network[J]. Computer Engineering and Applications, 2021, 57(21): 188-194. (in Chinese)
5	FAN T, WANG H, CHEN Y T. Research on multimodal named entity recognition of local history based on deep transfer learning[J]. Journal of the China Society for Scientific and Technical Information, 2022, 41(4): 412-423. (in Chinese) DOI: 10.3772/j.issn.1000-0135.2022.04.008
6	ZHAO P F, LI Y L, LIN M. Research progress on intent detection oriented to transfer learning[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(8): 1261-1274. (in Chinese)
7	PAN S J, YANG Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359. DOI: 10.1109/tkde.2009.191
8	DAI W Y, YANG Q, XUE G R, et al. Boosting for transfer learning[C]// Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007: 193-200. DOI: 10.1145/1273496.1273521
9	BO C, LAM W, TSANG I, et al. Extracting discriminative concepts for domain adaptation in text mining[C]// Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2009: 179-188. DOI: 10.1145/1557019.1557045
10	CHANG H, HAN J, ZHONG C, et al. Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(5): 1182-1194. DOI: 10.1109/tpami.2017.2656884
11	GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. Cambridge: MIT Press, 2014: 2672-2680.
12	GANIN Y, USTINOVA E, AJAKAN H, et al. Domain-adversarial training of neural networks[J]. Journal of Machine Learning Research, 2016, 17: 1-35. DOI: 10.1007/978-3-319-58347-1_10
13	TZENG E, HOFFMAN J, DARRELL T, et al. Simultaneous deep transfer across domains and tasks[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4068-4076. DOI: 10.1109/iccv.2015.463
14	TZENG E, HOFFMAN J, SAENKO K, et al. Adversarial discriminative domain adaptation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2962-2971. DOI: 10.1109/cvpr.2017.316
15	YU C H, WANG J D, CHEN Y Q, et al. Transfer learning with dynamic adversarial adaptation network[C]// Proceedings of the 2019 IEEE International Conference on Data Mining. Piscataway: IEEE, 2019: 778-786. DOI: 10.1109/icdm.2019.00088
16	BOUSMALIS K, TRIGEORGIS G, SILBERMAN N, et al. Domain separation networks[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016: 343-351.
17	TSAI J C, CHIEN J T. Adversarial domain separation and adaptation[C]// Proceedings of the IEEE 27th International Workshop on Machine Learning for Signal Processing. Piscataway: IEEE, 2017: 1-6. DOI: 10.1109/mlsp.2017.8168121
18	BEN-DAVID S, BLITZER J, CRAMMER K, et al. Analysis of representations for domain adaptation[C]// Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2006: 137-144. DOI: 10.7551/mitpress/7503.003.0022
19	BEN-DAVID S, BLITZER J, CRAMMER K, et al. A theory of learning from different domains[J]. Machine Learning, 2010, 79(1/2): 151-175. DOI: 10.1007/s10994-009-5152-4
20	WANG J Q, GONG Z H, XUE Y, et al. Aspect-based sentiment analysis based on hybrid multi-head attention and capsule networks[J]. Journal of Chinese Information Processing, 2020, 34(5): 100-110. (in Chinese) DOI: 10.3969/j.issn.1003-0077.2020.05.014
21	China Computer Federation. NLPCC 2014 Evaluation tasks test data download[DB/OL]. [2022-11-20]..
22	Corporation iFLYTEK. SMP2018-ECDT task 1 dataset[DS/OL]. [2022-11-20]..
23	FANG W, ZHANG F H, SHENG V S, et al. A method for improving CNN-based image recognition using DCGAN[J]. Computers, Materials and Continua, 2018, 57(1): 167-178. DOI: 10.32604/cmc.2018.02356
24	DAI H, SHENG L J, MIAO Q G. Adversarial discriminative domain adaptation algorithm with CapsNet[J]. Journal of Computer Research and Development, 2021, 58(9): 1997-2012. (in Chinese) DOI: 10.7544/issn1000-1239.2021.20200569
