Journal of Computer Applications, 2022, Vol. 42, Issue (2): 426-432. DOI: 10.11772/j.issn.1001-9081.2021050907
Special Issue: Artificial Intelligence
Yuxi LIU, Yuqi LIU, Zonglin ZHANG, Zhihua WEI, Ran MIAO
Received: 2021-06-01
Revised: 2021-07-16
Accepted: 2021-07-19
Online: 2022-02-11
Published: 2022-02-10
About the author: LIU Yuxi, born in 2001 in Nanping, Fujian. Her research interests include machine learning and deep learning.
Yuxi LIU, Yuqi LIU, Zonglin ZHANG, Zhihua WEI, Ran MIAO. News recommendation model with deep feature fusion injecting attention mechanism[J]. Journal of Computer Applications, 2022, 42(2): 426-432.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021050907
Symbol | Meaning |
---|---|
… | Length of a news text (number of words) |
… | Vector formed by the ordered word vectors of one news item |
… | Vector formed by the ordered word vectors of one news item after the convolution operation |
… | Number of convolution kernels |
… | Number of news items browsed by one user |
… | Query matrix of the word-level additive attention mechanism |
… | The …-th … |
… | The merged … after encoded representation |
… | The … output by the GRU module |
… | Dimension of the user feature vector |
… | The … output by multi-head self-attention |
… | Query matrix of the news-level additive attention mechanism |
… | The …-th … |
… | Feature vector of one user |
Tab. 1 Definition of symbols
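The word-level and news-level additive-attention query matrices in Tab. 1 both implement the same pooling operation: project the inputs, score them against a learnable query, and take the softmax-weighted sum. A minimal NumPy sketch follows; the dimensions and names are illustrative, not the paper's.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def additive_attention(H, W, q):
    """Pool a sequence of representations into a single vector.

    H: (n, d) word (or news) representations
    W: (d, a) projection into the attention space
    q: (a,)   learnable query vector
    """
    scores = np.tanh(H @ W) @ q   # (n,) unnormalized attention scores
    alpha = softmax(scores)       # (n,) attention weights, summing to 1
    return alpha @ H, alpha       # (d,) weighted sum of the inputs

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))       # e.g. 5 words, 8-dimensional vectors
W = rng.normal(size=(8, 4))
q = rng.normal(size=4)
rep, alpha = additive_attention(H, W, q)
```

The same function pools word vectors into a news vector at the word level and news vectors into a user vector at the news level, with separate learned `W` and `q`.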
Category | Chinese dataset | English dataset |
---|---|---|
Number of users | 9 457 | 100 000 |
Number of news items | 100 197 | 120 962 |
Average news title length | 14.00 | 11.52 |
Average news body length | 584.00 | 585.05 |
Tab. 2 Statistics of datasets
Name | Meaning | Value |
---|---|---|
Embedding dim | Word embedding dimension | 256 |
Seq length | News sequence length | 300 |
Num classes | Number of news categories | 10 |
Num filters | Number of convolution kernels | 256 |
Kernel size | Convolution kernel size | 3×256 |
Vocab size | Vocabulary size | 5 000 |
Hidden dim | Number of neurons in the fully connected layer | 256 |
Dropout keep prob | Keep probability for dropout regularization | 0.5 |
Learning rate | Learning rate | 2×10^-4 |
Batch size | Training batch size | 64 |
Num epochs | Total number of training epochs | 10 |
Print per batch | Number of batches between printed results | 10 |
Save per batch | Number of batches between TensorBoard saves | 10 |
Attention size | Dimension of the news-level additive attention mechanism | 128 |
Query vector dim | Dimension of the attention query vector | 128 |
Candidate num | Number of candidate news items | 5 |
Click num | Number of news items already browsed by the user | 20 |
Num attention heads | Number of heads in multi-head attention | 16 |
Tab. 3 Hyperparameter values
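The values in Tab. 3 can be collected into a single configuration object; the key names below are hypothetical, while the values are copied from the table.

```python
# Hypothetical key names; values are taken from Tab. 3.
HPARAMS = {
    "embedding_dim": 256,       # word embedding dimension
    "seq_length": 300,          # news sequence length
    "num_classes": 10,          # number of news categories
    "num_filters": 256,         # number of convolution kernels
    "kernel_size": (3, 256),    # convolution kernel size
    "vocab_size": 5000,
    "hidden_dim": 256,          # fully connected layer width
    "dropout_keep_prob": 0.5,
    "learning_rate": 2e-4,
    "batch_size": 64,
    "num_epochs": 10,
    "print_per_batch": 10,      # batches between printed results
    "save_per_batch": 10,       # batches between TensorBoard saves
    "attention_size": 128,      # news-level additive attention dimension
    "query_vector_dim": 128,
    "candidate_num": 5,         # candidate news items per impression
    "click_num": 20,            # browsed news items per user
    "num_attention_heads": 16,
}

# Sanity check: the embedding dimension must split evenly across heads,
# giving 256 / 16 = 16 dimensions per attention head.
assert HPARAMS["embedding_dim"] % HPARAMS["num_attention_heads"] == 0
```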
Model | nDCG | MRR | Convergence time |
---|---|---|---|
Proposed model | 0.820 6 | 0.781 9 | 0:01:49 |
NRMS | 0.794 7 | 0.763 8 | 0:06:36 |
TANR | 0.801 4 | 0.756 9 | 0:02:14 |
LSTUR | 0.822 4 | 0.788 4 | 0:01:58 |
NAML | 0.782 2 | 0.755 6 | 0:02:52 |
Tab. 4 Experimental results of different models on Chinese dataset
Model | nDCG | MRR | Convergence time |
---|---|---|---|
Proposed model | 0.946 8 | 0.977 7 | 0:10:34 |
NRMS | 0.945 1 | 0.966 3 | 0:21:14 |
TANR | 0.940 9 | 0.973 6 | 0:09:41 |
LSTUR | 0.946 1 | 0.977 4 | 0:09:09 |
NAML | 0.938 1 | 0.971 0 | 0:11:39 |
DKN | 0.930 5 | 0.965 2 | 0:15:48 |
Tab. 5 Experimental results of different models on English dataset
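nDCG and MRR, the metrics reported in Tabs. 4 and 5, are standard ranking measures. A sketch under their usual definitions (binary click labels, base-2 logarithmic discount); the exact evaluation script used in the paper is not shown in this excerpt.

```python
import math

def ndcg_at_k(relevance, k):
    """Normalized discounted cumulative gain over the top-k ranked items."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance[:k]))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(labels):
    """Reciprocal rank of the first clicked item (label 1) in ranked order."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0

# Five candidate news items (as in Tab. 3), clicked item ranked second:
print(mrr([0, 1, 0, 0, 0]))         # 0.5
print(ndcg_at_k([0, 1, 0, 0, 0], 5))  # 1/log2(3) ≈ 0.631
```

Both metrics are averaged over all impressions, so one poorly ranked impression lowers the score only proportionally.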
Attention configuration | nDCG | MRR |
---|---|---|
No attention | 0.743 8 | 0.662 8 |
Word-level attention only | 0.815 6 | 0.662 8 |
News-level attention only | 0.745 7 | 0.772 6 |
Both levels of attention | 0.820 6 | 0.781 9 |
Tab. 6 Performance comparison of different attention mechanisms
Dataset | Metric | With time-series prediction module | Without time-series prediction module | Improvement with the module /% |
---|---|---|---|---|
Chinese dataset | nDCG | 0.820 6 | 0.786 5 | 4.33 |
 | MRR | 0.781 9 | 0.756 9 | 3.30 |
English dataset | nDCG | 0.946 8 | 0.934 1 | 1.36 |
 | MRR | 0.977 7 | 0.962 6 | 1.57 |
Tab. 7 Results of the proposed model with and without the time-series prediction module
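The time-series prediction module isolated in Tab. 7 is GRU-based (Tab. 1 lists a GRU-module output, and references [12-13] cover GRUs for sequence modeling). A minimal NumPy sketch of one GRU step folding a user's browsed-news vectors into a hidden state, following the formulation of Chung et al. [13]; the weight shapes and the choice of the last hidden state as the user representation are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, p):
    """One GRU update: update gate z, reset gate r, candidate state."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])              # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])              # reset gate
    h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])   # candidate state
    return (1 - z) * h + z * h_cand                     # interpolate old/new

def encode_history(news_vecs, p, hidden_dim):
    """Fold a user's browsed-news vectors, in order, into one hidden state."""
    h = np.zeros(hidden_dim)
    for x in news_vecs:
        h = gru_step(h, x, p)
    return h

rng = np.random.default_rng(0)
d = 8  # illustrative news-vector and hidden dimension
p = {k: rng.normal(scale=0.1, size=(d, d))
     for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
history = rng.normal(size=(20, d))  # 20 browsed news items, as in Tab. 3
user_vec = encode_history(history, p, d)
```

Because the state is always a convex combination of the previous state and a tanh-bounded candidate, every component of the hidden state stays in (-1, 1).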
[1] OKURA S, TAGAMI Y, ONO S, et al. Embedding-based news recommendation for millions of users [C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2017: 1933-1942. 10.1145/3097983.3098108
[2] WANG H W, ZHANG F Z, XIE X, et al. DKN: deep knowledge-aware network for news recommendation [C]// Proceedings of the 2018 World Wide Web Conference. Republic and Canton of Geneva: International World Wide Web Conferences Steering Committee, 2018: 1835-1844. 10.1145/3178876.3186175
[3] WU C H, WU F Z, GE S Y, et al. Neural news recommendation with multi-head self-attention [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6389-6394. 10.18653/v1/d19-1671
[4] YIN H L. Research and implementation of recommendation system based on deep learning [D]. Chengdu: University of Electronic Science and Technology of China, 2019: 1-3.
[5] YANG W, TANG R, LU L. News recommendation method by fusion of content-based recommendation and collaborative filtering [J]. Journal of Computer Applications, 2016, 36(2): 414-418. 10.11772/j.issn.1001-9081.2016.02.0414
[6] TIAN X, DING Q, LIAO Z H, et al. Survey on deep learning based news recommendation algorithm [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(6): 971-998.
[7] WU C H, WU F Z, AN M X, et al. Neural news recommendation with topic-aware news representation [C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1154-1159. 10.18653/v1/p19-1110
[8] AN M X, WU F Z, WU C H, et al. Neural news recommendation with long- and short-term user representations [C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 336-345. 10.18653/v1/p19-1033
[9] WU C H, WU F Z, AN M X, et al. Neural news recommendation with attentive multi-view learning [C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. [S.l.]: IJCAI Organization, 2019: 3863-3869. 10.24963/ijcai.2019/536
[10] GU J X, WANG Z H, KUEN J, et al. Recent advances in convolutional neural networks [J]. Pattern Recognition, 2018, 77: 354-377. 10.1016/j.patcog.2017.10.013
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[12] YAMAK P T, LI Y J, GADOSEY P K. A comparison between ARIMA, LSTM, and GRU for time series forecasting [C]// Proceedings of the 2nd International Conference on Algorithms, Computing and Artificial Intelligence. New York: ACM, 2019: 49-55. 10.1145/3377713.3377722
[13] CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [EB/OL]. (2014-12-11) [2021-03-10].
[14] WU F Z, QIAO Y, CHEN J H, et al. MIND: a large-scale dataset for news recommendation [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 3597-3606. 10.18653/v1/2020.acl-main.331
[15] KINGMA D P, BA J L. Adam: a method for stochastic optimization [EB/OL]. (2017-01-30) [2021-03-10].