Journal of Computer Applications (《计算机应用》): sole official website


Recommendation method integrating user behavior and improved long-tail algorithm

SHI Yancui (史艳翠), QIN Haozhe (秦浩哲)

  1. Tianjin University of Science and Technology (天津科技大学)
  • Received: 2024-12-06 Revised: 2025-03-13 Online: 2025-03-27 Published: 2025-03-27
  • Corresponding author: SHI Yancui
  • Supported by:
    National Natural Science Foundation of China

Abstract: Existing methods for alleviating the long-tail effect do not fully account for users' personalized behavior when dividing items into popular and long-tail items. To address this, a recommendation method integrating user behavior and an improved long-tail algorithm is proposed. First, Bidirectional Encoder Representations from Transformers (BERT) is used to encode item attribute information, and the items are clustered on the resulting encodings. Each user's popular and long-tail items are then re-divided according to that user's interaction records with the different clusters, so that personalized behavior is incorporated into the division of popular items. Second, each user's popularity sensitivity is evaluated from the interaction records, capturing the degree to which popularity influences that user. Finally, a new negative sampling method is proposed that applies different negative sampling strategies to users with different popularity sensitivities and incorporates user preference clusters to select higher-quality negative samples. Experimental results on three public real-world datasets show that the proposed personalized division improves on the traditional 80/20 division in Recall, Hit Rate (HR), and Normalized Discounted Cumulative Gain (NDCG); on NDCG@20, which reflects recommendation quality well, the improvements are 0.56, 1.10, and 2.41 percentage points on the three datasets, respectively. The proposed negative sampling method also improves on the best-performing negative sampling method selected for comparison in HR and NDCG, with NDCG@20 improvements of 3.36, 1.45, and 7.56 percentage points, respectively, which validates the effectiveness of the proposed method.
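The abstract does not state the exact rules used for the personalized division or for the popularity sensitivity score, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes items have already been assigned to clusters (for example, by clustering BERT encodings of their attributes). The 80% coverage rule in `personalized_split`, the ratio used in `popularity_sensitivity`, and all function and parameter names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def personalized_split(user_items, item_cluster, share=0.8):
    """Return (head_clusters, tail_clusters) for one user.

    Hypothetical rule (an assumption, not the paper's): rank clusters by how
    often the user interacted with them, and mark clusters as "head" until
    they cover `share` of the user's interactions. All other clusters are
    treated as that user's long-tail clusters.
    """
    counts = Counter(item_cluster[i] for i in user_items)
    total = sum(counts.values())
    head, covered = set(), 0
    # Sort by count (descending), then by cluster id, so ties are deterministic.
    for cluster, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= share * total:
            break
        head.add(cluster)
        covered += n
    tail = set(item_cluster.values()) - head
    return head, tail


def popularity_sensitivity(user_items, globally_popular):
    """Fraction of a user's interactions that fall on globally popular items.

    This ratio is an assumed stand-in for the paper's sensitivity measure.
    """
    if not user_items:
        return 0.0
    hits = sum(1 for i in user_items if i in globally_popular)
    return hits / len(user_items)


# Toy data: seven items in three clusters, one user with six interactions.
item_cluster = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 2, 7: 2}
user_items = [1, 2, 3, 4, 5, 6]
head, tail = personalized_split(user_items, item_cluster)
sensitivity = popularity_sensitivity(user_items, globally_popular={1, 2})
```

In this toy case the user's interactions concentrate in cluster 0, so the head set differs from what a single global 80/20 cut over all items would give. That per-user difference is the point the abstract makes about personalized division.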

Key words: recommender systems, popularity, long tail, clustering, negative sampling
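For readers unfamiliar with the reported metrics, the sketch below computes Recall@K, HR@K, and NDCG@K for a single user with binary relevance. These are the standard textbook definitions; the abstract does not describe the paper's evaluation protocol (for example, how candidates are sampled or how scores are averaged over users), so this is an illustration rather than a reproduction of it. The function names are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear in the top-k of `ranked`."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def hit_rate_at_k(ranked, relevant, k):
    """1.0 if at least one relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A hit at 0-based position `pos` contributes 1 / log2(pos + 2).
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: two relevant items, one of which is ranked second.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
recall = recall_at_k(ranked, relevant, k=3)
hr = hit_rate_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

A gain reported in percentage points, as in the abstract, is the difference between two such scores after averaging over users and multiplying by 100.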

CLC number: