Journal of Computer Applications, 2026, Vol. 46, Issue (1): 95-103. DOI: 10.11772/j.issn.1001-9081.2024121727

• Data science and technology •

Recommendation method integrating user behaviors and improved long-tail algorithm

Yancui SHI, Haozhe QIN

  1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
  • Received: 2024-12-06 Revised: 2025-03-13 Accepted: 2025-03-17 Online: 2026-01-10 Published: 2026-01-10
  • Contact: Haozhe QIN
  • About author: SHI Yancui, born in 1982, Ph.D., associate professor. Her research interests include recommender systems and social networks.
  • Supported by:
    National Natural Science Foundation of China (62377036)

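Recommendation method integrating user behaviors and improved long-tail algorithm

Yancui SHI, Haozhe QIN

  1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457
  • Corresponding author: Haozhe QIN
  • About the author: SHI Yancui (born 1982), female, native of Baoding, Hebei, associate professor, Ph.D., CCF member. Main research interests: recommender systems and social networks.
  • Funding:
    National Natural Science Foundation of China (62377036)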

Abstract:

To address the problem that long-tail research does not fully consider users' personalized behaviors when dividing items into popular items and long-tail items, a recommendation method integrating user behaviors and an improved long-tail algorithm was proposed. Firstly, Bidirectional Encoder Representations from Transformers (BERT) was used to encode item attribute information, and the items were clustered according to the encodings. Personalized popular items and long-tail items were re-divided for each user according to that user's interaction records with the different clusters, so that personalized user behaviors were incorporated into the division of popular items. Secondly, each user's popularity sensitivity was evaluated from the interaction records, so that the degree to which popularity influences the user was fully considered. Finally, a new negative sampling method was proposed, in which different negative sampling strategies were adopted for users with different popularity sensitivities, and user preference clustering was incorporated to select higher-quality negative samples. Experimental results on three public real-world datasets show that, compared with the traditional 80-20 division method, the proposed personalized division method improves metrics such as recall, Hit Rate (HR), and Normalized Discounted Cumulative Gain (NDCG). In the resampling experiment, the average NDCG@20 on the original, popular, and long-tail data across the three datasets increased by 0.45, 1.03, and 2.33 percentage points, respectively. Compared with the best baseline model, NNS (Noise-free Negative Sampling), the proposed negative sampling method improves metrics such as HR and NDCG: the average NDCG@20 on the original, popular, and long-tail data increased by 2.72, 1.37, and 5.93 percentage points, respectively, which verifies the effectiveness of the proposed negative sampling method.
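
For orientation, the traditional 80-20 division that serves as the comparison point can be sketched as follows. This is only an illustrative sketch under one common reading of the rule, namely that the 20% of items with the most interactions are treated as popular and the rest as long-tail; the function name `split_80_20` and the `(user_id, item_id)` input format are assumptions for illustration and are not taken from the paper, whose personalized division is not reproduced here.

```python
from collections import Counter


def split_80_20(interactions):
    """Split items into a popular (head) set and a long-tail set.

    interactions: iterable of (user_id, item_id) pairs.
    Assumption: the top 20% of items by interaction count are "popular";
    this is one common reading of the 80-20 rule, not the paper's code.
    """
    counts = Counter(item for _, item in interactions)
    # Sort by descending count; ties are broken by item id for determinism.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    # Keep at least one head item whenever any item exists.
    n_head = max(1, len(ranked) // 5)
    return set(ranked[:n_head]), set(ranked[n_head:])
```

Because the split depends only on global interaction counts, every user receives the same popular set, which is the limitation that a per-user division is meant to address.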

Key words: recommender system, popularity, long tail effect, clustering, negative sampling
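
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary relevance (an item is either in the user's held-out set or not) and a ranked list without duplicate items; the function names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def hit_rate_at_k(ranked_items, relevant_items, k):
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0."""
    return 1.0 if any(i in relevant_items for i in ranked_items[:k]) else 0.0


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for i in ranked_items[:k] if i in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (at most k) at the top.
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Per-user values are usually averaged over all test users. The reported gains are in percentage points, so an increase of 2.72 would correspond to an averaged NDCG@20 moving, for example, from 0.1000 to 0.1272.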

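Abstract:

To address the problem that research on the long-tail effect does not fully consider users' personalized behaviors when dividing popular items and long-tail items, a recommendation method integrating user behaviors and an improved long-tail algorithm is proposed. First, Bidirectional Encoder Representations from Transformers (BERT) is used to encode item attribute information, items are clustered according to the encoding results, and personalized popular items and long-tail items are re-divided for each user according to the user's interaction records with the different clusters, so that personalized user behaviors are integrated into the division of popular items. Second, the user's popularity sensitivity is evaluated from the interaction records, so that the degree to which popularity influences the user is fully considered. Finally, a new negative sampling method is proposed that adopts different negative sampling strategies for users with different popularity sensitivities and integrates user preference clustering to select higher-quality negative samples. Experimental results on 3 public real-world datasets show that the proposed personalized division method improves on the traditional 80-20 division method in metrics such as recall, Hit Rate (HR), and Normalized Discounted Cumulative Gain (NDCG); in resampling, the average NDCG@20 on the original data, popular data, and long-tail data of the 3 datasets increases by 0.45, 1.03, and 2.33 percentage points, respectively; compared with the best baseline model NNS (Noise-free Negative Sampling), the proposed negative sampling method improves metrics such as HR and NDCG, with the average NDCG@20 on the original data, popular data, and long-tail data increasing by 2.72, 1.37, and 5.93 percentage points, respectively, which verifies the effectiveness of the proposed negative sampling method.

Key words: recommender system, popularity, long-tail effect, clustering, negative sampling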

CLC Number: