Journal of Computer Applications, 2023, Vol. 43, Issue (10): 3136-3141. DOI: 10.11772/j.issn.1001-9081.2022101489

Special Issue: Data science and technology

• Data science and technology •

Collaborative filtering algorithm based on collaborative training and Boosting

Xiaohan YANG, Guosheng HAO, Xiehua ZHANG, Zihao YANG

  1. College of Computer Science and Technology, Jiangsu Normal University, Xuzhou Jiangsu 221116, China
  • Received: 2022-10-11 Revised: 2023-01-13 Accepted: 2023-01-16 Online: 2023-04-12 Published: 2023-10-10
  • Contact: Xiehua ZHANG
  • About author: YANG Xiaohan, born in 1995, M. S. candidate. Her research interests include machine learning and recommender systems.
    HAO Guosheng, born in 1972, Ph. D., professor. His research interests include machine learning, evolutionary computation, and personalized learning.
    ZHANG Xiehua, born in 1977, Ph. D., associate professor. Her research interests include machine learning and moving target detection and tracking.
    YANG Zihao, born in 1998, M. S. candidate. His research interests include machine learning and computer vision.
  • Supported by:
    National Natural Science Foundation of China (62277030); Postgraduate Scientific Research and Practical Innovation Program of Jiangsu Normal University (2022XKT1536)

基于协同训练与Boosting的协同过滤算法

杨晓菡, 郝国生, 张谢华, 杨子豪

  1. 江苏师范大学 计算机科学与技术学院,江苏 徐州 221116
  • 通讯作者: 张谢华
  • 作者简介:杨晓菡(1995—),女,江苏徐州人,硕士研究生,主要研究方向:机器学习、推荐系统
    郝国生(1972—),男,河北万全人,教授,博士,主要研究方向:机器学习、进化计算、个性化学习
    张谢华(1977—),女,安徽宿松人,副教授,博士,主要研究方向:机器学习、运动目标检测与跟踪. 6019980030@jsnu.edu.cn
    杨子豪(1998—),男,陕西咸阳人,硕士研究生,主要研究方向:机器学习、计算机视觉。
  • 基金资助:
    国家自然科学基金资助项目(62277030);江苏师范大学研究生科研与实践创新计划项目(2022XKT1536)

Abstract:

Collaborative Filtering (CF) algorithms realize personalized recommendation on the basis of the similarity between items or between users. However, data sparsity has always been one of the challenges faced by CF algorithms. To address the sparsity of user-item ratings and improve prediction accuracy, a CF algorithm based on Collaborative Training and Boosting (CFCTB) was proposed. First, two CF models were integrated into one framework through collaborative training: each model added its high-confidence pseudo-labeled samples to the other's training set, and Boosting-weighted training data were used to assist the collaborative training. Then, weighted integration was used to predict the final user ratings, which effectively avoided the accumulation of noise generated by pseudo-labeled samples and further improved the recommendation performance. Experimental results show that the proposed algorithm is more accurate than the single models on four open datasets. On the CiaoDVD dataset, which has the highest sparsity, the proposed algorithm reduces the Mean Absolute Error (MAE) by 4.737% compared with Global and Local Kernels for recommender systems (GLocal-K), and reduces the Root Mean Squared Error (RMSE) by 7.421% compared with the Ensemble of Co-trained Recommenders (ECoRec) algorithm. These results verify the effectiveness of the proposed algorithm.
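
The workflow summarized in the abstract (two CF models exchanging high-confidence pseudo-labeled samples, Boosting-style sample weights, and a weighted ensemble evaluated by MAE and RMSE) can be illustrated with a minimal Python sketch. This is not the authors' CFCTB implementation: the toy `MeanCF` models, the evidence-based `confidence` score, the exponential reweighting in `boost_weights`, the `pseudo_weight` value, and the ensemble weights `alphas` are all hypothetical stand-ins chosen only to make the idea concrete.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

class MeanCF:
    """Toy stand-in for a CF model (assumption): weighted mean rating
    per user (axis=0) or per item (axis=1)."""
    def __init__(self, axis):
        self.axis = axis

    def fit(self, samples, weights):
        # samples: list of (user, item, rating); weights: one weight per sample
        self.num, self.den = {}, {}
        total_num = total_den = 0.0
        for (u, i, r), w in zip(samples, weights):
            key = (u, i)[self.axis]
            self.num[key] = self.num.get(key, 0.0) + w * r
            self.den[key] = self.den.get(key, 0.0) + w
            total_num += w * r
            total_den += w
        self.global_mean = total_num / total_den
        return self

    def predict(self, u, i):
        key = (u, i)[self.axis]
        if key in self.den:
            return self.num[key] / self.den[key]
        return self.global_mean  # fall back for unseen users/items

    def confidence(self, u, i):
        # Hypothetical confidence: total weight of evidence behind the prediction
        return self.den.get((u, i)[self.axis], 0.0)

def boost_weights(model, samples, weights):
    """AdaBoost-flavored reweighting (assumption): samples with larger
    absolute error get larger weight; mean weight is rescaled to 1."""
    errs = np.array([abs(r - model.predict(u, i)) for u, i, r in samples])
    scale = errs.max() if errs.max() > 0 else 1.0
    new_w = np.asarray(weights, dtype=float) * np.exp(errs / scale)
    return list(new_w * len(new_w) / new_w.sum())

def co_train(labeled, unlabeled, rounds=2, k=1, pseudo_weight=0.5):
    """Each model labels its k most confident unlabeled pairs and hands
    them to the OTHER model's training set with a reduced weight."""
    models = [MeanCF(0), MeanCF(1)]
    train = [list(labeled), list(labeled)]
    weights = [[1.0] * len(labeled), [1.0] * len(labeled)]
    pool = list(unlabeled)
    for _ in range(rounds):
        for m in range(2):
            models[m].fit(train[m], weights[m])
            weights[m] = boost_weights(models[m], train[m], weights[m])
        for m in range(2):
            if not pool:
                break
            ranked = sorted(pool, key=lambda p: models[m].confidence(*p), reverse=True)
            for (u, i) in ranked[:k]:
                train[1 - m].append((u, i, models[m].predict(u, i)))
                weights[1 - m].append(pseudo_weight)
                pool.remove((u, i))
    for m in range(2):
        models[m].fit(train[m], weights[m])  # final fit on the enlarged sets
    return models

def ensemble_predict(models, alphas, u, i):
    """Weighted integration of the two models' predictions."""
    alphas = np.asarray(alphas, dtype=float)
    preds = np.array([m.predict(u, i) for m in models])
    return float(np.dot(alphas, preds) / alphas.sum())

# Tiny made-up rating data, for illustration only
labeled = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 3.0), (1, 1, 2.0), (2, 0, 4.0)]
unlabeled = [(2, 1), (0, 2)]
models = co_train(labeled, unlabeled)
prediction = ensemble_predict(models, alphas=[0.5, 0.5], u=2, i=1)
```

In this sketch the pseudo-labeled samples enter the other model's training set with a weight below 1, which is one simple way to limit how much pseudo-label noise can accumulate; the paper's actual confidence measure, weighting scheme, and base CF models may differ.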

Key words: recommendation algorithm, Collaborative Filtering (CF), data sparsity, collaborative training, Boosting

摘要:

协同过滤(CF)算法基于物品之间或用户之间的相似度能实现个性化推荐,然而CF算法普遍存在数据稀疏性的问题。针对用户-物品评分稀疏问题,为使预测更加准确,提出一种基于协同训练与Boosting的协同过滤算法(CFCTB)。首先,利用协同训练将两种CF集成于一个框架,两种CF互相添加置信度高的伪标记样本到对方的训练集中,并利用Boosting加权训练数据辅助协同训练;其次,采用加权集成预测最终的用户评分,有效避免伪标记样本所产生的噪声累加,进一步提高推荐性能。实验结果表明,在4个公开数据集上,所提算法的准确率优于单模型;在稀疏度最高的CiaoDVD数据集上,与面向推荐系统的全局和局部核(GLocal-K)相比,所提算法的平均绝对误差(MAE)降低了4.737%;与ECoRec(Ensemble of Co-trained Recommenders)算法相比,所提算法的均方根误差(RMSE)降低了7.421%。以上结果验证了所提算法的有效性。

关键词: 推荐算法, 协同过滤, 数据稀疏, 协同训练, Boosting

CLC Number: