Journal of Computer Applications, 2012, Vol. 32, Issue (04): 1082-1085. DOI: 10.3724/SP.J.1087.2012.01082

• Database technology

Optimization of sparse data sets to improve quality of collaborative filtering systems

LIU Qing-peng (1), CHEN Ming-rui (2)

  1. College of Information Science and Technology, Hainan University, Haikou Hainan 570228, China
    2. College of Information Science and Technology, Hainan University, Haikou Hainan 570228, China
  • Received: 2011-09-14 Revised: 2011-12-19 Online: 2012-04-20 Published: 2012-04-01
  • Contact: CHEN Ming-rui

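  • About the authors: LIU Qing-peng (born 1986), male, from Linyi, Shandong; master's student; main research interest: software engineering.
    CHEN Ming-rui (born 1960), male, from Chengmai, Hainan; professor; main research interest: software engineering.
  • Supported by:
    Hainan Huiren (海南慧人) Company project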

Abstract: Collaborative filtering is currently one of the more successful personalized recommendation technologies applied in personalized recommendation systems. As the numbers of users and items increase dramatically, the rating matrix that reflects users' preference information becomes very sparse, and this sparsity seriously degrades the recommendation quality of collaborative filtering. To address this problem, this paper presents a comprehensive mean optimal filling method. Compared with the default-value method and the mode method, the method has two advantages. First, it takes differences in users' rating scales into account. Second, it does not suffer from the "multiple mode" and "no mode" problems of the mode method. The method was tested on the same data set using a traditional user-based collaborative filtering algorithm, and the results show that it can improve the recommendation quality of recommendation systems.

Key words: recommendation system, collaborative filtering, mean value, mode, information overload
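
The abstract names three ways of filling missing ratings but does not give the formula of the proposed method. The sketch below is only an illustration under a stated assumption: the "comprehensive mean" is taken to be the average of the user's mean rating and the item's mean rating, which may differ from the paper's actual definition. The toy matrix and all function names are hypothetical.

```python
from statistics import mean, multimode

# Toy user-item rating matrix; None marks a missing rating (hypothetical data).
ratings = [
    [5, 4, None, 5],     # user 0 tends to rate high
    [2, None, 1, 2],     # user 1 tends to rate low
    [None, 3, 4, None],  # user 2
]


def fill_default(matrix, default=3):
    """Default-value method: every missing entry gets the same constant."""
    return [[default if r is None else r for r in row] for row in matrix]


def item_modes(matrix, j):
    """Most frequent ratings of item j; several values means no unique mode."""
    column = [row[j] for row in matrix if row[j] is not None]
    return multimode(column)


def fill_combined_mean(matrix):
    """ASSUMED variant: fill with the average of user mean and item mean."""
    n_items = len(matrix[0])
    user_means = [mean(r for r in row if r is not None) for row in matrix]
    item_means = [
        mean(row[j] for row in matrix if row[j] is not None)
        for j in range(n_items)
    ]
    return [
        [(user_means[i] + item_means[j]) / 2 if r is None else r
         for j, r in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


filled = fill_combined_mean(ratings)
```

In this toy data, item 0 has the ratings 5 and 2, so `item_modes(ratings, 0)` returns both values and the mode method has no single fill value to use. The mean-based fill stays well defined, and the low-rating user 1 receives a lower filled value than the high-rating user 0, which is one way a filling method can reflect rating scale.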
