Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (8): 2236-2242. DOI: 10.11772/j.issn.1001-9081.2018010264

• Data Science and Technology •

Data imputation using matrix factorization based on session-based temporal similarity

QIAO Yongwei1, ZHANG Yuxiang2, XIAO Chunjing2,3

  1. Engineering and Technical Training Center, Civil Aviation University of China, Tianjin 300300, China;
    2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China;
    3. School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300401, China
  • Received: 2018-01-29 Revised: 2018-03-22 Online: 2018-08-10 Published: 2018-08-11
  • Corresponding author: QIAO Yongwei
  • About the authors: QIAO Yongwei (born 1976), male, from Qixian, Shanxi; lecturer, M.S.; research interests: machine learning and intelligent information processing for civil aviation. ZHANG Yuxiang (born 1975), male, from Datong, Shanxi; associate professor, Ph.D.; research interests: machine learning, data mining, and artificial intelligence. XIAO Chunjing (born 1978), female, from Tangshan, Hebei; lecturer, Ph.D. candidate; research interests: recommender systems and data mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1533104), the Natural Science Foundation of Hebei Province (E2016202341), and the Fundamental Research Funds for the Central Universities (ZXH2012P009).

Abstract: Existing data imputation methods consider only rating information and traditional similarity measures, so they cannot capture the actual similarity relationships between users. To alleviate data sparsity and improve recommendation accuracy, a matrix factorization data imputation method based on session-based temporal similarity was proposed. Firstly, the defects of traditional similarity were analyzed, and a session-based temporal similarity measure was defined from temporal similarity and dissimilarity; it combines time context with rating information to better capture the actual relationships between users and thereby identify neighbors. Secondly, the key item set to be imputed, which carries the potential influence factors of users and items, was extracted from the neighbors of the target user and the items they consumed, and it was imputed by using matrix factorization. Then, the user probabilistic topic distribution in each time period was obtained by using Latent Dirichlet Allocation (LDA), and a dynamic user preference model was built with temporal penalty weights. Finally, items were recommended based on the correlation of probabilistic topic distributions between users and user-based collaborative filtering. Experimental results show that, compared with other data imputation methods, the proposed method reduces the Mean Absolute Error (MAE) and improves recommendation performance at different sparsity levels.

Key words: data sparsity, data imputation, temporal context, matrix factorization, temporal weight
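
The matrix factorization imputation step and the MAE metric named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's objective function, neighbor selection rule, or hyperparameters, so everything below is an assumption made for illustration and not the authors' implementation: the function names (`factorize`, `impute`, `mean_absolute_error`), the 1-5 rating scale, the use of 0 to mark a missing rating, SGD with L2 regularization, and the hand-picked `key_cells` that stand in for the key item set.

```python
import numpy as np


def factorize(ratings, mask, rank=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Fit ratings ~ mu + P @ Q.T on observed cells (mask == True) with SGD.

    Generic regularized matrix factorization; hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    mu = ratings[mask].mean()                      # global mean of observed ratings
    P = 0.1 * rng.standard_normal((n_users, rank))  # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, rank))  # item latent factors
    observed = np.argwhere(mask)                   # (user, item) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)                      # shuffles rows in place
        for u, i in observed:
            err = ratings[u, i] - (mu + P[u] @ Q[i])
            p_u = P[u].copy()                      # keep old value for Q update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return mu, P, Q


def impute(ratings, mask, key_cells, **kwargs):
    """Return a copy of ratings with the unobserved key_cells filled in."""
    mu, P, Q = factorize(ratings, mask, **kwargs)
    filled = ratings.astype(float).copy()
    for u, i in key_cells:
        if not mask[u, i]:                         # never overwrite real ratings
            filled[u, i] = np.clip(mu + P[u] @ Q[i], 1.0, 5.0)
    return filled


def mean_absolute_error(predicted, actual):
    """MAE = mean(|predicted - actual|)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


# Toy user-item matrix (hypothetical data); 0 marks a missing rating.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
mask = R > 0
# Hypothetical key cells; in the paper these come from the target user's
# neighbors and the items those neighbors consumed.
key_cells = [(1, 1), (3, 1), (0, 2)]
filled = impute(R, mask, key_cells)
```

Only the cells listed in `key_cells` are filled; the remaining missing entries stay at 0, which mirrors the idea of imputing a selected sub-matrix instead of the whole rating matrix.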

CLC Number: