Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (2): 360-366. DOI: 10.11772/j.issn.1001-9081.2017.02.0360

• 33rd National Database Conference of China •

Construction and inference of latent variable model oriented to user preference discovery

GAO Yan1, YUE Kun1, WU Hao1, FU Xiaodong2, LIU Weiyi1

  1. School of Information Science and Engineering, Yunnan University, Kunming Yunnan 650504, China;
  2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China
  • Received: 2016-08-12 Revised: 2016-09-06 Online: 2017-02-10 Published: 2017-02-11
  • Corresponding author: YUE Kun, kyue@ynu.edu.cn
  • About the authors: GAO Yan (1991-), female, born in Qujing, Yunnan, M.S. candidate, CCF member; research interests: knowledge discovery, social media data analysis. YUE Kun (1979-), male, born in Qujing, Yunnan, professor, doctoral supervisor, Ph.D., CCF member; research interests: massive data analysis and services. WU Hao (1979-), male, born in Pingdingshan, Henan, associate professor, Ph.D.; research interests: information retrieval, recommender systems, services computing. FU Xiaodong (1975-), male, born in Zhenxiong, Yunnan, professor, Ph.D., CCF member; research interests: services computing, intelligent decision-making. LIU Weiyi (1950-), male, born in Kunming, Yunnan, professor, doctoral supervisor, CCF member; research interests: artificial intelligence, data and knowledge engineering.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61472345, 61562090, 61462056), the Applied Basic Research Project of Yunnan Province (2014FA023, 2014FA028), the Program of Yunnan Provincial Foundation for Leaders of Disciplines in Science and Technology (2012HB004), the Program for Excellent Young Talents in Yunnan University (XT412003), and the Program for Innovative Research Team in Yunnan University (XT412011).

Abstract: E-commerce applications produce large amounts of user rating data that are rich in users' opinions and preferences. To infer user preference accurately from such data, a method for constructing and performing inference on a latent variable model (i.e., a Bayesian network with a latent variable) oriented to user preference discovery from rating data was proposed. First, to address the sparseness of rating data, the unobserved values were filled in by a Biased Matrix Factorization (BMF) model. Second, a latent variable was used to represent user preference, and a method for constructing the latent variable model based on Mutual Information (MI), maximal semi-clique and the Expectation Maximization (EM) algorithm was given. Finally, a Gibbs sampling based algorithm for probabilistic inference on the latent variable model and for user preference discovery was given. The experimental results demonstrate that, compared with collaborative filtering, the proposed method effectively describes the dependence relationships among related attributes in rating data and their uncertainties, and can therefore infer user preference more accurately.

Key words: user preference, rating data, Bayesian network, latent variable model, probabilistic inference, biased matrix factorization
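
Illustrative note: the first step named in the abstract, filling unobserved ratings with a BMF model, can be sketched as follows. This is a minimal sketch and not the authors' implementation. It assumes the commonly used biased matrix factorization form, in which a rating is predicted as global mean + user bias + item bias + the dot product of user and item factor vectors, trained by stochastic gradient descent. The function names (`train_bmf`, `predict`, `fill_matrix`, `rmse`), the hyperparameters, the 1 to 5 rating scale and the toy data are all hypothetical.

```python
import math
import random


def predict(model, u, i):
    """Biased MF prediction: mu + b_u + b_i + p_u . q_i."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def train_bmf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=500, seed=0):
    """Fit biases and k latent factors by SGD on the observed ratings.

    ratings is a list of (user_index, item_index, rating) triples.
    All hyperparameter defaults are assumptions chosen for the toy data.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    model = (mu, bu, bi, P, Q)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(model, u, i)
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pf, qf = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qf - reg * pf)
                Q[i][f] += lr * (err * pf - reg * qf)
    return model


def rmse(ratings, model):
    """Root mean squared error of the model on the given ratings."""
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def fill_matrix(ratings, n_users, n_items, model, lo=1.0, hi=5.0):
    """Return a dense matrix: observed ratings kept, gaps filled by BMF."""
    filled = [[min(hi, max(lo, predict(model, u, i)))
               for i in range(n_items)] for u in range(n_users)]
    for u, i, r in ratings:
        filled[u][i] = r  # never overwrite an observed rating
    return filled


# Hypothetical toy data: 3 users x 3 items, 6 of 9 ratings observed.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
model = train_bmf(observed, n_users=3, n_items=3)
filled = fill_matrix(observed, 3, 3, model)
for row in filled:
    print([round(v, 2) for v in row])
```

In this sketch the filled matrix would then serve as the complete data set from which a latent variable model is built. Clipping predictions to the rating scale is a design choice made here for illustration, not something stated in the abstract.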

CLC number: