Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3288-3292. DOI: 10.11772/j.issn.1001-9081.2019040728

• Data Science and Technology

User relevance measure method combining latent Dirichlet allocation and meta-path analysis

XU Hongyan, WANG Dan, WANG Fuhai, WANG Rongbing

  1. School of Information, Liaoning University, Shenyang Liaoning 110036, China
  • Received: 2019-04-28 Revised: 2019-07-19 Online: 2019-08-26 Published: 2019-11-10
  • Corresponding author: WANG Rongbing
  • About the authors: XU Hongyan (1972-), female, born in Dandong, Liaoning, associate professor, M.S., research interests: deep Web, personalized recommendation, data mining; WANG Dan (1994-), female, born in Dandong, Liaoning, M.S. candidate, CCF member, research interests: personalized recommendation, data mining; WANG Fuhai (1990-), male, born in Haicheng, Liaoning, M.S. candidate, CCF member, research interests: deep learning, personalized recommendation; WANG Rongbing (1979-), male, born in Shenyang, Liaoning, associate professor, Ph.D., CCF member, research interests: big data analytics, cloud computing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71771110), the Social Science Planning Foundation of Liaoning Province of China (L18AGL007), and the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education in Jilin University (93K172018K01).

Abstract: User relevance measurement is the foundation and core of heterogeneous information network research. Because existing user relevance measures do not sufficiently carry out multi-dimensional analysis and link analysis, their accuracy still leaves room for improvement. To address this, a user relevance measure method combining Latent Dirichlet Allocation (LDA) and meta-path analysis was proposed. Firstly, LDA was used for topic modeling, and the relevance of nodes was computed by analyzing the contents of the nodes in the network. Secondly, meta-paths were introduced to describe the relationship types between nodes, and the relevance of users in the heterogeneous information network was measured with a relevance measure method (DPRel). Thirdly, the node relevance was incorporated into the calculation of user relevance. Finally, experiments were carried out on the real-world IMDB movie dataset, comparing the proposed method with ULR-CF (Unifying LDA and Ratings Collaborative Filtering), a collaborative filtering recommendation method that embeds the LDA topic model, and with PathSim, a meta-path based similarity measure. The experimental results show that the proposed method can overcome the drawback of data sparsity and improve the accuracy of user relevance measurement.
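The abstract names the three steps but gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. DPRel is not defined in this abstract, so the meta-path step below substitutes PathSim (the baseline named above) on a symmetric User-Movie-User meta-path. The content-level relevance is assumed to be 1 minus the Jensen-Shannon divergence between LDA topic distributions, and the fusion is a hypothetical linear weighting with parameter `alpha`. All function names and the toy data are illustrative.

```python
import math
from typing import List, Sequence


def topic_similarity(theta_u: Sequence[float], theta_v: Sequence[float]) -> float:
    """Content relevance from two LDA topic distributions.

    Assumption: 1 - Jensen-Shannon divergence (base 2), so the score lies in [0, 1].
    """
    def kl(p: Sequence[float], q: Sequence[float]) -> float:
        # Terms with p_i = 0 contribute nothing; q_i > 0 whenever p_i > 0 because q is a mixture.
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    mix = [(a + b) / 2 for a, b in zip(theta_u, theta_v)]
    return 1.0 - 0.5 * kl(theta_u, mix) - 0.5 * kl(theta_v, mix)


def commuting_matrix(user_movie: List[List[int]]) -> List[List[int]]:
    """M = W * W^T for the symmetric meta-path User-Movie-User.

    M[x][y] counts the path instances x -> movie -> y.
    """
    n = len(user_movie)
    return [[sum(a * b for a, b in zip(user_movie[x], user_movie[y])) for y in range(n)]
            for x in range(n)]


def pathsim(m: List[List[int]], x: int, y: int) -> float:
    """PathSim: 2 * |paths x->y| / (|paths x->x| + |paths y->y|)."""
    denom = m[x][x] + m[y][y]
    return 2.0 * m[x][y] / denom if denom else 0.0


def fused_relevance(path_score: float, topic_score: float, alpha: float = 0.5) -> float:
    """Hypothetical linear fusion; alpha weights the meta-path (link) evidence."""
    return alpha * path_score + (1.0 - alpha) * topic_score


if __name__ == "__main__":
    # Toy data: 3 users x 4 movies (1 = the user rated the movie).
    W = [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 1, 1]]
    # Hypothetical per-user topic distributions (e.g. averaged LDA topics of rated movies).
    theta = [[0.7, 0.2, 0.1],
             [0.6, 0.3, 0.1],
             [0.1, 0.1, 0.8]]
    M = commuting_matrix(W)
    for x, y in [(0, 1), (0, 2), (1, 2)]:
        p = pathsim(M, x, y)
        t = topic_similarity(theta[x], theta[y])
        print(f"users {x},{y}: PathSim={p:.3f} topic={t:.3f} fused={fused_relevance(p, t):.3f}")
```

On the toy data, users 0 and 2 rated no movie in common, so their PathSim score is 0, while the topic term still yields a positive fused score. Under the assumptions above, this illustrates how content information can supplement sparse links; the actual weighting and the DPRel measure used in the paper may differ.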

Key words: user relevance, heterogeneous information network, topic model, meta-path, measure
