Journal of Computer Applications, 2016, Vol. 36, Issue (2): 336-341. DOI: 10.11772/j.issn.1001-9081.2016.02.0336

• The 3rd CCF Big Data Academic Conference (CCF BigData 2015) •

Strength model of user relationship based on latent regression

HAN Zhongming, TAN Xusheng, CHEN Yan, YANG Weijie

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  • Received: 2015-08-29; Revised: 2015-09-13; Publication date: 2016-02-10; Release date: 2016-02-03
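  • Corresponding author: HAN Zhongming (1972-), male, born in Wenshui, Shanxi; associate professor, Ph.D., CCF member; research interests: big data processing, data mining.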
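  • About the authors: TAN Xusheng (1989-), male, born in Liuzhou, Guangxi; M.S. candidate; research interest: data mining. CHEN Yan (1991-), male, born in Taizhou, Jiangsu; M.S. candidate; research interest: data mining. YANG Weijie (1980-), female, born in Weifang, Shandong; lecturer, Ph.D., CCF member; research interest: data mining.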
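  • Funding:
    National Natural Science Foundation of China (61170112); Humanities and Social Sciences Research Foundation of the Ministry of Education of China (13YJC860006); Beijing Municipal Universities Science and Technology and Postgraduate Education Innovation Project (PXM2012_014213_000037).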

Abstract: To measure the strength of directed relationships between users in a social network, a smooth model of user interaction strength was proposed, based on the number of directed interactions between users. Taking user relationship strength as a latent variable and interaction strength as the dependent variable, a latent regression model for measuring relationship strength was constructed, and an Expectation-Maximization (EM) algorithm for estimating its parameters was given. Extensive experiments on best-friend selection and strength ranking were conducted on two datasets collected from Renren and Sina Weibo. On the Renren dataset, the TOP-10 best friends selected by the proposed model were compared with manual annotations: the mean Normalized Discounted Cumulative Gain (NDCG) was 69.48% and the Mean Average Precision (MAP) was 66.3%, both clearly higher than those of the compared algorithms. On the large-scale Sina Weibo dataset, using nodes with high relationship strength as the source nodes of an infection model increased the spread range by 80% on average compared with using other nodes as sources. The experimental results show that the proposed model can effectively measure the relationship strength between users.

Key words: relationship strength, social network, latent regression model
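The abstract reports mean NDCG and MAP for the TOP-10 friend lists checked against manual annotations. The sketch below shows how such figures are commonly computed. It is not the paper's evaluation code. It assumes binary relevance (a ranked friend counts as relevant if the annotators marked them as a best friend) and normalizes AP@k by min(k, number of relevant friends). The paper's exact gain and normalization choices are not stated here. The function names `ndcg_at_k`, `average_precision_at_k` and `mean_ndcg_and_map` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k (assumption: gain 1 for an annotated friend, else 0)."""
    # pos is 0-based, so rank r = pos + 1 and the discount is 1 / log2(r + 1).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, user in enumerate(ranked[:k])
        if user in relevant
    )
    # Ideal ranking: every relevant friend placed at the top, truncated at k.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """AP@k: mean of precision@i over the ranks i where a relevant friend appears."""
    hits = 0
    precision_sum = 0.0
    for pos, user in enumerate(ranked[:k]):
        if user in relevant:
            hits += 1
            precision_sum += hits / (pos + 1)
    # Assumed normalization: min(k, |relevant|), a common AP@k convention.
    denom = min(k, len(relevant))
    return precision_sum / denom if denom > 0 else 0.0


def mean_ndcg_and_map(
    rankings: Dict[str, List[str]],
    annotations: Dict[str, Set[str]],
    k: int = 10,
) -> Tuple[float, float]:
    """Average NDCG@k and AP@k over all users that have a model ranking."""
    users = list(rankings)
    if not users:
        return 0.0, 0.0
    ndcg = sum(ndcg_at_k(rankings[u], annotations.get(u, set()), k) for u in users)
    ap = sum(average_precision_at_k(rankings[u], annotations.get(u, set()), k) for u in users)
    return ndcg / len(users), ap / len(users)


if __name__ == "__main__":
    # Toy data, purely illustrative (not from the paper's datasets).
    model_top = {"u1": ["a", "x", "b"], "u2": ["c", "d"]}
    labels = {"u1": {"a", "b"}, "u2": {"d"}}
    print(mean_ndcg_and_map(model_top, labels, k=10))
```

In the toy example, user `u1` has relevant friends at ranks 1 and 3, which gives AP = (1 + 2/3) / 2. Averaging per-user scores like this yields the "mean NDCG" and "MAP" style of figure quoted in the abstract.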
