Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (2): 364-371. DOI: 10.11772/j.issn.1001-9081.2016.02.0364

• The 3rd CCF Big Data Conference (CCF BigData 2015) •

Recognition of user age and gender features for mobile social networks

LI Yuanhao (李源昊)1, LU Ping (陆平)2, WU Yifan (吴一凡)1, WEI Wei (韦薇)2, SONG Guojie (宋国杰)1

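  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
    2. Zhongxing Telecommunication Equipment Corporation, Shenzhen, Guangdong 518057, China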
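  • Received: 2015-08-29 Revised: 2015-09-19 Online: 2016-02-10 Published: 2016-02-03
  • Corresponding author: SONG Guojie (1975-), male, born in Xinxiang, Henan, associate professor, Ph.D., CCF member. Main research interests: data mining, machine learning, social network analysis, intelligent transportation systems.
  • About the authors: LI Yuanhao (1998-), male, born in Beijing. Main research interests: social network analysis, data mining, machine learning; LU Ping (1971-), male, born in Nantong, Jiangsu, senior engineer, M.S., CCF member. Main research interests: machine learning, artificial intelligence, augmented reality, multimedia; WU Yifan (1991-), male, born in Beijing, M.S. Main research interests: social network analysis, data mining, machine learning; WEI Wei (1980-), female, born in Chongqing, engineer, M.S. Main research interests: location-based services, location big data.
  • Supported by:
    National High Technology Research and Development Program (863 Program) of China (2014AA015103); National Key Technology R&D Program of China (2014BAG01B02); Beijing Natural Science Foundation (4152023); ZTE Research Fund.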

Mobile social network oriented user feature recognition of age and sex

LI Yuanhao1, LU Ping2, WU Yifan1, WEI Wei2, SONG Guojie1   

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
    2. Zhongxing Telecommunication Equipment Corporation, Shenzhen, Guangdong 518057, China
  • Received: 2015-08-29 Revised: 2015-09-19 Online: 2016-02-10 Published: 2016-02-03

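Abstract: Mobile social network data is characterized by a complex network structure, mutual influence among node labels, and many kinds of complex information such as interaction and location information, all of which make identifying user features challenging. To address these challenges, a real mobile network dataset was analyzed: statistical analysis was used to extract the differences among labeled users with different features, and these differences were then used to build a prediction model based on relational Markov networks that identifies the age and gender of unlabeled users. The analysis shows that users of different ages and genders differ markedly in call probability and call entropy across time periods, in the distribution and dispersion of location information, in their degree of clustering in the social network, and in the frequency of their binary and ternary interactions. Based on these features, a method was proposed that uses relational clique templates for binary and ternary interactions, combined with each user's own temporal and spatial features, to compute the full joint probability distribution of user features with a relational Markov network and thereby infer user age and gender. Experiments show that, compared with traditional classification methods such as the C4.5 decision tree, random forest, Logistic regression, and Naive Bayes, the classification method based on relational Markov networks, user spatio-temporal information, and relational cliques of user interactions improves prediction accuracy by up to about 8%.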

Keywords: mobile social network, social network analysis, feature recognition, relational Markov network

Abstract: Mobile social network data has a complex network structure, mutual influence between node labels, and a variety of information, including interaction and location information, which brings many challenges to identifying user characteristics. In response to these challenges, a real mobile network dataset was studied: the differences between labeled users with different characteristics were extracted using statistical analysis, and the age and sex of unlabeled users were then recognized using a prediction model based on a relational Markov network. The analysis shows that users of different ages and sexes differ significantly in call probability at different times, call entropy, distribution and dispersion of location information, degree of clustering in the social network, and frequency of binary and ternary interactions. With these features, an approach for inferring user age and sex was put forward, which uses relational clique templates for binary and ternary interactions, combines them with the user's own temporal and spatial characteristics, and calculates the full joint probability distribution with a relational Markov network. The experimental results show that the proposed recognition model improves prediction accuracy by up to about 8% compared with traditional classification methods such as C4.5 decision tree, random forest, Logistic regression and Naive Bayes.
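
The abstract names call probability at different times and call entropy as discriminative features but does not define them. As a minimal sketch, assuming that call probability means the share of a user's calls falling in each hour of the day and that call entropy means the Shannon entropy of that distribution, the two features could be computed as follows. The function names `call_probabilities` and `call_entropy` and the 24-slot binning are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def call_probabilities(call_hours, n_slots=24):
    """Share of a user's calls in each time slot (assumed: hour of day)."""
    counts = Counter(h % n_slots for h in call_hours)
    total = sum(counts.values())
    if total == 0:
        # A user with no calls gets an all-zero distribution.
        return [0.0] * n_slots
    return [counts.get(slot, 0) / total for slot in range(n_slots)]


def call_entropy(probs):
    """Shannon entropy (in bits) of a call-time distribution.

    Assumed definition: H = -sum(p * log2(p)) over slots with p > 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical user who calls only at 09:00 and 21:00, equally often.
probs = call_probabilities([9, 9, 21, 21])
entropy = call_entropy(probs)
```

Under these assumptions, a user whose calls are concentrated in a few hours has low entropy, while a user who calls evenly around the clock approaches the maximum of log2(24) bits.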

Key words: mobile social network, social network analysis, feature recognition, relational Markov network

CLC Number: