Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (3): 673-679. DOI: 10.11772/j.issn.1001-9081.2017.03.673

• The 4th Big Data Academic Conference (CCF BIGDATA2016) •

Query performance and data migration for social network database with shard strategy based on clustering analysis

LIANG Shuang, ZHOU Lihua, YANG Peizhong

  1. College of Information, Yunnan University, Kunming Yunnan 650000, China
  • Received: 2016-09-26 Revised: 2016-10-23 Online: 2017-03-10 Published: 2017-03-22
  • Corresponding author: ZHOU Lihua
  • About the authors: LIANG Shuang, born in 1987 in Xinyang, Henan, male, M.S.; research interests: social network analysis, distributed databases. ZHOU Lihua, born in 1968 in Kunming, Yunnan, female, professor, Ph.D., CCF member; research interests: data mining, social network analysis. YANG Peizhong, born in 1992 in Baoshan, Yunnan, male, M.S. candidate, CCF member; research interests: data mining, social network analysis.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61262069, 61472346), the Natural Science Foundation of Yunnan Province (2016FA026, 2015FB114, 2015FB149), the Program for Young and Middle-aged Backbone Teachers in Yunnan University, and the Program for Innovation Research Team in Yunnan University (XT412011).

Abstract: Social network data exhibits a degree of aggregation: users with similar features are more likely to take part in the same behavior. Under conventional horizontal sharding, querying information about such events requires visiting multiple databases in turn, which costs considerable time and connection overhead. To address this problem, a database sharding strategy for social networks based on clustering analysis is proposed. The characteristic scalars of social network subjects are clustered so that highly aggregated subjects are placed in one shard, or in as few shards as possible, which improves the query efficiency for events; on this basis, load balancing and large-scale data migration are also taken into account. The experimental results show that the strategy improves performance to varying degrees on the mainstream event queries of social networks, with a maximum improvement of 23.4%, and that it achieves locally optimal load balance and zero data migration. Overall, compared with conventional horizontal sharding, the clustering-based sharding strategy has a considerable advantage in query efficiency, load balancing, and the feasibility of large-scale data migration.

Key words: social network, database shard, clustering analysis, query performance, data migration
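
The contrast drawn in the abstract between conventional horizontal sharding and clustering-based sharding can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, the features, or how load balancing and migration are handled, so the sketch assumes a single scalar feature per user, a small 1-D k-means as the clustering step, and modulo-by-id routing as the horizontal baseline. All names and data below are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set


def modulo_shard(user_id: int, n_shards: int) -> int:
    """Baseline (assumed): horizontal sharding by user id modulo shard count."""
    return user_id % n_shards


def kmeans_1d(values: List[float], k: int, iters: int = 20) -> List[float]:
    """Tiny 1-D k-means (Lloyd's algorithm); returns the centroids, sorted.

    Initial centroids are taken at evenly spaced positions of the sorted
    values, an assumption made here so the result is deterministic.
    """
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Keep the previous centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return sorted(centroids)


def cluster_shard(feature: float, centroids: List[float]) -> int:
    """Clustering-based routing: shard index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: abs(feature - centroids[c]))


def shards_touched(user_ids: Iterable[int], route: Callable[[int], int]) -> Set[int]:
    """Set of shards a query must visit to reach all the given users."""
    return {route(u) for u in user_ids}


# Hypothetical data: user id -> one scalar feature, forming three loose groups.
features: Dict[int, float] = {
    0: 1.0, 1: 1.2, 2: 0.9, 3: 1.1,
    4: 5.0, 5: 5.2, 6: 4.9, 7: 5.1,
    8: 9.0, 9: 9.1, 10: 8.8, 11: 9.2,
}
n_shards = 3
centroids = kmeans_1d(list(features.values()), n_shards)

# An "event" among similar users (ids 0-3 share a feature value near 1.0).
event_users = [0, 1, 2, 3]
modulo_hits = shards_touched(event_users, lambda u: modulo_shard(u, n_shards))
cluster_hits = shards_touched(event_users, lambda u: cluster_shard(features[u], centroids))

print(len(modulo_hits), len(cluster_hits))
```

In this toy data the event query visits all three shards under modulo routing but only one shard under clustering-based routing, which is the effect the abstract attributes to the proposed strategy. The sketch does not address load balancing or data migration; with real data, cluster sizes can be uneven, so some rebalancing step would be needed.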

CLC number: