Journal of Computer Applications, 2017, Vol. 37, Issue (5): 1251-1256. DOI: 10.11772/j.issn.1001-9081.2017.05.1251

• The 22nd National Conference on Information Storage Technology

Performance optimization of distributed database aggregation computing

XIAO Zida1,2, ZHU Ligu1,2, FENG Dongyu1,2, ZHANG Di1,2

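  1. School of Computer Science, Communication University of China, Beijing 100024, China;
    2. Beijing Key Laboratory of Big Data in Security & Protection Industry, Beijing 100024, China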
  • Received: 2016-07-01 Revised: 2016-11-17 Online: 2017-05-10 Published: 2017-05-16
  • Corresponding author: XIAO Zida
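  • About the authors: XIAO Zida (1992-), male, born in Changsha, Hunan, M. S. candidate. His research interests include distributed systems and data visualization. ZHU Ligu (1965-), male, born in Beijing, Ph. D., professor, doctoral supervisor. His research interests include computer architecture and mass storage. FENG Dongyu (1989-), male, born in Jinzhou, Liaoning, Ph. D. candidate. His research interests include big data system architecture and distributed systems. ZHANG Di (1987-), male, born in Lanzhou, Gansu, Ph. D. His research interests include data visualization and cloud storage.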
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61730063).

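Abstract: To address the low aggregation computing performance of distributed databases in analytical applications, a database performance improvement method based on the shard key and indexes was proposed, using the MongoDB database as a case study. Firstly, the shard key field was chosen under the guidance of an analysis of business characteristics; the chosen field must ensure an even layout of data across the shard nodes. Secondly, based on a study of index efficiency in the distributed database, computing performance was further improved by deleting the indexes on query fields, a method that makes full use of hardware resources to improve aggregation computing performance. Experimental results show that a shard key with high cardinality distributes data evenly across the data nodes of the cluster, that abandoning indexes in favor of full-table scans effectively increases the speed of aggregation computing, and that the proposed optimization method effectively improves aggregation computing performance.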
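The effect of shard key cardinality on data layout can be illustrated with a small simulation. This is not the authors' code but a minimal sketch under stated assumptions: a hypothetical 4-shard cluster (`NUM_SHARDS`), a simplified hashed-sharding rule (`shard_for`: MD5 of the key value modulo the shard count, whereas MongoDB actually maps hashed values onto chunk ranges), and synthetic documents from the hypothetical helper `make_docs` with a low-cardinality field `region` and a high-cardinality field `user_id`. It compares how evenly the two keys spread the data and then runs a scatter-gather aggregation, where each shard scans all of its documents to build partial sums and a router merges them.

```python
import hashlib
import random
from collections import Counter, defaultdict

NUM_SHARDS = 4  # hypothetical cluster size, not taken from the paper


def shard_for(value, num_shards=NUM_SHARDS):
    """Simplified hashed sharding: MD5 of the shard-key value, modulo the shard count.

    MongoDB itself maps hashed values onto chunk ranges; modulo is used here
    only to keep the sketch short.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(docs, shard_key, num_shards=NUM_SHARDS):
    """Split documents into per-shard lists according to the chosen shard key."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


def imbalance(shards):
    """Largest shard size divided by the mean shard size (1.0 means perfectly even)."""
    sizes = [len(s) for s in shards]
    return max(sizes) / (sum(sizes) / len(sizes))


def local_aggregate(shard_docs, group_field, value_field):
    """Per-shard step: scan every document on the shard and build partial sums."""
    partial = defaultdict(int)
    for doc in shard_docs:
        partial[doc[group_field]] += doc[value_field]
    return partial


def merge(partials):
    """Router step: add up the partial sums returned by all shards."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def make_docs(n=20000, seed=42):
    """Synthetic documents: 'region' has 3 values, 'user_id' has up to 10,000."""
    rng = random.Random(seed)
    regions = ["north", "south", "east"]
    return [
        {
            "user_id": rng.randrange(10000),
            "region": rng.choice(regions),
            "bytes": rng.randrange(1, 1000),
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    docs = make_docs()
    for key in ("region", "user_id"):
        shards = distribute(docs, key)
        print(f"shard key {key!r}: sizes={[len(s) for s in shards]}, "
              f"imbalance={imbalance(shards):.2f}")
    # Group-by aggregation over data sharded on the high-cardinality key.
    shards = distribute(docs, "user_id")
    print(merge(local_aggregate(s, "region", "bytes") for s in shards))
```

With `region` as the key, at most three of the four shards can receive any data, so the imbalance ratio is at least 4/3; with `user_id` the shard sizes come out close to equal. Because a scatter-gather aggregation finishes only when its slowest shard does, this ratio gives a rough sense of how much a low-cardinality key can stretch the computation.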

Key words: Not Only SQL (NoSQL), MongoDB, MapReduce, aggregation computing, performance optimization
