Journal of Computer Applications (《计算机应用》), 2020, Vol. 40, Issue (2): 321-327. DOI: 10.11772/j.issn.1001-9081.2019091616

• Paper from the 2019 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2019)

Incremental learning based proactive caching mechanism for RocksDB key-value system

Keyun LUO¹, Baoliu YE¹, Bin TANG¹, Feng MEI², Wenda LU²

  1. State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing Jiangsu 210023, China
  2. State Grid Zhejiang Electric Power Company Limited, Hangzhou Zhejiang 310007, China
  • Received: 2019-07-31 Revised: 2019-09-24 Accepted: 2019-09-29 Online: 2019-10-14 Published: 2020-02-10
  • Contact: Baoliu YE
  • About author: LUO Keyun, born in 1993 in Wuhu, Anhui, M. S. candidate. His research interests include distributed storage.
    TANG Bin, born in 1986 in Dongtai, Jiangsu, Ph. D., assistant research fellow, CCF member. His research interests include distributed computing and coding theory.
    MEI Feng, born in 1977 in Huzhou, Zhejiang, M. S., senior engineer. His research interests include power information systems and big data.
    LU Wenda, born in 1989 in Songyuan, Jilin, M. S., assistant engineer. His research interests include data mining and cloud computing.
  • Supported by:
    the National Key Research and Development Program of China (2018YFB1004704); the National Natural Science Foundation of China (61832005); the Science and Technology Project of State Grid Corporation of China (52110418001M)

Abstract:

The RocksDB key-value storage system, which is based on the Log-Structured Merge (LSM) tree, suffers from low read performance because of the constraints of its hierarchical structure. An effective solution is to cache hot spot data proactively, but this faces two challenges: how to predict hot spot data while the data distribution keeps changing dynamically, and how to integrate the proactive caching mechanism with the RocksDB storage structure. To tackle these challenges, a proactive caching framework for the RocksDB key-value system was built on prediction analysis techniques; it consists of data collection, system interaction and system evaluation components, and it can cache hot spot data at the lower levels of the LSM tree. In addition, data access patterns were modeled, and an incremental learning based prediction analysis method for hot spot data was designed and implemented, which effectively reduces the number of I/O accesses to the storage medium. Experimental results show that the proposed mechanism effectively improves the read performance of RocksDB under different dynamic workloads.
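The abstract does not specify the prediction model, so the following is only a minimal sketch of the general idea of incremental hot-key prediction, not the authors' method. It assumes an exponentially decayed access-frequency score as a stand-in for the incremental learner: the model is updated window by window (rather than retrained from scratch) so that it tracks a drifting access distribution, and the top-k predicted keys are used to fill a cache before the next window. The names `HotKeyPredictor`, `partial_fit`, `predict_hot`, `simulate` and the `decay` parameter are hypothetical; the code does not call RocksDB, and "storage reads" is a simulated miss count, not a measured I/O figure.

```python
from __future__ import annotations

import heapq
from collections import Counter, defaultdict


class HotKeyPredictor:
    """Hypothetical incrementally updated hot-key scorer (not the paper's model).

    score_t(k) = decay * score_{t-1}(k) + count_t(k), so older windows
    fade out and the ranking follows shifts in the access distribution.
    """

    def __init__(self, decay: float = 0.5, prune_below: float = 1e-3):
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        self.prune_below = prune_below
        self.scores = defaultdict(float)

    def partial_fit(self, window: list[str]) -> None:
        """Fold one window of observed key accesses into the model."""
        counts = Counter(window)
        # Decay every existing score, then add this window's access counts.
        for key in list(self.scores):
            self.scores[key] *= self.decay
        for key, c in counts.items():
            self.scores[key] += c
        # Forget keys whose score has decayed to almost nothing.
        for key in [k for k, s in self.scores.items() if s < self.prune_below]:
            del self.scores[key]

    def predict_hot(self, k: int) -> list[str]:
        """Return the k keys with the highest current score."""
        top = heapq.nlargest(k, self.scores.items(), key=lambda kv: kv[1])
        return [key for key, _ in top]


def simulate(windows: list[list[str]], cache_size: int, decay: float = 0.5) -> int:
    """Count simulated storage reads: before each window, the cache is filled
    with the keys predicted hot from all earlier windows; every access that
    misses this proactively filled cache counts as one storage read."""
    model = HotKeyPredictor(decay=decay)
    storage_reads = 0
    for window in windows:
        cache = set(model.predict_hot(cache_size))
        storage_reads += sum(1 for key in window if key not in cache)
        model.partial_fit(window)
    return storage_reads
```

For example, with a workload whose hot set shifts from {a, b} to {c, d} halfway through, `simulate([["a", "a", "b", "a", "b"]] * 3 + [["c", "d", "c", "d", "c"]] * 3, cache_size=2)` returns 12 simulated storage reads out of 30 accesses: the model misses the first window (no history) and the shift, then adapts within two windows. A smaller `decay` adapts faster to drift but reacts more to short-lived bursts; a larger one is more stable but slower to follow a new hot set.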

Key words: RocksDB, proactive caching, incremental learning, Log-Structured Merge (LSM) tree
