Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 571-577. DOI: 10.11772/j.issn.1001-9081.2019081462

• Data Science and Technology •

Optimization of multidimensional index query mechanism based on HBase

Jiangfeng XU, Yulong TAN

  1. School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China
  • Received: 2019-08-22 Revised: 2019-11-04 Accepted: 2019-11-18 Online: 2019-12-04 Published: 2020-02-10
  • Contact: Yulong TAN
  • About author: XU Jiangfeng, born in 1965 in Yuzhou, Henan, Ph.D., professor, CCF member. His research interests include data encryption and network security.
  • Supported by:
    The Fundamental Research Funds for the Central Universities (20190605)

Abstract:

Key-value stores are designed to retrieve values from very large data volumes while offering high availability, fault tolerance, and scalability, and therefore provide much-needed infrastructure for Location-Based Services (LBS). However, complex queries over multidimensional data cannot be processed efficiently, because key-value stores offer no way to access data by multiple attributes. To address the inability of the key-value store HBase to handle multidimensional data efficiently, a unified indexing framework named New-grid was proposed to enable multidimensional queries on HBase. A set of nodes was organized in an improved P-grid overlay network to provide efficient data distribution, fault tolerance, and multidimensional query processing. For indexing, a Hilbert space-filling curve was used to preserve data locality, so that multidimensional data could be managed efficiently in the key-value store. In addition, the data were managed in HBase's underlying storage, and algorithms for range queries and K-Nearest Neighbor (KNN) queries were proposed, eliminating the overhead of maintaining a separate index table. Extensive experiments were conducted on Amazon EC2 with clusters of 4, 8, and 16 ordinary nodes. The experimental results show that New-grid outperforms MD-HBase and MapReduce.

Key words: Location-Based Service (LBS), multidimensional index, HBase, space-filling curve, overlay network
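
To illustrate the indexing idea in the abstract, the following is a minimal Python sketch of how a Hilbert space-filling curve can turn two-dimensional grid cells into one-dimensional keys that preserve locality, and how a rectangular range query can then be expressed as a few contiguous key intervals. It is not the New-grid implementation: the function names (`xy2d`, `row_key`, `hilbert_ranges`), the grid resolution parameter `order`, and the zero-padded hexadecimal key layout are all assumptions made for illustration. The only HBase property relied on is that row keys are sorted lexicographically as byte strings.

```python
def xy2d(order, x, y):
    """Map cell (x, y) of a 2**order x 2**order grid to its Hilbert index.

    This is the standard iterative conversion; it is an illustrative
    stand-in, not code taken from the paper.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def row_key(order, x, y):
    """Hypothetical row-key layout: zero-padded hex of the Hilbert index.

    Zero padding makes lexicographic order match numeric order, which is
    what a store with byte-sorted row keys needs.
    """
    width = (2 * order + 3) // 4  # hex digits needed for 2*order bits
    return format(xy2d(order, x, y), f"0{width}x")


def hilbert_ranges(order, x_lo, x_hi, y_lo, y_hi):
    """Cover an inclusive rectangle with merged (start, end) index intervals.

    Each interval could be served by one contiguous scan. Enumerating every
    cell is only practical for small rectangles; it keeps the sketch short.
    """
    keys = sorted(
        xy2d(order, x, y)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )
    ranges = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k  # extend the current interval
        else:
            ranges.append([k, k])  # start a new interval
    return [tuple(r) for r in ranges]
```

On a 2 x 2 grid (`order=1`), the cells are visited in the order (0,0), (0,1), (1,1), (1,0), so `hilbert_ranges(1, 0, 0, 0, 1)` yields the single interval `(0, 1)`, while the bottom row `hilbert_ranges(1, 0, 1, 0, 0)` splits into `(0, 0)` and `(3, 3)`. The fewer intervals a query rectangle produces, the fewer separate scans it needs, which is why a locality-preserving curve matters for range queries.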

CLC number: