Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (4): 956-959. DOI: 10.11772/j.issn.1001-9081.2015.04.0956

• Advanced Computing •

Distributed massive molecule retrieval model based on consistent Hash

SUN Xia¹, YU Long²,³, TIAN Shengwei¹,³, YAN Yilin⁴, LIN Jiangli⁵

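  1. School of Software, Xinjiang University, Urumqi Xinjiang 830008, China;
  2. Network Center, Xinjiang University, Urumqi Xinjiang 830046, China;
  3. School of Computer Engineering, Jiangsu University of Technology, Changzhou Jiangsu 213001, China;
  4. School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China;
  5. School of Chemistry and Chemical Engineering, Xinjiang University, Urumqi Xinjiang 830046, China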
  • Received: 2014-11-04  Revised: 2014-12-31  Published: 2015-04-10  Online: 2015-04-08
  • Corresponding author: YU Long
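  • About the authors: SUN Xia (1989-), female, born in Urumqi, Xinjiang, M.S. candidate; research interest: cloud computing. YU Long (1974-), female, born in Urumqi, Xinjiang, professor; research interests: cloud computing, computer networks. TIAN Shengwei (1973-), male, born in Urumqi, Xinjiang, professor, Ph.D., CCF member; research interests: cloud computing, computer networks. YAN Yilin (1990-), female, born in Shenyang, Liaoning, M.S. candidate; research interest: cloud computing. LIN Jiangli (1975-), female, born in Urumqi, Xinjiang, senior experimentalist, Ph.D.; research interest: applied chemistry.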
  • Foundation item:

    National Natural Science Foundation of China (31160341).



Abstract:

To address the inefficiency of traditional general-purpose graph-matching retrieval and the inability to locate refractive-index data quickly in big-data environments, a distributed massive molecule retrieval model based on consistent hashing was established. Drawing on the characteristics of molecules, the model discretizes the continuous refractive index with an equal-width (fixed-width) algorithm to build a high-speed Hash index, on which a distributed massive molecule retrieval system is implemented. This effectively reduces the number of molecules that take part in each computation, and Hash collisions are resolved according to molecule access frequency, which further improves retrieval efficiency. Experimental results on a dataset of 200 000 molecules show that the average retrieval time of the method is about 5% of that of general graph matching, and that the model performs stably and scales well. The model is well suited to retrieving frequently accessed molecules by refractive index in massive-data environments.

Key words: molecule retrieval, discretization, consistent Hash, collision handling, distributed computing

CLC number:
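The abstract compresses three mechanisms into a sentence each: equal-width discretization of the continuous refractive index into Hash buckets, consistent hashing of those buckets onto distributed nodes, and collision handling by access frequency. The Python sketch below shows one way these pieces could fit together; it is an illustration under stated assumptions, not the authors' implementation. The refractive-index range (1.30 to 1.80), the bin count (500), MD5 as the ring hash, 100 virtual nodes per server, the `bin-<id>` key format, the tolerance window and the in-memory per-node store are all hypothetical choices, as are the names `equal_width_bin`, `ConsistentHashRing` and `MoleculeIndex`. Collision handling by access frequency is modelled simply as keeping each bin's colliding entries sorted by hit count.

```python
# Hypothetical sketch: value range, bin count, hash function, virtual-node count
# and key format are illustrative assumptions, not taken from the paper.
import bisect
import hashlib
from collections import defaultdict


def equal_width_bin(n, n_min, n_max, k):
    """Map a continuous refractive index n to one of k equal-width bins (0..k-1)."""
    if not n_min <= n <= n_max:
        raise ValueError("refractive index outside the discretization range")
    width = (n_max - n_min) / k
    # n == n_max would land on index k, so clamp it into the last bin.
    return min(int((n - n_min) / width), k - 1)


def ring_hash(key):
    """Stable 32-bit ring position (first 8 hex digits of MD5).

    Python's built-in hash() is salted per process, so it is not used here.
    """
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)


class ConsistentHashRing:
    """Ring with virtual nodes; a key belongs to the next point clockwise."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            p = ring_hash(f"{node}#{i}")
            if p in self._owner:  # skip the rare position collision
                continue
            self._owner[p] = node
            bisect.insort(self._points, p)

    def node_for(self, key):
        # Wrap around to the first point when the key hashes past the last one.
        i = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[i]]


class MoleculeIndex:
    """Refractive-index bins placed on nodes by the ring; entries are [mol_id, n, hits]."""

    def __init__(self, nodes, n_min, n_max, k):
        self.ring = ConsistentHashRing(nodes)
        self.n_min, self.n_max, self.k = n_min, n_max, k
        # node -> bin id -> list of colliding entries (same bin)
        self.store = defaultdict(lambda: defaultdict(list))

    def _bucket(self, b):
        return self.store[self.ring.node_for(f"bin-{b}")][b]

    def insert(self, mol_id, n):
        b = equal_width_bin(n, self.n_min, self.n_max, self.k)
        self._bucket(b).append([mol_id, n, 0])

    def query(self, n, tol=1e-4):
        """Return ids with |index - n| <= tol; within a bin, most-accessed first."""
        lo = equal_width_bin(max(self.n_min, n - tol), self.n_min, self.n_max, self.k)
        hi = equal_width_bin(min(self.n_max, n + tol), self.n_min, self.n_max, self.k)
        result = []
        for b in range(lo, hi + 1):  # the window may straddle a bin edge
            bucket = self._bucket(b)
            matched = [e for e in bucket if abs(e[1] - n) <= tol]
            result.extend(e[0] for e in matched)
            for e in matched:
                e[2] += 1
            # Collision handling by access frequency: hot entries move to the front.
            bucket.sort(key=lambda e: e[2], reverse=True)
        return result


if __name__ == "__main__":
    idx = MoleculeIndex(["node-a", "node-b", "node-c"], n_min=1.30, n_max=1.80, k=500)
    idx.insert("mol-1", 1.4521)
    idx.insert("mol-2", 1.4523)
    idx.insert("mol-3", 1.6010)
    print(idx.query(1.4522, tol=0.0002))  # ['mol-1', 'mol-2']
    idx.query(1.4523, tol=0.00005)        # touches mol-2 only
    print(idx.query(1.4522, tol=0.0002))  # ['mol-2', 'mol-1']
```

Because buckets are keyed by bin id rather than by molecule, adding a server to the ring relocates only the bins whose ring positions fall on the arcs claimed by the new node's virtual points; all other bins stay where they are, which is the scalability property consistent hashing is usually chosen for. A query window that straddles a bin edge has to probe both neighbouring bins, which is why `query` iterates over a bin range.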