Distributed massive molecule retrieval model based on consistent Hash

doi:10.11772/j.issn.1001-9081.2015.04.0956

Journal of Computer Applications, 2015, Vol. 35, Issue (4): 956-959.

SUN Xia^1, YU Long^2,3, TIAN Shengwei^1,3, YAN Yilin^4, LIN Jiangli^5

1. School of Software, Xinjiang University, Urumqi Xinjiang 830008, China;
2. Network Center, Xinjiang University, Urumqi Xinjiang 830046, China;
3. School of Computer Engineering, Jiangsu University of Technology, Changzhou Jiangsu 213001, China;
4. School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China;
5. School of Chemistry and Chemical Engineering, Xinjiang University, Urumqi Xinjiang 830046, China

Received: 2014-11-04 Revised: 2014-12-31 Online: 2015-04-10 Published: 2015-04-08
Corresponding author: YU Long
About the authors: SUN Xia (1989-), female, native of Urumqi, Xinjiang, M.S. candidate; research interest: cloud computing. YU Long (1974-), female, native of Urumqi, Xinjiang, professor; research interests: cloud computing, computer networks. TIAN Shengwei (1973-), male, native of Urumqi, Xinjiang, professor, Ph.D., CCF member; research interests: cloud computing, computer networks. YAN Yilin (1990-), female, native of Shenyang, Liaoning, M.S. candidate; research interest: cloud computing. LIN Jiangli (1975-), female, native of Urumqi, Xinjiang, senior experimentalist, Ph.D.; research interest: applied chemistry.
Funding: National Natural Science Foundation of China (31160341).


Abstract:

Traditional general graph matching search is inefficient in big-data environments, and molecules cannot be located quickly by their refractive index data. To address these problems, a distributed massive molecule retrieval model based on consistent Hash was established. Taking the characteristics of molecules into account, the continuous refractive index was discretized with an equal-width algorithm to build a high-speed Hash index, on which a distributed massive molecule retrieval system was implemented. This effectively reduces the number of molecules involved in computation, and Hash collisions are handled according to how often each molecule is accessed, which further improves retrieval efficiency. Experimental results on a dataset of 200 000 molecules show that the average retrieval time of the proposed method is about 5% of that of general graph matching retrieval. The model performs stably and is highly scalable, and it is well suited to retrieving frequently accessed molecules by refractive index in massive-data environments.
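The abstract names two mechanisms for the index: equal-width discretization of the continuous refractive index into Hash buckets, and collision handling driven by access frequency. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The class name `RefractiveIndexHashIndex`, the value range, the bin count, the matching tolerance `tol`, and the "move frequently accessed entries to the front of the bucket" policy are all hypothetical choices.

```python
from collections import defaultdict


class RefractiveIndexHashIndex:
    """Hypothetical in-memory index: equal-width bins over the refractive
    index, one Hash bucket per bin, bucket entries ordered by access count."""

    def __init__(self, n_min: float, n_max: float, num_bins: int):
        if n_max <= n_min or num_bins <= 0:
            raise ValueError("need n_max > n_min and num_bins > 0")
        self.n_min = n_min
        self.n_max = n_max
        self.num_bins = num_bins
        self.width = (n_max - n_min) / num_bins  # every bin has the same width
        # bucket id -> list of [molecule_id, refractive_index, access_count]
        self.buckets = defaultdict(list)

    def bin_of(self, n: float) -> int:
        """Equal-width discretization: map a continuous value to a bin id."""
        if not (self.n_min <= n <= self.n_max):
            raise ValueError(f"refractive index {n} outside [{self.n_min}, {self.n_max}]")
        # Clamp so that n == n_max lands in the last bin instead of overflowing.
        return min(int((n - self.n_min) / self.width), self.num_bins - 1)

    def insert(self, molecule_id: str, n: float) -> None:
        # Molecules whose values fall into the same bin collide in one bucket.
        self.buckets[self.bin_of(n)].append([molecule_id, n, 0])

    def lookup(self, n: float, tol: float = 1e-4) -> list:
        """Return ids in the bin of n whose stored value is within tol of n.

        Only the bin containing n is scanned, so matches lying just across a
        bin edge are missed; a fuller version would also probe the neighbour.
        Each hit's access count is incremented and the bucket is re-sorted so
        frequently requested molecules are scanned first next time.
        """
        bucket = self.buckets.get(self.bin_of(n), [])
        hits = []
        for entry in bucket:
            if abs(entry[1] - n) <= tol:
                entry[2] += 1
                hits.append(entry[0])
        # Stable sort: entries with equal counts keep their insertion order.
        bucket.sort(key=lambda e: e[2], reverse=True)
        return hits
```

Ordering each bucket by access count is one simple way to read "handling collisions according to access frequency": a hot molecule migrates to the head of its bucket, so repeated queries for it stop early in a linear scan.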

Key words: molecular retrieval, discretization, consistent Hash, Hash collision handling, distributed computing
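The model distributes its Hash index across nodes with consistent hashing. As a minimal sketch, assuming bucket ids such as `"bin-20"` are used as ring keys, MD5 supplies ring positions, and each physical node gets 100 virtual nodes (all hypothetical choices not stated in the source), a consistent Hash ring could look like this:

```python
import bisect
import hashlib


def _ring_hash(key: str) -> int:
    # Stable 64-bit ring position; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring mapping index-bucket keys to storage nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions of all virtual nodes
        self._owner = {}      # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _ring_hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # extremely unlikely 64-bit clash; keep the first owner
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._positions, _ring_hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]
```

When a node is added, a key changes owner only if its clockwise successor becomes one of the new node's virtual nodes, so every relocated bucket moves to the new node and all others stay put. Removing that node restores the previous mapping exactly. This limited data movement is the property usually cited for the scalability of consistent hashing.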

CLC number: TP392; O6

SUN Xia, YU Long, TIAN Shengwei, YAN Yilin, LIN Jiangli. Distributed massive molecule retrieval model based on consistent Hash[J]. Journal of Computer Applications, 2015, 35(4): 956-959.

