计算机应用 ›› 2016, Vol. 36 ›› Issue (3): 675-680.DOI: 10.11772/j.issn.1001-9081.2016.03.675

• 大数据 • 上一篇    下一篇

基于并行遗传最大最小蚁群算法的分布式数据库查询优化

林基明1, 班文娇1, 王俊义1,2, 童记超1   

  1. 1. 桂林电子科技大学 信息与通信学院, 广西 桂林 541004;
    2. 桂林电子科技大学 广西密码学与信息安全重点实验室, 广西 桂林 541004
  • 收稿日期:2015-08-13 修回日期:2015-11-11 出版日期:2016-03-10 发布日期:2016-03-17
  • 通讯作者: 林基明
  • 作者简介:林基明(1970-),男,四川三台人,教授,博士生导师,博士,主要研究方向:无线通信、无线传感网络;班文娇(1989-),女,广西南宁人,硕士研究生,主要研究方向:分布式数据库、云计算;王俊义(1977-),男,河北任县人,副教授,博士,主要研究方向:网络优化、应用层组播;童记超(1990-),男,河南信阳人,硕士研究生,主要研究方向:分布式数据库、云计算。
  • 基金资助:
    国家自然科学基金资助项目(61261017);广西自然科学基金资助项目(2014GXNSFAA118387);广西无线宽带通信与信号处理重点实验室资助项目(GXKL0614202);桂林电子科技大学研究生科研创新项目(YJCXS201523)。

Query optimization for distributed database based on parallel genetic algorithm and max-min ant system

LIN Jiming1, BAN Wenjiao1, WANG Junyi1,2, TONG Jichao1   

  1. 1. College of Information and Communication, Guilin University of Electronic Technology, Guilin Guangxi 541004, China;
    2. Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin Guangxi 541004, China
  • Received:2015-08-13 Revised:2015-11-11 Online:2016-03-10 Published:2016-03-17
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61261017), the Natural Science Foundation of Guangxi Province of China (2013GXNSFAA019334), the Foundation of Guangxi Key Laboratory of Wireless Broadband Communication and Signal Processing (GXKL0614202), the Graduate Research and Innovation Projects of Guilin University of Electronic Technology (YJCXS201523).

摘要: 针对分布式数据库中关系及其分片多副本、多站点存储的特性会增加查询搜索空间及时间复杂度,从而降低查询执行计划(QEP)搜索效率的问题,提出一种基于分片分配选择器(FSS)设计准则的并行遗传-最大最小蚁群算法(PGA-MMAS)。首先,结合实际的企业分布式信息管理系统设计FSS,启发式选择较优关系副本,以减少查询连接代价并缩小PGA-MMAS的搜索空间;然后结合遗传算法(GA)收敛较快的优势,对最终连接关系进行编码和并行遗传操作,得到一组相对较优的QEP,并将其转化为并行最大最小蚁群算法(MMAS)的初始信息素分布,从而使其更快速地搜索到全局最优QEP;最后分别在不同关系数情况下对算法进行仿真实验,结果表明,基于FSS的PGA-MMAS搜索最优QEP的效率高于原GA以及基于FFS的GA、MMAS和GA-MMAS;经实际工程应用验证,所提算法搜索出的高质量QEP可以提高分布式数据库多关系查询效率。

关键词: 分布式数据库, 遗传算法, 最大最小蚁群算法, 最优查询执行计划, 并行

Abstract: Since relation and its fragments in the distributed database have multiple copies and multi-site storage characteristics, which increases the time and space complexity of query and results in lower search efficiency of Query Execution Plan (QEP), a Parallel Genetic Algorithm and Max-Min Ant System (PGA-MMAS) based on design principles of Fragments Site Selector (FSS) was proposed. Firstly, based on the design requirement of the distributed information management system for actual business, FSS was designed, which selected the best one from many copies of relationship heuristically to decrease query join cost and search space of PGA-MMAS. Secondly, Genetic Algorithm (GA) encoded the final join relations and conducted parallel genetic operation to get a set of relative optimal QEPs by taking advantage of quick convergence of GA. Then, QEPs were transformed into the initial pheromone distribution of Max-Min Ant System (MMAS) to obtain the optimal QEP quickly and efficiently. Finally, simulation experiments were conducted under different number of relation conditions, and the results show that PGA-MMAS based on FSS searches optimal QEP more efficiently than original GA, Fragments Site Selector-Genetic Algorithm (FSS-GA), Fragments Site Selector-Max-Min Ant System (FSS-MMAS) and Fragments Site Selector-Genetic Algorithm-Max-Min Ant System (FSS-GA-MMAS). And in the actual engineering application, the proposed algorithm can search high-quality QEP to improve the efficiency of multi-join query in distributed database.

Key words: distributed database, Genetic Algorithm (GA), Max-Min Ant System (MMAS), optimal Query Execution Plan (QEP), parallel

中图分类号: