Journal of Computer Applications

MG-SQL: A SQL Generation Framework with Enhanced Schema Linking and Multi-Generator Collaboration

  

  • Received: 2025-04-25 Revised: 2025-06-11 Accepted: 2025-06-12 Online: 2025-06-23 Published: 2025-06-23

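Wu Dingjia (吴定佳)1,2, Cui Zhe (崔喆)1*

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610213; 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
  • Corresponding author: Cui Zhe
  • Funding:
    Active Security Theory and Technology for Intelligent Networked Industrial Control Systems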

Abstract: To address the limitations of Large Language Models (LLMs) in generating Structured Query Language (SQL) statements for complex multi-table databases, a Multi-Generator SQL framework (MG-SQL) based on collaborating generators was proposed. First, to reduce the noise introduced by irrelevant schema information, schema linking was enhanced by generating an initial SQL statement and combining it with semantic similarity-based retrieval. Second, to improve the quality and diversity of candidate SQL statements, a multi-strategy collaborative generation framework was built on the pruned schema: 1) an experience generator retrieved dynamic examples; 2) a chain-of-thought generator strengthened logical reasoning; 3) a query plan generator simulated the database execution workflow; and 4) a progressive generator performed iterative optimization. The best candidate was then selected by a voting mechanism. Finally, a reflective learning mechanism was proposed, in which generated results were compared with reference SQL to form reflective samples, dynamically building a domain-specific knowledge base for continual learning. On the BIRD benchmark, with the lightweight GPT-4o-mini model, the schema linking module achieved a Strict Recall Rate (SRR) of 98% while filtering out 45% of irrelevant columns. The framework attained an EXecution accuracy (EX) of 69.69% and a Valid Efficiency Score (VES) of 79.59%, outperforming mainstream GPT-4o-based methods and validating its practicality in complex scenarios.
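
The abstract states only that the final SQL is chosen "through voting mechanisms"; it does not say how votes are counted. As a minimal sketch, assuming that candidates are grouped by their execution result and the largest group wins, the selection step might look like the following. The function names `execute` and `vote`, the toy `student` table, and the mapping of candidates to generators are all hypothetical and are not taken from the paper.

```python
import sqlite3
from collections import defaultdict


def execute(conn, sql):
    """Run one candidate; return a hashable result, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Assumption: row order is ignored when comparing results.
    return tuple(sorted(rows, key=repr))


def vote(conn, candidates):
    """Return a candidate from the largest group of identical execution results."""
    groups = defaultdict(list)
    for sql in candidates:
        result = execute(conn, sql)
        if result is not None:  # failed candidates get no vote
            groups[result].append(sql)
    if not groups:
        return None
    # Largest group wins; on a tie, the group that appeared first is kept.
    return max(groups.values(), key=len)[0]


# Toy database standing in for a benchmark database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO student VALUES (1, 'Ann', 20), (2, 'Bob', 22), (3, 'Cy', 22);
""")

# One candidate per generator (illustrative only).
candidates = [
    "SELECT name FROM student WHERE age = 22",   # experience generator
    "SELECT name FROM student WHERE age > 21",   # chain-of-thought generator
    "SELECT name FROM student WHERE age >= 20",  # query plan generator
    "SELECT nme FROM student WHERE age = 22",    # progressive generator, invalid column
]
best = vote(conn, candidates)
print(best)
```

In this sketch the first two candidates return the same rows and outvote the third, while the fourth fails to execute and is discarded. Ignoring row order is a simplification that would be wrong for questions whose answer depends on `ORDER BY`.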

Key words: schema linking, Large Language Model (LLM), Text-to-Structured Query Language (Text-to-SQL), retrieval augmentation, In-Context Learning (ICL)

CLC Number: