MG-SQL: A SQL Generation Framework with Enhanced Schema Linking and Multi-Generator Collaboration

doi:10.11772/j.issn.1001-9081.2025040454

Journal of Computer Applications

Received: 2025-04-25; Revised: 2025-06-11; Accepted: 2025-06-12; Online: 2025-06-23; Published: 2025-06-23

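Dingjia WU (吴定佳)^1,2, Zhe CUI (崔喆)^1*

1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610213; 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049

Corresponding author: Zhe CUI (崔喆)
Supported by: Active Security Theory and Technology for Intelligent Networked Industrial Control Systems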

Abstract

To address the limitations of Large Language Models (LLMs) in generating Structured Query Language (SQL) statements for complex multi-table databases, a Text-to-SQL framework based on multi-generator collaboration, MG-SQL (Multi-Generator SQL), was proposed. First, to mitigate the noise introduced by irrelevant schema information, an enhanced schema linking method was proposed that generates an initial SQL statement and combines it with semantic similarity-based retrieval. Second, to improve the quality and diversity of candidate SQL statements, a multi-strategy collaborative generation framework was built on the pruned schema: 1) the experience generator retrieved dynamic examples; 2) the chain-of-thought generator strengthened logical reasoning; 3) the query plan generator simulated the database execution workflow; and 4) the progressive generator performed iterative optimization; the best SQL statement was then selected by a voting mechanism. Finally, a reflective learning mechanism was proposed, in which generated results were compared with reference SQL to form reflective samples, dynamically building a domain-specific experience base for continual learning. On the BIRD benchmark, with the lightweight GPT-4o-mini model, the schema linking module achieved a Strict Recall Rate (SRR) of 98% while filtering out 45% of irrelevant columns. The generated SQL reached an Execution Accuracy (EX) of 69.69% and a Valid Efficiency Score (VES) of 79.59%, outperforming mainstream GPT-4o-based methods and validating the framework's practicality in complex scenarios.
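
The abstract does not specify how the voting step works. As an illustration only, the sketch below assumes one common form of voting: each candidate SQL statement is executed, candidates are grouped by their result sets, and a member of the largest group is returned. The `singer` table, the four candidate queries, and the helper names `execute` and `vote` are hypothetical and do not come from the paper.

```python
import sqlite3
from collections import defaultdict


def execute(conn, sql):
    """Run one candidate; return an order-insensitive result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # invalid candidates do not get a vote
    return tuple(sorted(rows, key=repr))


def vote(conn, candidates):
    """Pick a candidate whose execution result is shared by the most candidates."""
    groups = defaultdict(list)
    for sql in candidates:
        result = execute(conn, sql)
        if result is not None:
            groups[result].append(sql)
    if not groups:
        return None
    # max() keeps the first largest group, so ties favor earlier candidates.
    return max(groups.values(), key=len)[0]


# Hypothetical toy database standing in for a real benchmark database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 41), (3, 'Cy', 25);
""")

# Hypothetical outputs of the four generators for "singers older than 28".
candidates = [
    "SELECT name FROM singer WHERE age > 28",   # experience generator
    "SELECT name FROM singer WHERE age >= 30",  # chain-of-thought generator
    "SELECT name FROM singer WHERE age > 40",   # query plan generator (differs)
    "SELECT nme FROM singer WHERE age > 28",    # progressive generator (invalid)
]
best = vote(conn, candidates)
print(best)
```

In this toy case the first two candidates return the same rows and outvote the third, while the fourth fails to execute and is ignored. Whether MG-SQL votes on execution results, on SQL text, or with a model-based selector is not stated in the abstract.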

Keywords: schema linking, Large Language Model (LLM), Text-to-SQL, retrieval augmentation, In-Context Learning (ICL)

CLC Number: TP311.13

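Cite this article: Dingjia WU, Zhe CUI. MG-SQL: A SQL Generation Framework with Enhanced Schema Linking and Multi-Generator Collaboration [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025040454.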
