Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (6): 1811-1817. DOI: 10.11772/j.issn.1001-9081.2025060745

• Data science and technology •

CORER: collaborative multi-knowledge large language model prompt framework for IT application innovation database migration

Yusheng YI, Zhaohao HUANG, Zihao DENG, Leilei KONG, Haoliang QI

  1. School of Computer Science and Artificial Intelligence, Foshan University, Foshan Guangdong 528000, China
  • Received: 2025-07-08 Revised: 2025-09-18 Accepted: 2025-09-22 Online: 2025-10-16 Published: 2026-06-10
  • Contact: Leilei KONG
  • About the authors: YI Yusheng, born in 1989, Ph. D., lecturer. His research interests include natural language processing and IT application innovation.
    HUANG Zhaohao, born in 1999, M. S. His research interests include natural language processing and IT application innovation.
    DENG Zihao, born in 2001, M. S. candidate. His research interests include natural language processing and IT application innovation.
    QI Haoliang, born in 1972, Ph. D., professor. His research interests include natural language processing and machine learning.
    First contact: KONG Leilei, born in 1979, Ph. D., professor. Her research interests include natural language processing and machine learning.
  • Supported by:
    General Program of National Natural Science Foundation of China(62276064)

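CORER: collaborative multi-knowledge-base large language model prompt framework for IT application innovation database migration

YI Yusheng (易宇声), HUANG Zhaohao (黄兆豪), DENG Zihao (邓梓昊), KONG Leilei (孔蕾蕾), QI Haoliang (齐浩亮)

  1. School of Computer Science and Artificial Intelligence, Foshan University, Foshan, Guangdong 528000
  • Corresponding author: KONG Leilei (孔蕾蕾)
  • Author biographies: YI Yusheng (born 1989), male, from Yiyang, Hunan, lecturer, Ph. D., CCF member. Main research interests: natural language processing, information technology application innovation.
    HUANG Zhaohao (born 1999), male, from Maoming, Guangdong, M. S. Main research interests: natural language processing, information technology application innovation.
    DENG Zihao (born 2001), male, from Jiangmen, Guangdong, M. S. candidate. Main research interests: natural language processing, information technology application innovation.
    QI Haoliang (born 1972), male, from Harbin, Heilongjiang, professor, Ph. D., CCF member. Main research interests: natural language processing, machine learning.
    First contact: KONG Leilei (born 1979), female, from Harbin, Heilongjiang, professor, Ph. D., CCF member. Main research interests: natural language processing, machine learning.
  • Funding:
    General Program of the National Natural Science Foundation of China (62276064)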

Abstract:

The main task of Information Technology (IT) application innovation database migration is to migrate data structures and data smoothly from non-domestic databases to domestic databases. To address the syntax differences between heterogeneous databases and the difficulty of adapting complex business logic in such migrations, CORER (Context-Objective-Rules-Examples-Response), a Large Language Model (LLM) prompt framework that coordinates multiple knowledge bases, was proposed. First, an openGauss Structured Query Language (SQL) syntax rule knowledge base covering 199 SQL syntax rule types and containing 4,162 syntax rules was constructed, and a migration sample knowledge base covering 20.6% of the syntax rule types was built by integrating official templates and real cases. Then, following the prompt elements, the syntax rule knowledge and migration sample knowledge were injected into the LLM context, so that the syntax, logic, and architecture characteristics of heterogeneous databases are matched adaptively and the LLM is guided to refactor SQL statements accurately. Experimental results show that CORER achieves an accuracy of 93.44% on the MySQL-to-openGauss migration task, which is 1.31 percentage points higher than that of the rule-based method, and improves by 7.02% in advanced-feature scenarios such as stored procedures and triggers, verifying the comprehensive advantages of CORER in IT application innovation database migration scenarios.
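As a rough illustration of how rule knowledge and sample knowledge might be injected into a five-part Context-Objective-Rules-Examples-Response prompt, the following Python sketch assembles such a prompt for one MySQL statement. It is only a sketch under stated assumptions: the class `CorerPrompt`, the helper `retrieve`, the keyword-matching lookup, and the placeholder knowledge-base entries are all hypothetical and are not taken from the paper, whose actual retrieval method and prompt wording are not described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class CorerPrompt:
    """Hypothetical container for the five prompt elements named by CORER."""
    context: str
    objective: str
    rules: list[str] = field(default_factory=list)
    examples: list[tuple[str, str]] = field(default_factory=list)
    response: str = "Return only the rewritten openGauss SQL statement."

    def render(self) -> str:
        rule_lines = "\n".join(f"- {r}" for r in self.rules) or "- (none retrieved)"
        example_lines = (
            "\n".join(f"MySQL: {s}\nopenGauss: {t}" for s, t in self.examples)
            or "(none retrieved)"
        )
        return (
            f"# Context\n{self.context}\n\n"
            f"# Objective\n{self.objective}\n\n"
            f"# Rules\n{rule_lines}\n\n"
            f"# Examples\n{example_lines}\n\n"
            f"# Response\n{self.response}"
        )


# Placeholder knowledge bases (assumed content, not the paper's data).
RULES = {
    "AUTO_INCREMENT": "Illustrative rule text for AUTO_INCREMENT columns.",
    "TRIGGER": "Illustrative rule text for trigger definitions.",
}
EXAMPLES = {
    "AUTO_INCREMENT": (
        "<MySQL statement from the sample knowledge base>",
        "<its openGauss rewrite>",
    ),
}


def retrieve(knowledge_base: dict, sql: str, limit: int = 3) -> list:
    """Toy lookup: keep entries whose keyword occurs in the statement."""
    upper = sql.upper()
    return [value for key, value in knowledge_base.items() if key in upper][:limit]


def build_prompt(sql: str) -> str:
    """Assemble a prompt whose Rules and Examples parts come from the lookups."""
    return CorerPrompt(
        context="Source database: MySQL. Target database: openGauss.",
        objective=f"Rewrite this statement for the target database:\n{sql}",
        rules=retrieve(RULES, sql),
        examples=retrieve(EXAMPLES, sql),
    ).render()


if __name__ == "__main__":
    print(build_prompt("CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY);"))
```

The rendered string would then be sent to an LLM of choice. Keeping the two knowledge bases behind one lookup function means the keyword match can be replaced by any other retrieval strategy without changing the prompt layout.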

Key words: Information Technology (IT) application innovation, database migration, Structured Query Language (SQL) refactoring, prompt framework, Large Language Model (LLM)

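Abstract (from the Chinese original):

The main task of Information Technology (IT) application innovation (abbreviated as "Xinchuang", 信创) database migration is to migrate data structures and data smoothly from non-domestic databases to domestic databases. To address challenges in current Xinchuang database migration, such as syntax differences between heterogeneous databases and the complexity of adapting business logic, a multi-knowledge-base collaborative Large Language Model (LLM) prompt framework for Xinchuang database migration, CORER (Context-Objective-Rules-Examples-Response), is proposed. An openGauss Structured Query Language (SQL) syntax rule knowledge base covering 199 SQL syntax rule types and containing 4,162 syntax rules is constructed, and a migration sample knowledge base covering 20.6% of the syntax rule types is built by integrating official templates and real cases. Based on the prompt elements, syntax rule knowledge and migration sample knowledge are injected into the LLM context to adaptively match the syntax, logic, and architecture characteristics of heterogeneous databases and to guide the LLM to refactor SQL statements precisely. Experimental results show that CORER reaches an accuracy of 93.44% on the MySQL-to-openGauss migration task, an improvement of 1.31 percentage points over the rule-based method, and an improvement of 7.02% in advanced-feature scenarios such as stored procedures and triggers, verifying the comprehensive advantages of CORER in Xinchuang database migration scenarios.

Key words: Information Technology (IT) application innovation, database migration, Structured Query Language refactoring, prompt framework, large language model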

CLC Number: