Journal of Computer Applications


NL2SQL implementation method for auction data of cultural relics and artworks based on RAG

  

  • Received: 2025-02-07 Revised: 2025-04-16 Online: 2025-05-26 Published: 2025-05-26


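Li Chenghua1, Zhang Liupeng2, Shi Hongling1

  1. Hubei Key Laboratory of Intelligent Wireless Communications, South-Central Minzu University, Wuhan 430074
  2. South-Central Minzu University
  • Corresponding author: Zhang Liupeng
  • Funding:
    Supported by the Academic Innovation Team Fund Project of South-Central Minzu University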

Abstract: Natural Language to SQL (NL2SQL) lowers the technical threshold for non-professionals to operate databases and improves user experience and work efficiency. Retrieval-Augmented Generation (RAG) can improve NL2SQL performance by introducing external knowledge bases. To address the high miss rate of retrieval strategies and the weak relevance of recalled context in current RAG applications of NL2SQL, this paper presents a RAG method with sequential retrieval and reranking (RAG-SRR), which optimizes knowledge base construction, the retrieval and recall strategy, and prompt design. Firstly, a domain knowledge base was constructed from three aspects: question-answer pairs, professional terms, and database structures. The question-answer pairs were built from the high-frequency processing and query questions in the auction supervision of cultural relics and artworks, the professional terms from auction industry standards, and the database structures from the data of Artron Art Auction Network. Secondly, a sequential retrieval strategy was adopted in the retrieval stage, with different priorities set for the three types of knowledge bases, and the retrieved information was reranked in the recall stage. Finally, principles for optimized prompt design and a prompt template were given. Experimental results show that, on the domain dataset and the Spider dataset, RAG-SRR improves execution accuracy over the BERT-based model and the RESDSQL model by at least 19.50 and 24.20 percentage points, and 12.17 and 8.90 percentage points, respectively. Under the same large language model, the execution accuracy of RAG-SRR is at least 12.83 and 16.33 percentage points higher than that of the unoptimized RAG, respectively, and at least 0.30 and 3.90 percentage points higher than that of DIN-SQL and other methods, respectively. These results indicate that RAG-SRR has strong practicality and portability.
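The retrieve-then-rerank flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap scorer stands in for whatever retriever and reranker the paper uses, and the priority order, the `top_k`/`top_n` cutoffs, the function names, and the prompt wording are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str  # name of the knowledge base the text came from
    text: str


def overlap(query: str, text: str) -> float:
    """Jaccard similarity on whitespace tokens (hypothetical stand-in scorer)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def sequential_retrieve(query: str, knowledge_bases: dict[str, list[str]],
                        priority: list[str], top_k: int = 2) -> list[Entry]:
    """Visit the knowledge bases in priority order, keeping up to top_k hits from each."""
    candidates: list[Entry] = []
    for name in priority:  # assumed order, e.g. QA pairs, then terms, then schema
        scored = sorted(((overlap(query, text), text) for text in knowledge_bases[name]),
                        key=lambda pair: pair[0], reverse=True)
        candidates.extend(Entry(name, text) for score, text in scored[:top_k] if score > 0)
    return candidates


def rerank(query: str, candidates: list[Entry], top_n: int = 3) -> list[Entry]:
    """Reorder the merged candidates by relevance; ties keep their retrieval order."""
    return sorted(candidates, key=lambda e: overlap(query, e.text), reverse=True)[:top_n]


def build_prompt(query: str, context: list[Entry]) -> str:
    """Assemble a prompt from the reranked context (hypothetical template)."""
    lines = [f"[{entry.source}] {entry.text}" for entry in context]
    return "Context:\n" + "\n".join(lines) + f"\nQuestion: {query}\nSQL:"
```

Because Python's `sorted` is stable, candidates with equal scores stay in the order the priority list produced, so in this sketch the knowledge-base priority acts as a tie-breaker after reranking.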

Key words: Chinese NL2SQL, Retrieval-Augmented Generation (RAG), large language model, reranking, auction of cultural relics and artworks

CLC Number: