Text-to-SQL model based on semantic enhanced schema linking

doi:10.11772/j.issn.1001-9081.2023091360

Abstract

To improve Text-to-SQL generation based on heterogeneous graph encoders, the SELSQL model was proposed. Firstly, the model adopted an end-to-end learning framework and replaced the Euclidean distance metric with the Poincaré distance metric in hyperbolic space to optimize the semantically enhanced schema linking graph, which was constructed from a pre-trained language model using probing techniques. Secondly, K-head weighted cosine similarity and a graph regularization method were used to learn a similarity metric graph, so that the initial schema linking graph was iteratively optimized during training. Finally, an improved Relational Graph ATtention network (RGAT) graph encoder and a multi-head attention mechanism were used to encode the joint semantic schema linking graph of the two modules, and Structured Query Language (SQL) statements were decoded using a grammar-based neural semantic decoder and a predefined structured language. Experimental results on the Spider dataset show that, with the ELECTRA-large pre-trained model, the accuracy of SELSQL is 2.5 percentage points higher than that of the best baseline model, with a marked improvement in the generation of complex SQL statements.

Key words: schema linking, graph structure learning, pre-trained language model, Text-to-SQL, heterogeneous graph
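
Two of the metrics named in the abstract can be illustrated with a minimal Python sketch. The Poincaré distance uses the standard unit-ball formula d(u, v) = arcosh(1 + 2|u - v|² / ((1 - |u|²)(1 - |v|²))). The K-head weighted cosine similarity is assumed here to average, over K weight vectors, the cosine of the element-wise reweighted inputs, as is common in graph structure learning; the abstract does not give the exact formulation. The function names, the plain-list vectors and the toy weights are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between two points strictly inside the unit (Poincaré) ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def k_head_cosine(
    x: Sequence[float],
    y: Sequence[float],
    heads: Sequence[Sequence[float]],
) -> float:
    """Mean over K heads of cos(w_k * x, w_k * y) with element-wise weights.

    Assumed formulation: each head w_k rescales the feature dimensions
    before the cosine is taken; in training the weights would be learned.
    """
    total = 0.0
    for w in heads:
        wx = [wi * xi for wi, xi in zip(w, x)]
        wy = [wi * yi for wi, yi in zip(w, y)]
        dot = sum(a * b for a, b in zip(wx, wy))
        norm = math.sqrt(sum(a * a for a in wx)) * math.sqrt(sum(b * b for b in wy))
        # A zero vector has no direction; treat its similarity as 0.
        total += dot / norm if norm > 0.0 else 0.0
    return total / len(heads)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings of a question token and a column name.
    token, column = [0.10, 0.20], [0.30, -0.40]
    print(poincare_distance(token, column))
    print(k_head_cosine(token, column, heads=[[1.0, 1.0], [0.5, 2.0]]))
```

In this sketch the distance from the origin to (0.5, 0) is ln 3, about 1.0986, against a Euclidean distance of 0.5: hyperbolic distances grow quickly toward the boundary of the ball, which is why this metric is often preferred for hierarchical structures.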

CLC Number:

TP183

Xianglan WU, Yang XIAO, Mengying LIU, Mingming LIU. Text-to-SQL model based on semantic enhanced schema linking[J]. Journal of Computer Applications, 2024, 44(9): 2689-2695.

Error type	Share of errors/%
Schema linking	44.7
JOIN	23.3
GROUP BY	9.6
Condition	11.1
Nesting	11.3

Pre-trained model	Model	Accuracy/%
BERT-large	RAT-SQL	69.7
	SmBoP	—
	LGESQL	74.1
	ISESL	74.7
	SELSQL	77.1
Model Adaptive PLM	RAT-SQL+GraPPa	73.4
	SmBoP+GraPPa	74.1
	LGESQL+ELECTRA-large	75.1
	ISESL+ELECTRA-large	75.8
	SELSQL+ELECTRA-large	78.3

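Model	Accuracy/%	Drop/percentage points
SELSQL	78.3	—
Without initial graph generation	76.5	1.8
Without similarity metric graph	76.7	1.6
Without semantic schema linking	75.5	2.7
Without Poincaré distance metric	77.5	0.8
Without NatSQL	77.5	0.8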