Complex query-based question-answering model integrating bidirectional sequence embeddings

doi:10.11772/j.issn.1001-9081.2025040497

Abstract

Traditional Knowledge Graph (KG) embedding methods mainly focus on link prediction for simple triples, and their "head entity-relation-tail entity" modeling paradigm has significant limitations in handling conjunctive queries that contain multiple unknown variables. To address this issue, a complex query-based question-answering model integrating Bidirectional Sequence Embedding (BSE) was proposed. Firstly, a query encoder was constructed on the basis of a bidirectional Transformer architecture to convert the query structure into a serialized representation. Secondly, positional encoding was utilized to preserve graph structure information. Thirdly, the deep semantic associations among all elements in the query graph were modeled dynamically through an Additive Attention Mechanism (AAM). Finally, global information interaction across nodes was realized, which effectively addressed the shortcomings of traditional methods in modeling long-distance dependencies. Experiments were conducted on different benchmark datasets to verify the performance advantages of the BSE model. The experimental results show that on the WN18RR-PATHS dataset, the BSE model achieves a 53.01% relative improvement in Mean Reciprocal Rank (MRR) over GQE-DistMult-MP; on the EDUKG dataset, the BSE model outperforms GQE-Bilinear with a 6.09% increase in Area Under the Curve (AUC). In summary, the proposed model can be applied to query-based question answering in different fields, and has high scalability and application value.
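The abstract names an additive attention mechanism over the serialized query but does not give its equations. As a rough illustration only, the sketch below follows the general additive formulation, score_i = v · tanh(W_q q + W_k k_i), followed by a softmax over all positions. Every name in it (`additive_attention`, `w_q`, `w_k`, `v`, the toy dimensions) is an assumption made for this example, the weights are random rather than trained, and it should not be read as the BSE implementation.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(query, keys, values, w_q, w_k, v):
    """Generic additive attention: score_i = v . tanh(W_q q + W_k k_i)."""
    projected_query = matvec(w_q, query)
    scores = []
    for key in keys:
        projected_key = matvec(w_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors, one output component at a time.
    context = [sum(w * val[d] for w, val in zip(weights, values)) for d in range(dim)]
    return context, weights


# Toy serialized query: 5 element embeddings of size 4 (hypothetical values).
rng = random.Random(0)
embed_dim, hidden_dim, seq_len = 4, 3, 5
tokens = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(seq_len)]
w_q = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(hidden_dim)]
w_k = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(hidden_dim)]
v = [rng.uniform(-1, 1) for _ in range(hidden_dim)]

# The first element attends to every position, both to its left and to its right.
context, weights = additive_attention(tokens[0], tokens, tokens, w_q, w_k, v)
```

Because the softmax runs over the whole sequence, each element can draw on any other element in a single step, which is the property the abstract relies on for long-distance dependencies.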

Key words: Knowledge Graph (KG), bidirectional sequence, semantic association, long-distance dependency, GQE-DistMult-MP

CLC Number:

TP183

Hao LIANG, Shaojie QIAO. Complex query-based question-answering model integrating bidirectional sequence embeddings[J]. Journal of Computer Applications, 2026, 46(4): 1096-1103.

梁豪, 乔少杰. 融合双向序列嵌入的复杂查询问答模型[J]. 计算机应用, 2026, 46(4): 1096-1103.

References 31

[1]	刘莎. 我国民航教育研究的现状、热点及趋势——基于CNKI的CiteSpace知识图谱分析［J］. 中国民航飞行学院学报， 2025， 36（2）： 20-24， 49.
	LIU S. Research on status， hot spots and trends of civil aviation education in China — analysis of CiteSpace knowledge graph based on CNKI［J］. Journal of Civil Aviation Flight University of China， 2025， 36（2）： 20-24， 49.
[2]	苏若涵，央青. 基于知识图谱的国际中文教育数字化研究现状与趋势［J］. 天津师范大学学报（社会科学版）， 2025（2）： 22-33.
	SU R H， YANG Q. The current status and trends of digitalization research in international Chinese language education based on knowledge graphs［J］. Journal of Tianjin Normal University （Social Sciences）， 2025（2）： 22-33.
[3]	BORDES A， USUNIER N， GARCIA-DURÁN A， et al. Translating embeddings for modeling multi-relational data［C］// Proceedings of the 27th International Conference on Neural Information Processing Systems — Volume 2. Cambridge： MIT Press， 2013： 2787-2795.
[4]	YANG B， YIH W T， HE X， et al. Embedding entities and relations for learning and inference in knowledge bases［EB/OL］. ［2025-03-11］.
[5]	HAMILTON W L， BAJAJ P， ZITNIK M， et al. Embedding logical queries on knowledge graphs［C］// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook： Curran Associates Inc.， 2018： 2030-2041.
[6]	REN H， HU W， LESKOVEC J. Query2box： reasoning over knowledge graphs in vector space using box embeddings［EB/OL］. ［2025-03-11］.
[7]	NICKEL M， MURPHY K， TRESP V， et al. A review of relational machine learning for knowledge graphs［J］. Proceedings of the IEEE， 2016， 104（1）： 11-33.
[8]	DEVLIN J， CHANG M W， LEE K， et al. BERT： pre-training of deep bidirectional Transformers for language understanding［C］// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies， Volume 1 （Long and Short Papers）. Stroudsburg： ACL， 2019： 4171-4186.
[9]	BALAŽEVIĆ I， ALLEN C， HOSPEDALES T M. TuckER： tensor factorization for knowledge graph completion［C］// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg： ACL， 2019： 5185-5194.
[10]	ZHANG S， TAY Y， YAO L， et al. Quaternion knowledge graph embeddings［C］// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook： Curran Associates Inc.， 2019： 2735-2745.
[11]	WANG M， SHEN H， WANG S， et al. Learning to hash for efficient search over incomplete knowledge graphs［C］// Proceedings of the 2019 IEEE International Conference on Data Mining. Piscataway： IEEE， 2019： 1360-1365.
[12]	ABBOUD R， CEYLAN I I， LUKASIEWICZ T， et al. BoxE： a box embedding model for knowledge base completion［C］// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook： Curran Associates Inc.， 2020： 9649-9661.
[13]	LUO Y， WANG Q， WANG B， et al. Context-dependent knowledge graph embedding［C］// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg： ACL， 2015： 1656-1661.
[14]	DAS R， DHULIAWALA S， ZAHEER M， et al. Go for a walk and arrive at the answer： reasoning over paths in knowledge bases using reinforcement learning［EB/OL］. ［2025-03-11］.
[15]	VAKULENKO S， FERNANDEZ GARCIA J D， POLLERES A， et al. Message passing for complex question answering over knowledge graphs［C］// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. New York： ACM， 2019： 1431-1440.
[16]	BANSAL T， JUAN D C， RAVI S， et al. A2N： attending to neighbors for knowledge graph inference［C］// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg： ACL， 2019： 4387-4392.
[17]	CAI L， YAN B， MAI G， et al. TransGCN： coupling transformation assumptions with graph convolutional networks for link prediction［C］// Proceedings of the 10th International Conference on Knowledge Capture. New York： ACM， 2019： 131-138.
[18]	MAI G， JANOWICZ K， YAN B， et al. Contextual graph attention for answering logical queries over incomplete knowledge graphs［C］// Proceedings of the 10th International Conference on Knowledge Capture. New York： ACM， 2019： 171-178.
[19]	ARAKELYAN E， DAZA D， MINERVINI P， et al. Complex query answering with neural link predictors［EB/OL］. ［2025-03-11］.
[20]	FAN A， GARDENT C， BRAUD C， et al. Using local knowledge graph construction to scale Seq2Seq models to multi-document inputs［C］// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg： ACL， 2019： 4186-4196.
[21]	PETRONI F， ROCKTÄSCHEL T， LEWIS P， et al. Language models as knowledge bases？［C］// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg： ACL， 2019： 2463-2473.
[22]	YAO L， MAO C， LUO Y. KG-BERT： BERT for knowledge graph completion［EB/OL］. ［2025-03-11］.
[23]	WANG Q， HUANG P， WANG H， et al. CoKE： contextualized knowledge graph embedding［EB/OL］. ［2025-03-11］.
[24]	KOTNIS B， LAWRENCE C， NIEPERT M. Answering complex queries in knowledge graphs with bidirectional sequence encoders［C］// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto： AAAI Press， 2021： 4968-4977.
[25]	GUU K， MILLER J， LIANG P. Traversing knowledge graphs in vector space［C］// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg： ACL， 2015： 318-327.
[26]	SCHLICHTKRULL M， KIPF T N， BLOEM P， et al. Modeling relational data with graph convolutional networks［C］// Proceedings of the 2018 European Semantic Web Conference， LNCS 10843. Cham： Springer， 2018： 593-607.
[27]	FRIEDMAN T， VAN DEN BROECK G. Symbolic querying of vector spaces： probabilistic databases meets relational embeddings［C］// Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence. New York： JMLR.org， 2020： 1268-1277.
[28]	ZHANG S， WU Y， ZHANG X， et al. Relation-aware heterogeneous graph network for learning intermodal semantics in textbook question answering［J］. IEEE Transactions on Neural Networks and Learning Systems， 2024， 35（9）： 11872-11883.
[29]	LIU L， WANG Z， QIU R， et al. Logic query of thoughts： guiding large language models to answer complex logic queries with knowledge graphs［EB/OL］. ［2025-03-11］.
[30]	ZHAO B， SUN J， XU B， et al. EDUKG： a heterogeneous sustainable K-12 educational knowledge graph［EB/OL］. ［2025-03-11］.
[31]	DAZA D， COCHEZ M. Message passing for query answering over knowledge graphs［EB/OL］. ［2025-03-11］.

Dataset	Split	Triples	Paths	DAGs	Avg. masks	Avg. length
FB15K-237-CQ	Training set	272 115	50 000	48 865	1.86	152
	Validation set	—	—	2 785	5.91	460
	Test set	—	—	2 599	6.05	479
WN18RR-CQ	Training set	86 835	10 000	9 465	1.84	71
	Validation set	—	—	112	5.13	198
	Test set	—	—	95	4.91	199

Dataset	Algorithm	1p	2p	3p	2i	3i	ip	pi	Average
FB15K-237	GQE^［24］	0.402	0.213	0.155	0.292	0.406	0.083	0.170	0.246
	GQE-Double^［24］	0.405	0.213	0.153	0.298	0.411	0.085	0.182	0.249
	Q2B^［24］	0.467	0.240	0.186	0.324	0.453	0.108	0.205	0.283
	AnyCQ^［24］	0.450	0.270	0.220	0.340	0.460	0.100	0.190	0.290
	LGOT^［24］	0.430	0.260	0.230	0.330	0.480	0.105	0.195	0.290
	BiQE^［24］	0.439	0.281	0.239	0.333	0.474	0.110	0.177	0.293
	BSE	0.442	0.292	0.248	0.351	0.492	0.124	0.191	0.306
NELL-995	GQE^［24］	0.418	0.228	0.205	0.316	0.447	0.081	0.186	0.269
	GQE-Double^［24］	0.417	0.231	0.203	0.318	0.454	0.081	0.188	0.270
	Q2B^［24］	0.555	0.266	0.233	0.343	0.480	0.132	0.212	0.317
	AnyCQ^［24］	0.590	0.290	0.310	0.360	0.520	0.110	0.200	0.340
	LGOT^［24］	0.580	0.300	0.320	0.370	0.540	0.115	0.205	0.348
	BiQE^［24］	0.587	0.305	0.326	0.371	0.531	0.103	0.187	0.344
	BSE	0.595	0.318	0.340	0.385	0.551	0.117	0.203	0.358

Algorithm	FB15K-237-CQ		FB15K-237-PATHS		WN18RR-CQ		WN18RR-PATHS
	MRR	HITS@10	MRR	HITS@10	MRR	HITS@10	MRR	HITS@10
GQE-DistMult-MP^［24］	0.157	0.269	0.241	0.376	0.149	0.148	0.349	0.400
BiQE^［24］	0.228	0.372	0.473	0.602	0.150	0.158	0.520	0.620
BSE	0.241	0.391	0.489	0.635	0.162	0.171	0.534	0.647
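
The 53.01% figure quoted in the abstract can be reproduced from the WN18RR-PATHS MRR column above if it is read as a relative improvement, (0.534 - 0.349) / 0.349. The sketch below shows that arithmetic together with the usual definitions of MRR and HITS@k. The function names and the toy rank list are assumptions made for this example; the evaluation code used by the authors is not shown in this section.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of queries whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def relative_improvement(baseline, proposed):
    """Relative gain of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# Hypothetical ranks of the correct answer for four queries (illustrative only).
toy_ranks = [1, 2, 4, 20]
toy_mrr = mean_reciprocal_rank(toy_ranks)  # (1 + 1/2 + 1/4 + 1/20) / 4, about 0.45
toy_hits = hits_at_k(toy_ranks, k=10)      # 3 of the 4 ranks are <= 10, i.e. 0.75

# MRR values on WN18RR-PATHS, copied from the table above.
mrr_gqe, mrr_biqe, mrr_bse = 0.349, 0.520, 0.534
gain_over_gqe = round(relative_improvement(mrr_gqe, mrr_bse), 2)    # 53.01
gain_over_biqe = round(relative_improvement(mrr_biqe, mrr_bse), 2)  # 2.69
```

Read this way, the headline gain is measured against GQE-DistMult-MP; against BiQE on the same dataset the relative MRR gain is about 2.69%.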