Legal case retrieval method via case information reformulation using large language model

doi:10.11772/j.issn.1001-9081.2025050662

Journal of Computer Applications, 2026, Vol. 46, Issue (6): 1785-1792.

Jintao WANG, Zhilin GAO, Qixiang MENG, Fanliang BU

School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China

Received: 2025-06-23 Revised: 2025-09-07 Accepted: 2025-09-09 Online: 2025-09-15 Published: 2026-06-10
Corresponding author: Fanliang BU
About the authors: WANG Jintao, born in 2000 in Yangzhou, Jiangsu, M. S. candidate, CCF member. His research interests include natural language processing and information retrieval.
GAO Zhilin, born in 2000 in Xuzhou, Jiangsu, M. S. candidate. His research interests include information extraction and deep learning.
MENG Qixiang, born in 1999 in Xuzhou, Jiangsu, M. S. candidate. His research interests include information extraction and deep learning.
BU Fanliang, born in 1965 in Xuzhou, Jiangsu, Ph. D., professor. His research interests include security and prevention engineering and deep learning.
Supported by:
Double First-Class Innovation Research Project for Security and Protection Engineering of People’s Public Security University of China (2023SYL08)

Abstract

With the advancement of intelligent judiciary construction, legal case retrieval technology has attracted significant attention because of its crucial role in ensuring judicial fairness and efficiency. However, existing text retrieval methods still face the following challenges: traditional models are susceptible to interference from similarities in semantic structure, which makes it difficult to accurately capture the elements that influence judgments; pre-trained language models are constrained by input length, which leads to insufficient global semantic modeling of lengthy legal texts; and existing aggregated similarity scoring mechanisms are prone to noise interference and lack strong interpretability. To address these challenges, a legal case retrieval method via case information reformulation using Large Language Model (LLM) was proposed. Firstly, an LLM was employed to extract information from case texts, so that case elements, descriptions of the legal provisions applicable to the crimes, and case behavior chains were combined into sub-facts of cases, thereby reducing information redundancy. Secondly, in the encoding part, an SFA-SAILER (Selective Feature Attention & Structure-Aware pre-traIned language model for LEgal case Retrieval) encoding architecture was designed. Thirdly, case information was deeply encoded along two dimensions, word and feature, to strengthen the dependency between case information and encoding dimensions. Finally, the MaxSim operator was used to aggregate similarity scores. Experimental results show that on LeCaRD (Legal Case Retrieval Dataset), the proposed model achieves a mean Average Precision (mAP) of 67.45% and a Top-3 Precision (P@3) of 60.95%, and its Top-K Normalized Discounted Cumulative Gain (NDCG@K) scores are all higher than those of the comparison models. It can be seen that the proposed model offers a new idea for legal case retrieval that combines legal logic with deep semantic understanding, and has practical value for intelligent judiciary applications.

Key words: Large Language Model (LLM), legal case retrieval, intelligent judiciary, text matching
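The final scoring step named in the abstract, MaxSim aggregation, can be sketched as follows. This is a minimal illustration in the late-interaction style, not the authors' implementation: it assumes each case has already been encoded into one vector per sub-fact and that cosine similarity is the underlying measure. The names `cosine` and `maxsim_score` and the toy vectors are hypothetical.

```python
from math import sqrt
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def maxsim_score(query_vecs: Sequence[Vector], cand_vecs: Sequence[Vector]) -> float:
    """For each query vector, keep its best match among the candidate
    vectors, then sum those maxima into one case-level score."""
    if not query_vecs or not cand_vecs:
        return 0.0
    return sum(max(cosine(q, c) for c in cand_vecs) for q in query_vecs)


# Toy 2-dimensional "sub-fact embeddings" (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]
score = maxsim_score(query, candidate)
```

Because every query vector contributes only its single best match, the score can be traced back to specific pairs of sub-facts, which is one plausible reading of the interpretability goal stated in the abstract.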

CLC number: TP391.1

Jintao WANG, Zhilin GAO, Qixiang MENG, Fanliang BU. Legal case retrieval method via case information reformulation using large language model[J]. Journal of Computer Applications, 2026, 46(6): 1785-1792.

Figures/Tables 8

Fig. 1 Overall architecture of legal case retrieval model

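Tab. 1 Examples of case elements

Case element	Examples
Type of offender	Minor, person with mental illness, etc.
Charge	Robbery, causing a traffic accident, etc.
Criminal act	Injuring, beating, drug trafficking, etc.
Items involved	Methamphetamine, motor vehicle, etc.
Sentencing circumstances	Remorse, meritorious service, pleading guilty, etc.
Settlement status	Settlement agreement reached, etc.
Consequences of the crime	Death, minor injury, losses caused, etc.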

Fig. 2 Components of reformulated case information

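Tab. 2 Example representations of sub-facts of cases

Case sub-fact	Case elements	Description of applicable statutory offense	Case behavior chain
F1	Suspect occupation: no fixed occupation, teahouse operator; Person type: organizer of a mafia-style organization; Items involved: fruit knife…	Crime of false imprisonment: unlawfully detaining another person or depriving them of personal liberty is punishable by law, including fixed-term imprisonment and criminal detention. Beating or humiliating the victim draws a heavier punishment, and causing serious injury or death draws a heavier one still…	Divorce property dispute → 房某1 instructed 宋x波 and others to lure 宋某1 and then take the victim by force to a private house in Linyi, where the victim was tied up, beaten, and forced to sign a divorce agreement and a property transfer application → 宋某1 was detained for 10 days and suffered a fractured clavicle and multiple abrasions (second-degree minor injury, slight injury) → violated Article 238 of the Criminal Law → constitutes the crime of false imprisonment
F2	Suspect occupation: mining site worker; Person type: member of a mafia-style organization; Items involved: Hetian jade, ashtray…	Crime of opening a gambling house: acts such as organizing multiple people to gamble, operating a gambling house over a long period, setting up an online gambling platform or acting as an agent to accept bets, and sharing in the profits of gambling websites…	To seek illegal profit → 王x龙, 王x and others opened a gambling house in xx Village, xx County; 王x arranged the venue and vehicles and organized the gambling, 朱x峰 lent money at usurious rates, 宋x波 and 胡x桥 guarded the venue, 刘某1 kept the accounts, and 张某2, 姚某1 and others were organized to gamble on multiple occasions → illegal gains of more than 140,000 yuan → constitutes the crime of opening a gambling house…
F3	Suspect occupation: no fixed occupation; Person type: organizer of a mafia-style organization; Items involved: house sale contract, payment receipt…	Crime of picking quarrels and provoking trouble: an offense against social order that covers arbitrarily beating others where the circumstances are flagrant; chasing, intercepting, insulting, or intimidating others where the circumstances are flagrant; forcibly taking, or arbitrarily damaging or occupying, public or private property where the circumstances are serious; and causing a disturbance in a public place that seriously disrupts public order…	To take over a sand yard contract by force, 王x arranged for 宋x波 and others to make a show of force at the site → the two sides clashed, 宋x波 stabbed several people, and 王x龙 led others in chasing and beating them → although the intentional injury was carried out by 宋卫x, 王x directed the show of force and provoked the incident, which constitutes picking quarrels and provoking trouble → violated Article 293 of the Criminal Law → constitutes the crime of picking quarrels and provoking trouble…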
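The three-part structure of a sub-fact in Tab. 2 can be captured in a small data structure. The sketch below is only an illustration under stated assumptions: the class name `SubFact`, its field names, and the bracketed segment markers used when flattening to text are hypothetical and are not taken from the paper, and the example values are abbreviated from row F1.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubFact:
    """One case sub-fact: elements, statute description, and behavior chain."""
    elements: Dict[str, str]      # e.g. {"Person type": "..."}
    statute_description: str      # description of the applicable offense
    behavior_chain: List[str]     # ordered steps of the behavior chain

    def to_text(self) -> str:
        """Flatten the sub-fact into one string that an encoder could consume.
        The bracketed markers are an assumed convention, not the paper's."""
        element_part = "; ".join(f"{k}: {v}" for k, v in self.elements.items())
        chain_part = " → ".join(self.behavior_chain)
        return (
            f"[Elements] {element_part} "
            f"[Statute] {self.statute_description} "
            f"[Chain] {chain_part}"
        )


# Abbreviated version of row F1 in Tab. 2.
f1 = SubFact(
    elements={
        "Suspect occupation": "no fixed occupation, teahouse operator",
        "Items involved": "fruit knife",
    },
    statute_description="Crime of false imprisonment",
    behavior_chain=[
        "divorce property dispute",
        "victim detained for 10 days",
        "violated Article 238 of the Criminal Law",
        "constitutes the crime of false imprisonment",
    ],
)
text = f1.to_text()
```

Keeping the behavior chain as an ordered list preserves the cause-to-verdict sequence that the arrows in the table express.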

Fig. 3 SFA mechanism

Tab. 3 Annotation instructions for LeCaRD dataset

Key-element facts	Case-circumstance facts	Similarity
Similar	Similar	3
Similar	Dissimilar	2
Dissimilar	Similar	1
Dissimilar	Dissimilar	0
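The four rows of Tab. 3 amount to a two-bit encoding in which the key-element judgment carries twice the weight of the case-circumstance judgment. The sketch below reproduces the table with that rule. The function names are hypothetical, and the binary relevance threshold of 2 is an assumption for illustration, since this page does not say how graded labels are binarized for precision-based metrics.

```python
def similarity_label(key_elements_similar: bool, circumstances_similar: bool) -> int:
    """Map the two annotation judgments of Tab. 3 to a label in {0, 1, 2, 3}."""
    return 2 * int(key_elements_similar) + int(circumstances_similar)


def is_relevant(label: int, threshold: int = 2) -> bool:
    """Binarize a graded label; the default threshold is an assumption."""
    return label >= threshold


# Enumerate the table rows in the order they appear.
rows = [
    similarity_label(True, True),
    similarity_label(True, False),
    similarity_label(False, True),
    similarity_label(False, False),
]
```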

Tab. 4 Evaluation results of different models (%)

Model	MAP	P@3	NDCG@3	NDCG@5	NDCG@10
TF-IDF	45.05	35.36	63.77	64.70	66.81
BM-25	48.19	41.77	64.76	65.94	68.79
BERT	48.83	41.11	68.35	68.97	72.42
RoBERTa	53.83	47.62	74.40	74.33	76.70
coCondenser	52.13	47.11	67.22	66.86	69.21
COT-MAE	56.35	49.42	69.45	67.13	70.72
RetroMAE	55.76	49.97	68.01	67.20	68.83
SAILER	55.92	51.67	78.97	79.33	80.16
Lawformer	54.58	50.79	73.19	73.43	75.54
BERT-PLI (BERT)	47.91	39.99	63.22	68.56	72.24
PromptCase	64.92	55.45	78.15	78.30	80.23
KELLER	64.77	55.87	79.79	81.62	84.34
Proposed model	67.45	60.95	83.01	83.44	85.37
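For readers less familiar with the column headers in Tab. 4, the three metric families can be sketched for a single query as follows. This is a generic illustration under stated assumptions, not the evaluation script behind the table: it uses linear gains in DCG, normalizes average precision by the number of relevant items found in the ranked list, and treats labels of 2 or more as relevant. MAP is the mean of the per-query average precision over all queries.

```python
from math import log2
from typing import Sequence


def precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked candidates that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevant[:k]) / k


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision values taken at each rank holding a relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gains (an assumed variant)."""
    return sum(g / log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Toy ranked list of graded labels for one query (assumed values).
labels = [3, 0, 2, 1]
relevant = [g >= 2 for g in labels]   # assumed binarization threshold
p3 = precision_at_k(relevant, 3)      # 2 of the top 3 are relevant
ap = average_precision(relevant)      # (1/1 + 2/3) / 2
ndcg3 = ndcg_at_k(labels, 3)
```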

Tab. 5 Ablation experimental results (%)

Removed module	MAP	P@3	NDCG@3	NDCG@5	NDCG@10
Case behavior chain	61.91	54.28	79.31	80.37	81.92
Description of statutory offense	65.28	56.82	80.32	81.86	84.66
Case elements	66.44	59.68	81.95	82.21	84.04
SFA mechanism	64.89	55.55	79.12	80.71	83.82
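The contribution of each module can be read off by subtracting every ablation row in Tab. 5 from the full-model row in Tab. 4. The short script below does that arithmetic. All scores are copied from the two tables; the variable and function names are hypothetical.

```python
METRICS = ["MAP", "P@3", "NDCG@3", "NDCG@5", "NDCG@10"]

# Full-model row from Tab. 4 (percentage points).
full_model = dict(zip(METRICS, [67.45, 60.95, 83.01, 83.44, 85.37]))

# Rows from Tab. 5: scores after removing the named module.
ablations = {
    "case behavior chain": [61.91, 54.28, 79.31, 80.37, 81.92],
    "description of statutory offense": [65.28, 56.82, 80.32, 81.86, 84.66],
    "case elements": [66.44, 59.68, 81.95, 82.21, 84.04],
    "SFA mechanism": [64.89, 55.55, 79.12, 80.71, 83.82],
}


def metric_drops(full: dict, ablated_scores: list) -> dict:
    """Per-metric drop (in percentage points) caused by removing a module."""
    return {m: round(full[m] - s, 2) for m, s in zip(METRICS, ablated_scores)}


drops = {name: metric_drops(full_model, scores) for name, scores in ablations.items()}

# Module whose removal costs the most MAP.
largest_map_drop = max(drops, key=lambda name: drops[name]["MAP"])
```

With these numbers, removing the case behavior chain costs the most MAP (5.54 points), followed by the SFA mechanism (2.56), the description of the statutory offense (2.17), and the case elements (1.01).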
