Legal case retrieval method via case information reformulation using large language model

doi:10.11772/j.issn.1001-9081.2025050662

Journal of Computer Applications, 2026, Vol. 46, Issue (6): 1785-1792. DOI: 10.11772/j.issn.1001-9081.2025050662

Artificial intelligence

Jintao WANG, Zhilin GAO, Qixiang MENG, Fanliang BU

School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China

Received: 2025-06-23 Revised: 2025-09-07 Accepted: 2025-09-09 Online: 2025-09-15 Published: 2026-06-10
Corresponding author: Fanliang BU
About the authors: WANG Jintao, born in 2000, male, native of Yangzhou, Jiangsu, M. S. candidate, CCF member. His research interests include natural language processing and information retrieval.
GAO Zhilin, born in 2000, male, native of Xuzhou, Jiangsu, M. S. candidate. His research interests include information extraction and deep learning.
MENG Qixiang, born in 1999, male, native of Xuzhou, Jiangsu, M. S. candidate. His research interests include information extraction and deep learning.
First contact person: BU Fanliang, born in 1965, male, native of Xuzhou, Jiangsu, Ph. D., professor. His research interests include security and prevention engineering and deep learning.
Supported by: Double First-Class Innovation Research Project for Security and Protection Engineering of People's Public Security University of China (2023SYL08)

Abstract

With the advancement of intelligent judiciary construction, legal case retrieval has attracted significant attention because of its crucial role in ensuring judicial fairness and efficiency. However, existing text retrieval methods still face three challenges: traditional models are easily misled by semantic and structural similarities between texts, which makes it difficult for them to capture the elements that actually influence judgments; pre-trained language models are constrained by input length, so they model the global semantics of lengthy legal texts insufficiently; and existing mechanisms for aggregating similarity scores are vulnerable to noise and offer limited interpretability. To address these challenges, a legal case retrieval method via case information reformulation using Large Language Model (LLM) was proposed. Firstly, an LLM was employed to extract information from case texts and combine case elements, descriptions of the legal provisions applicable to the crimes, and case behavior chains into sub-facts of cases, thereby reducing information redundancy. Secondly, an SFA-SAILER (Selective Feature Attention & Structure-Aware pre-traIned language model for LEgal case Retrieval) encoding architecture was designed for the encoding stage. Thirdly, case information was encoded in depth along two dimensions, word and feature, to strengthen the dependency between case information and the encoding dimensions. Finally, the MaxSim operator was used to aggregate similarity scores. Experimental results on LeCaRD (Legal Case Retrieval Dataset) show that the proposed model achieves a mean Average Precision (mAP) of 67.45% and a Top-3 Precision (P@3) of 60.95%, and that its Top-K Normalized Discounted Cumulative Gain (NDCG@K) is higher than that of all comparison models.
It can be seen that the proposed model offers a new approach to legal case retrieval that integrates legal logic with deep semantic understanding, and has practical value for intelligent judiciary applications.
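The abstract names the MaxSim operator as the final aggregation step without spelling it out. As a hedged sketch, assuming each case is represented by a set of sub-fact embedding vectors and that vector similarity is cosine similarity (both are assumptions; the paper's exact scoring may differ), ColBERT-style MaxSim [8] keeps, for every query vector, its best match among the candidate's vectors and sums those maxima. The helper names `cosine` and `maxsim` are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def maxsim(query_vecs: List[Vector], cand_vecs: List[Vector]) -> float:
    """Late interaction: for each query vector take its best-matching
    candidate vector, then sum those per-query maxima."""
    if not query_vecs or not cand_vecs:
        return 0.0
    return sum(max(cosine(q, c) for c in cand_vecs) for q in query_vecs)


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings: two query sub-facts,
    # three candidate sub-facts.
    query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    candidate = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8], [0.0, 0.0, 1.0]]
    print(round(maxsim(query, candidate), 2))  # 1.0 + 0.6 -> 1.6
```

Because the score is a sum over query vectors, it grows with the number of query sub-facts; dividing by `len(query_vecs)` is a common normalization when queries of different lengths must be compared. Taking a per-vector maximum also means each query sub-fact is judged only by its single best counterpart, so unrelated sub-facts in the candidate add no score.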

Key words: Large Language Model (LLM), legal case retrieval, intelligent judiciary, text matching

CLC Number: TP391.1

Jintao WANG, Zhilin GAO, Qixiang MENG, Fanliang BU. Legal case retrieval method via case information reformulation using large language model[J]. Journal of Computer Applications, 2026, 46(6): 1785-1792.

Figures/Tables 8

Fig.1 Overall architecture of the legal case retrieval model

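Tab.1 Examples of case elements

Case element	Examples
Offender type	Minors, mentally ill persons, etc.
Charge	Robbery, causing a traffic accident, etc.
Criminal act	Injuring, beating, drug trafficking, etc.
Items involved	Methamphetamine, motor vehicles, etc.
Sentencing circumstances	Remorse, meritorious service, confession, etc.
Settlement status	Settlement agreement reached, etc.
Criminal consequence	Death, minor injury, causing losses, etc.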
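Tab.1 lists the seven element categories the LLM extracts from each case. The paper's actual prompt is not given here, so the prompt wording, the English JSON keys, and the helpers `build_extraction_prompt` and `parse_elements` below are all hypothetical; the sketch only illustrates requesting the categories as structured output and validating the reply before encoding. No real LLM API is called.

```python
import json
from typing import Dict, List

# The seven case-element categories of Tab.1; these English keys are
# illustrative names, not identifiers taken from the paper.
ELEMENT_KEYS: List[str] = [
    "offender_type",
    "charge",
    "criminal_act",
    "items_involved",
    "sentencing_circumstances",
    "settlement",
    "consequences",
]


def build_extraction_prompt(case_text: str) -> str:
    """Hypothetical prompt asking an LLM to return the elements as JSON."""
    keys = ", ".join(ELEMENT_KEYS)
    return (
        "Extract the following case elements from the judgment below and "
        f"answer with a JSON object whose keys are: {keys}. "
        "Use a list of short strings for each key and an empty list if "
        "the element is absent.\n\n"
        f"Judgment:\n{case_text}"
    )


def parse_elements(llm_output: str) -> Dict[str, List[str]]:
    """Parse the JSON reply, keep only known keys, and fill missing
    keys with empty lists; a bare string value becomes a one-item list."""
    data = json.loads(llm_output)
    result: Dict[str, List[str]] = {}
    for key in ELEMENT_KEYS:
        value = data.get(key, [])
        if isinstance(value, str):
            value = [value]
        result[key] = [str(v) for v in value]
    return result


if __name__ == "__main__":
    reply = '{"charge": ["robbery"], "items_involved": "motor vehicle"}'
    elements = parse_elements(reply)
    print(elements["charge"], elements["items_involved"], elements["settlement"])
```

Normalizing every category to a list, even when absent, gives downstream code a fixed schema regardless of how the LLM phrases its reply.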

Fig.2 Components of reformulated case information

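Tab.2 Example representations of sub-facts of cases

Case sub-fact	Case elements	Description of the applicable legal provision	Case behavior chain
F1	Suspect occupation: no fixed occupation, teahouse operator; Person type: organizer of a mafia-style organization; Items involved: fruit knife…	Crime of illegal detention: illegally detaining others or otherwise depriving them of personal liberty is subject to legal sanction, including fixed-term imprisonment and criminal detention. Beating or insulting the victim aggravates the punishment, and causing serious injury or death carries a heavier penalty…	Divorce property dispute → 房某1 instructed 宋x波 and others to lure 宋某1 and hold the victim in a private house in Linyi, where the victim was tied up, beaten and forced to sign a divorce agreement and a property transfer application → 宋某1 was detained for 10 days, suffering a clavicle fracture and multiple abrasions (grade-2 minor injury, slight injury) → violated Article 238 of the Criminal Law → constitutes the crime of illegal detention
F2	Suspect occupation: mining site worker; Person type: member of a criminal gang organization; Items involved: Hetian jade, ashtray…	Crime of operating a gambling venue: organizing multiple people to gamble, running a gambling venue over a long period, setting up an online gambling platform or acting as an agent to accept bets, sharing in the profits of a gambling network, and similar conduct…	For illegal gain → 王x龙, 王x and others opened a gambling venue in xx village, xx county: 王x found the site and vehicles and organized the gambling, 朱x峰 made usurious loans, 宋x波 and 胡x桥 guarded the venue, and 刘某1 kept the accounts; they repeatedly organized 张某2, 姚某1 and others to gamble → illegal profit of more than 140,000 yuan → constitutes the crime of operating a gambling venue…
F3	Suspect occupation: no fixed occupation; Person type: organizer of a mafia-style organization; Items involved: house purchase contract, payment receipts…	Crime of picking quarrels and provoking trouble: crimes that disrupt social order, namely wantonly beating others where the circumstances are vile; chasing, intercepting, insulting or intimidating others where the circumstances are vile; forcibly taking, or arbitrarily damaging or occupying, public or private property where the circumstances are serious; or creating a disturbance in a public place that seriously disrupts public order…	To forcibly take over the contract for a sand quarry, 王x arranged for 宋x波 and others to guard the site → the two sides clashed; 宋x波 stabbed several people and 王x龙 led others in pursuit → although the intentional injury was committed by 宋卫x, 王x directed the site guarding and provoked the incident, which constitutes picking quarrels and provoking trouble → violated Article 293 of the Criminal Law → constitutes the crime of picking quarrels and provoking trouble…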
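Each row of Tab.2 combines three parts (case elements, the description of the applicable legal provision, and the behavior chain) into one sub-fact. A minimal sketch of serializing such a row into a single encoder input string follows; the `SubFact` class, the `key: value` element format, the `[SEP]`-style separator and the ASCII `->` join are assumptions for illustration, and the paper may serialize sub-facts differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubFact:
    """One case sub-fact: elements, legal provision description, behavior chain."""
    elements: Dict[str, str]
    provision: str
    behavior_chain: List[str] = field(default_factory=list)

    def to_text(self, sep: str = " [SEP] ") -> str:
        """Join the three parts into one string for the encoder."""
        element_text = "; ".join(f"{k}: {v}" for k, v in self.elements.items())
        chain_text = " -> ".join(self.behavior_chain)
        return sep.join([element_text, self.provision, chain_text])


if __name__ == "__main__":
    # Abbreviated from row F1 of Tab.2.
    f1 = SubFact(
        elements={"Person type": "organizer of a mafia-style organization",
                  "Items involved": "fruit knife"},
        provision="Crime of illegal detention",
        behavior_chain=["Divorce property dispute",
                        "victim detained for 10 days",
                        "violated Article 238 of the Criminal Law"],
    )
    print(f1.to_text())
```

Keeping each sub-fact as its own short string is what later allows one embedding per sub-fact, instead of forcing the whole judgment through a single length-limited encoder pass.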

Fig.3 SFA mechanism
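Fig.3 shows the SFA mechanism, which the abstract describes as encoding case information along a feature dimension in addition to the word dimension. SFA as defined in [21] is not reproduced here. As a plainly swapped-in stand-in, the sketch applies squeeze-and-excitation-style gating over the hidden (feature) dimension of a tokens-by-features matrix: average each feature over tokens, pass the result through a small ReLU bottleneck, and rescale every feature channel by a sigmoid gate. The function names and random weights are hypothetical placeholders for learned parameters.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def feature_gate(h: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-and-excitation-style gating over the feature dimension.

    h  : T x d token representations (word dimension x feature dimension)
    w1 : r x d bottleneck weights; w2 : d x r expansion weights
    Returns h with each feature column scaled by a gate in (0, 1).
    """
    t = len(h)
    d = len(h[0])
    # Squeeze: average each feature over all tokens.
    z = [sum(row[j] for row in h) / t for j in range(d)]
    # Excitation: ReLU bottleneck, then one sigmoid gate per feature.
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    gates = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]
    # Re-weight: the same gate scales that feature in every token.
    return [[row[j] * gates[j] for j in range(d)] for row in h]


if __name__ == "__main__":
    random.seed(0)
    T, D, R = 4, 8, 2  # tokens, hidden size, bottleneck size (toy values)
    h = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
    w1 = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(R)]
    w2 = [[random.gauss(0, 0.5) for _ in range(R)] for _ in range(D)]
    out = feature_gate(h, w1, w2)
    print(len(out), len(out[0]))  # 4 8
```

Word-level attention reweights tokens; gating of this kind reweights feature channels shared by all tokens, which is one way to read "encoding along the feature dimension". In a trained model, `w1` and `w2` would be learned end to end with the encoder.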

Tab.3 Annotation instructions for the LeCaRD dataset

Key elements	Key circumstances	Similarity label
Similar	Similar	3
Similar	Dissimilar	2
Dissimilar	Similar	1
Dissimilar	Dissimilar	0
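The four labels in Tab.3 follow a simple additive rule: similarity in the first column (key elements) contributes 2 and similarity in the second column (key circumstances) contributes 1. The hypothetical helper `lecard_label` below makes that mapping explicit and reproduces all four rows of the table.

```python
def lecard_label(elements_similar: bool, circumstances_similar: bool) -> int:
    """Map the two judgments of Tab.3 to a graded label in 0-3.

    Key-element similarity weighs 2 and key-circumstance similarity weighs 1.
    """
    return 2 * int(elements_similar) + int(circumstances_similar)


if __name__ == "__main__":
    for e in (True, False):
        for c in (True, False):
            print(e, c, lecard_label(e, c))
```

The weighting encodes the table's ordering: a candidate that matches on key elements alone (2) outranks one that matches on key circumstances alone (1).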

Tab.4 Evaluation results of different models (%)

Model	mAP	P@3	NDCG@3	NDCG@5	NDCG@10
TF-IDF	45.05	35.36	63.77	64.70	66.81
BM25	48.19	41.77	64.76	65.94	68.79
BERT	48.83	41.11	68.35	68.97	72.42
RoBERTa	53.83	47.62	74.40	74.33	76.70
coCondenser	52.13	47.11	67.22	66.86	69.21
CoT-MAE	56.35	49.42	69.45	67.13	70.72
RetroMAE	55.76	49.97	68.01	67.20	68.83
SAILER	55.92	51.67	78.97	79.33	80.16
Lawformer	54.58	50.79	73.19	73.43	75.54
BERT-PLI (BERT)	47.91	39.99	63.22	68.56	72.24
PromptCase	64.92	55.45	78.15	78.30	80.23
KELLER	64.77	55.87	79.79	81.62	84.34
Proposed model	67.45	60.95	83.01	83.44	85.37
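Tab.4 reports mAP, P@3 and NDCG@K. The exact LeCaRD evaluation script is not reproduced here; the sketch below uses common definitions under stated assumptions: graded labels 0-3 as in Tab.3; a candidate counts as relevant for P@k and AP when its label reaches a threshold (the default of 2 is an assumption, not taken from the paper); and NDCG uses linear gain with a log2(rank + 1) discount, with the ideal ranking computed from the same candidate list.

```python
import math
from typing import List


def precision_at_k(labels: List[int], k: int, threshold: int = 2) -> float:
    """Fraction of the top-k ranked candidates whose label >= threshold."""
    return sum(1 for rel in labels[:k] if rel >= threshold) / k


def average_precision(labels: List[int], threshold: int = 2) -> float:
    """AP for one ranked list: mean of the precision at each relevant rank."""
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel >= threshold:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def ndcg_at_k(labels: List[int], k: int) -> float:
    """NDCG@k with linear gain and a log2(rank + 1) discount."""
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical graded labels of one query's ranked candidates.
    ranked = [3, 1, 2, 0, 3]
    print(round(precision_at_k(ranked, 3), 4))  # 2 of top-3 relevant -> 0.6667
    print(round(average_precision(ranked), 4))
    print(round(ndcg_at_k(ranked, 3), 4))
    # mAP is the mean of average_precision over all queries.
```

Some evaluation toolkits use an exponential gain (2^rel - 1) for NDCG instead of the linear gain above, which changes absolute values, so scores are only comparable under one fixed convention.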

Tab.5 Ablation experimental results (%)

Removed module	mAP	P@3	NDCG@3	NDCG@5	NDCG@10
Case behavior chain	61.91	54.28	79.31	80.37	81.92
Description of applicable legal provisions	65.28	56.82	80.32	81.86	84.66
Case elements	66.44	59.68	81.95	82.21	84.04
SFA mechanism	64.89	55.55	79.12	80.71	83.82
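To read Tab.5 against the full model in Tab.4, the drop caused by removing each module can be computed directly from the two tables. The snippet only restates the reported numbers as percentage-point differences and adds no new results; the dictionary labels are illustrative.

```python
# Reported scores (%) copied from Tab.4 (full model) and Tab.5 (ablations).
METRICS = ["mAP", "P@3", "NDCG@3", "NDCG@5", "NDCG@10"]
FULL = [67.45, 60.95, 83.01, 83.44, 85.37]
ABLATIONS = {
    "w/o case behavior chain": [61.91, 54.28, 79.31, 80.37, 81.92],
    "w/o legal provision description": [65.28, 56.82, 80.32, 81.86, 84.66],
    "w/o case elements": [66.44, 59.68, 81.95, 82.21, 84.04],
    "w/o SFA mechanism": [64.89, 55.55, 79.12, 80.71, 83.82],
}


def drops(full, ablated):
    """Percentage-point drop of each metric relative to the full model."""
    return [round(f - a, 2) for f, a in zip(full, ablated)]


if __name__ == "__main__":
    for name, scores in ABLATIONS.items():
        print(name, dict(zip(METRICS, drops(FULL, scores))))
```

By this arithmetic, removing the case behavior chain causes the largest drop in mAP (5.54 points), P@3, NDCG@5 and NDCG@10, while removing the SFA mechanism causes the largest drop in NDCG@3 (3.89 points).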

References 30

[1]	XIE Y F, YIN H, QIAO D. A survey on law case retrieval technology[J]. Software Guide, 2024, 23(6): 198-204. (in Chinese)
[2]	LI H, AI Q, CHEN J, et al. SAILER: structure-aware pre-trained language model for legal case retrieval[C]// Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2023: 1035-1044.
[3]	DENG C, MAO K, DOU Z. Learning interpretable legal case retrieval via knowledge-guided case reformulation[C]// Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2024: 1253-1265.
[4]	TANG Y, QIU R, LI X. Prompt-based effective input reformulation for legal case retrieval[C]// Proceedings of the 2023 Australasian Database Conference, LNCS 14386. Cham: Springer, 2024: 87-100.
[5]	LI L R, WANG D S, FAN H J. Fact-based similar case retrieval methods based on statutory knowledge[J]. Journal of Zhejiang University (Engineering Science), 2024, 58(7): 1357-1365. (in Chinese)
[6]	XIAO C, HU X, LIU Z, et al. Lawformer: a pre-trained language model for Chinese legal long documents[J]. AI Open, 2021, 2: 79-84.
[7]	VAN OPIJNEN M, SANTOS C. On the concept of relevance in legal information retrieval[J]. Artificial Intelligence and Law, 2017, 25(1): 65-87.
[8]	KHATTAB O, ZAHARIA M. ColBERT: efficient and effective passage search via contextualized late interaction over BERT[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020: 39-48.
[9]	MA Y, SHAO Y, WU Y, et al. LeCaRD: a legal case retrieval dataset for Chinese law system[C]// Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2021: 2342-2348.
[10]	SALTON G, BUCKLEY C. Term-weighting approaches in automatic text retrieval[J]. Information Processing and Management, 1988, 24(5): 513-523.
[11]	ROBERTSON S, ZARAGOZA H. The probabilistic relevance framework: BM25 and beyond[J]. Foundations and Trends® in Information Retrieval, 2009, 3(4): 333-389.
[12]	PONTE J M, CROFT W B. A language modeling approach to information retrieval[C]// Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 1998: 275-281.
[13]	BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[14]	ZHAN L L, QIN Y B, HUANG R Z, et al. Legal case retrieval method integrating temporal behavior chain and event type[J]. Journal of Computer Applications, 2025, 45(6): 1741-1747. (in Chinese)
[15]	TRAN V, NGUYEN M L, SATOH K. Building legal case retrieval systems with lexical matching and summarization using a pre-trained phrase scoring model[C]// Proceedings of the 17th International Conference on Artificial Intelligence and Law. New York: ACM, 2019: 275-282.
[16]	ASKARI A, VERBERNE S. Combining lexical and neural retrieval with Longformer-based summarization for effective case law retrieval[C]// Proceedings of the 2nd International Conference on Design of Experimental Search and Information Retrieval Systems. Aachen: CEUR-WS.org, 2021: 162-170.
[17]	YU W, SUN Z, XU J, et al. Explainable legal case matching via inverse optimal transport-based rationale extraction[C]// Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2022: 657-668.
[18]	SHAO Y, MAO J, LIU Y, et al. BERT-PLI: modeling paragraph-level interactions for legal case retrieval[C]// Proceedings of the 29th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2020: 3501-3507.
[19]	ALTHAMMER S, HOFSTÄTTER S, HANBURY A. Cross-domain retrieval in the legal and patent domains: a reproducibility study[C]// Proceedings of the 2021 European Conference on Information Retrieval, LNCS 12657. Cham: Springer, 2021: 3-17.
[20]	HU W, ZHAO S, ZHAO Q, et al. BERT_LF: a similar case retrieval method based on legal facts[J]. Wireless Communications and Mobile Computing, 2022, 2022: No. 2511147.
[21]	ZANG J, LIU H. Modeling selective feature attention for lightweight text matching[C]// Proceedings of the 33rd International Joint Conference on Artificial Intelligence. California: ijcai.org, 2024: 6624-6632.
[22]	CAO F X, SUN Y Y, WANG Z Z, et al. Similar case matching model for lending cases[J]. Computer Engineering, 2024, 50(1): 306-312. (in Chinese)
[23]	LIU Q, YU Z T, GAO S X, et al. Incorporating case elements for case matching[J]. Journal of Chinese Information Processing, 2022, 36(11): 140-147. (in Chinese)
[24]	LIU B Y, LI S, YE L, et al. Similar case recommendation algorithm based on legal elements[J]. Intelligent Computer and Applications, 2021, 11(6): 1-4, 13. (in Chinese)
[25]	LYU Y, WANG Z, REN Z, et al. Improving legal judgment prediction through reinforced criminal element extraction[J]. Information Processing and Management, 2022, 59(1): No. 102780.
[26]	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
[27]	LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. [2024-03-25].
[28]	GAO L, CALLAN J. Unsupervised corpus aware language model pre-training for dense passage retrieval[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2022: 2843-2853.
[29]	XIAO S, LIU Z, SHAO Y, et al. RetroMAE: pre-training retrieval-oriented language models via masked auto-encoder[C]// Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2022: 538-548.
[30]	WU X, MA G Y, LIN M, et al. ConTextual masked auto-encoder for dense passage retrieval[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 4738-4746.
