Similar Case Retrieval Based on Reconstructing Case Information with Large Language Models

doi:10.11772/j.issn.1001-9081.2025050662

Journal of Computer Applications, official website

王劲滔,高志霖,孟琪翔,卜凡亮

People's Public Security University of China

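Received: 2025-06-16 Revised: 2025-09-03 Online: 2025-09-15 Published: 2025-09-15
Corresponding author: 卜凡亮
Funding:
Double First-Class Special Project in Security and Protection Engineering, People's Public Security University of China (2023SYL08)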

A Large Language Model Approach to Legal Case Retrieval with Structured Case Reformulation

Abstract

With the advancement of smart-justice initiatives, similar case retrieval has attracted considerable attention for its key role in safeguarding judicial fairness and efficiency. However, existing retrieval methods still face three challenges: (1) traditional models are easily misled by similarity in semantic structure, which makes it hard to capture precisely the elements that influence a judgment; (2) pre-trained language models are constrained by input length, so their global semantic modeling of lengthy legal texts is insufficient; (3) existing mechanisms for aggregating similarity scores are susceptible to noise and offer limited interpretability. To address these problems, a similar case retrieval model that integrates legal element enhancement with multi-granularity interaction is proposed, with improvements at three levels: case element extraction, deep text encoding, and score aggregation. First, to tackle redundancy and missing elements in legal texts, a hierarchical case element extraction module based on large language models (LLMs) is introduced; it summarizes case sub-facts according to charge category, preserving the core facts of a case while reducing noise. Second, to address deep dependencies in legal text encoding, the SFA-SAILER encoding architecture is designed: it uses SAILER to capture cross-level dependencies between the case facts and the other sections of a document, and adds a feature-level attention mechanism at the CLS representation, so that case information is encoded deeply along both the word and feature dimensions. Finally, the MaxSim operator is used to aggregate the similarity scores between case sub-facts. Experimental results on the LeCaRD dataset show that the proposed model achieves a MAP of 67.45 and a P@3 of 60.95, and that its NDCG@K scores are also higher than those of the compared models. This study offers a new approach to similar case retrieval that balances legal logic with deep semantic understanding, and has practical value for advancing judicial intelligence.

Key words: Large Language Model, Legal Case Retrieval, Intelligent Judiciary, Text Matching
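
The score aggregation step named in the abstract can be illustrated with a small sketch. It assumes each case has already been reduced to a list of sub-fact embedding vectors (for example, one vector per LLM-extracted sub-fact), and it follows the common late-interaction form of MaxSim: for every query sub-fact, take the highest cosine similarity over the candidate's sub-facts, then combine those maxima. Averaging the maxima rather than summing them, and the names `cosine`, `maxsim_score`, and `rank_candidates`, are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def maxsim_score(query_subfacts: List[Vector],
                 candidate_subfacts: List[Vector]) -> float:
    """MaxSim-style aggregation (assumed form, not the paper's exact formula).

    For each query sub-fact, keep only its best match among the candidate's
    sub-facts, then average those best-match similarities.
    """
    if not query_subfacts or not candidate_subfacts:
        return 0.0
    total = 0.0
    for q in query_subfacts:
        total += max(cosine(q, c) for c in candidate_subfacts)
    return total / len(query_subfacts)


def rank_candidates(query_subfacts: List[Vector],
                    candidates: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Score every candidate case and sort by descending MaxSim score."""
    scored = [(case_id, maxsim_score(query_subfacts, vectors))
              for case_id, vectors in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, purely hypothetical.
    query = [[1.0, 0.0], [0.0, 1.0]]
    candidates = {
        "case_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query sub-facts
        "case_b": [[1.0, 0.0]],              # matches only the first one
    }
    for case_id, score in rank_candidates(query, candidates):
        print(case_id, round(score, 3))
```

Because each query sub-fact contributes only its single best match, the per-sub-fact maxima can be inspected to see which candidate sub-fact drove the score, which is one plausible reading of the interpretability concern raised in the abstract.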

CLC number:

TP391

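王劲滔, 高志霖, 孟琪翔, 卜凡亮. A Large Language Model Approach to Legal Case Retrieval with Structured Case Reformulation[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025050662.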
