Journal of Computer Applications (《计算机应用》), sole official website


A Similar Case Retrieval Method Based on Reconstructing Case Information with Large Language Models

王劲滔,高志霖,孟琪翔,卜凡亮   

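  1. People's Public Security University of China
  • Received: 2025-06-16 Revised: 2025-09-03 Online: 2025-09-15 Published: 2025-09-15
  • Corresponding author: 卜凡亮
  • Funding:
    Double First-Class Special Project in Security and Protection Engineering, People's Public Security University of China (2023SYL08)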

A Large Language Model Approach to Legal Case Retrieval with Structured Case Reformulation

  • Received:2025-06-16 Revised:2025-09-03 Online:2025-09-15 Published:2025-09-15

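Abstract: With the advancement of smart justice initiatives, similar case retrieval has attracted wide attention for its key role in safeguarding judicial fairness and efficiency. However, existing retrieval methods still face the following challenges: (1) traditional models are easily misled by similarity in semantic structure and struggle to capture the elements that actually influence a judgment; (2) pre-trained language models are limited by input length and therefore model the global semantics of lengthy legal texts inadequately; (3) existing mechanisms for aggregating similarity scores are susceptible to noise and offer limited interpretability. To address these problems, this paper proposes a similar case retrieval model that combines legal element enhancement with multi-granularity interaction, with improvements at three levels: case element extraction, deep text encoding, and score aggregation. First, to deal with redundancy and missing elements in legal texts, a hierarchical case element extraction module based on large language models is proposed; it summarizes case sub-facts by charge category, which preserves the core facts of a case and reduces noise. Second, to handle deep dependencies in legal text encoding, the SFA-SAILER encoding architecture is designed. The architecture uses SAILER to capture cross-level dependencies between the case facts and the other sections of a document, and introduces a feature-level attention mechanism at the CLS representation, so that case information is encoded along both the word dimension and the feature dimension. Finally, the MaxSim operator is used to aggregate the similarity scores between case sub-facts. Experimental results show that, compared with a range of existing models, the proposed model reaches a MAP of 67.45 and a P@3 of 60.95 on the LeCaRD dataset, and its NDCG@K scores are also higher than those of the other models. This study offers a new approach to similar case retrieval that balances legal logic with deep semantic understanding, and it has practical value for advancing judicial intelligence.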

Keywords: large language model, similar case retrieval, smart justice, text matching
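
The abstract says only that the MaxSim operator aggregates similarity scores between case sub-facts; it does not give the formula. The sketch below illustrates one common reading of MaxSim, under stated assumptions: each sub-fact is already encoded as a vector, similarity is cosine similarity, and the per-sub-fact maxima are summed. The names `cosine` and `maxsim_score` are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def maxsim_score(query_subfacts, candidate_subfacts):
    """MaxSim-style aggregation (assumed form, not confirmed by the source).

    For every query sub-fact vector, keep only its best-matching candidate
    sub-fact, then sum those maxima into one case-level score.
    """
    if not query_subfacts or not candidate_subfacts:
        return 0.0
    return sum(
        max(cosine(q, c) for c in candidate_subfacts)
        for q in query_subfacts
    )


# Toy 2-dimensional "embeddings" for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]
score = maxsim_score(query, candidate)
```

Because each query sub-fact contributes only its single best match, unrelated sub-facts in the candidate case do not lower the score, which is one plausible reason such an operator would be less sensitive to noise than averaging all pairwise similarities.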

Abstract: With the advancement of intelligent justice initiatives, similar case retrieval has attracted significant attention for its crucial role in ensuring judicial fairness and efficiency. However, existing retrieval systems still face the following challenges: traditional models are easily misled by similarity in semantic structure, which makes it difficult to capture the elements that influence judgments; pre-trained language models are constrained by input length, which leads to insufficient global semantic modeling of lengthy legal texts; and existing mechanisms for aggregating similarity scores are susceptible to noise and lack strong interpretability. To address these issues, a similar case retrieval model integrating legal element enhancement and multi-granularity interaction is proposed, with improvements at three levels: case element extraction, deep text encoding, and score aggregation. First, to tackle text redundancy and element omission in legal documents, a hierarchical case element extraction module based on large language models (LLMs) is introduced. It summarizes case sub-facts according to charge classifications, preserving core case facts while reducing noise. Second, to handle deep dependencies in legal text encoding, the SFA-SAILER encoding architecture is designed. The architecture captures cross-hierarchical dependencies between case facts and other sections via SAILER, and incorporates a feature-level attention mechanism at the CLS representation, enabling deep encoding of case information along both the word and feature dimensions. Finally, the MaxSim operator is employed to aggregate similarity scores between case sub-facts. Experimental results show that, compared with a range of existing models, the proposed model achieves MAP and P@3 scores of 67.45 and 60.95, respectively, on the LeCaRD dataset, and its NDCG@K scores also consistently surpass those of the other models. This study provides a new approach to similar case retrieval that balances legal logic with deep semantic understanding, offering practical value for advancing judicial intelligence.

Key words: Large Language Model, Legal Case Retrieval, Intelligent Judiciary, Text Matching
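
The abstract reports MAP, P@3, and NDCG@K without defining them. For readers unfamiliar with these ranking metrics, the following is a minimal sketch of their textbook definitions, not the paper's evaluation script. It assumes binary relevance labels for precision and average precision, a linear-gain form of DCG, and that every relevant case appears in the ranked list; all function names are hypothetical.

```python
import math


def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k ranked cases that are relevant (labels are 0 or 1)."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Mean of precision values taken at the rank of each relevant case."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def ndcg_at_k(gains, k):
    """NDCG@k with linear gains: DCG of the ranking divided by the ideal DCG."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# One toy ranking: relevant cases at ranks 1 and 3.
p3 = precision_at_k([1, 0, 1, 0], 3)
ap = average_precision([1, 0, 1, 0])
```

These functions return values in the range 0 to 1; the figures quoted in the abstract (67.45 and 60.95) are presumably the same quantities expressed as percentages.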

CLC number: