Journal of Computer Applications, 2026, Vol. 46, Issue (6): 1785-1792. DOI: 10.11772/j.issn.1001-9081.2025050662
Jintao WANG, Zhilin GAO, Qixiang MENG, Fanliang BU
Received: 2025-06-23
Revised: 2025-09-07
Accepted: 2025-09-09
Online: 2025-09-15
Published: 2026-06-10
Corresponding author: Fanliang BU
About author: WANG Jintao, born in 2000, male, from Yangzhou, Jiangsu, M. S. candidate, CCF member. His research interests include natural language processing and information retrieval.
Abstract:
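With the advance of smart-justice construction, legal case retrieval has attracted wide attention because of its key role in safeguarding judicial fairness and efficiency. However, existing text retrieval methods still face three challenges: traditional models are easily misled by semantic and structural similarity and struggle to capture precisely the elements that determine a judgment; pre-trained language models are limited in input length and therefore model the global semantics of lengthy legal texts insufficiently; and existing mechanisms that aggregate similarity scores are sensitive to noise and offer limited interpretability. To address these problems, a legal case retrieval method that reformulates case information with a Large Language Model (LLM) was proposed. First, the LLM was used to extract information from case texts and to combine the case elements, the offense descriptions in the applicable legal provisions, and the case behavior chain into case sub-facts, thereby reducing information redundancy. Second, an encoding architecture named SFA-SAILER (Selective Feature Attention & Structure-Aware pre-traIned language model for LEgal case Retrieval) was designed. Third, case information was encoded in depth along two dimensions, words and features, to strengthen the dependency between the case information and the encoding dimensions. Finally, the MaxSim operator was used to aggregate the similarity scores. Experimental results show that on LeCaRD (Legal Case Retrieval Dataset) the proposed model achieves a Mean Average Precision (MAP) of 67.45% and a Precision at the top 3 results (P@3) of 60.95%, and its Normalized Discounted Cumulative Gain at the top K results (NDCG@K) is also higher than that of all comparison models. The proposed model therefore offers a new approach to legal case retrieval that accounts for both legal logic and deep semantic understanding, and has practical value for intelligent judicial applications.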
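The final aggregation step can be sketched in code. The block below is a minimal illustration, not the authors' implementation: it assumes MaxSim follows the ColBERT late-interaction definition (every query-side vector is matched to its most similar candidate-side vector, and these maxima are summed) and assumes, hypothetically, that each case sub-fact has already been encoded as one dense vector. The function names `maxsim` and `rank` and the toy vectors are illustrative only.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs: List[Vector], cand_vecs: List[Vector]) -> float:
    """ColBERT-style MaxSim: for every query vector keep its best match
    among the candidate vectors, then sum these maxima."""
    if not query_vecs or not cand_vecs:
        raise ValueError("both sides need at least one vector")
    return sum(max(dot(q, c) for c in cand_vecs) for q in query_vecs)


def rank(query_vecs: List[Vector], candidates: Dict[str, List[Vector]]) -> List[str]:
    """Return candidate case ids sorted by descending MaxSim score.

    Hypothetical setting: one vector per case sub-fact (an assumption)."""
    scores = {cid: maxsim(query_vecs, vecs) for cid, vecs in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For a query with sub-fact vectors `[[1, 0], [0, 1]]`, a candidate with vectors `[[0.9, 0.1], [0.2, 0.8]]` scores about 1.7, a candidate with the single vector `[[0.5, 0.5]]` scores 1.0, and `rank` places the first candidate ahead. Because each query vector contributes only its best match, candidate vectors that never win a match add nothing to the score, which is one way max-based aggregation limits the influence of unmatched content.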
Jintao WANG, Zhilin GAO, Qixiang MENG, Fanliang BU. Legal case retrieval method via case information reformulation using large language model[J]. Journal of Computer Applications, 2026, 46(6): 1785-1792.
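| Case element | Examples |
|---|---|
| Offender type | Minors, persons with mental illness, etc. |
| Charge | Robbery, causing a traffic accident, etc. |
| Criminal act | Injury, beating, drug trafficking, etc. |
| Items involved | Methamphetamine, motor vehicles, etc. |
| Sentencing circumstances | Remorse, meritorious conduct, pleading guilty, etc. |
| Settlement | Settlement agreement reached, etc. |
| Consequences | Death, minor injury, losses caused, etc. |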
Tab.1 Examples of case elements
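The LLM extraction step has to turn free-form model output into the element categories of Tab.1. The sketch below assumes, hypothetically, that the LLM is prompted to answer with one JSON object keyed by an English identifier per category; the key names, the JSON contract and the function `parse_case_elements` are illustrative assumptions, since the prompt and output format are not given in this section.

```python
import json
from typing import Dict, List

# Hypothetical English keys for the seven categories of Tab.1.
ELEMENT_KEYS = (
    "offender_type",             # e.g. minor, person with mental illness
    "charge",                    # e.g. robbery
    "criminal_act",              # e.g. beating, drug trafficking
    "items_involved",            # e.g. methamphetamine, motor vehicle
    "sentencing_circumstances",  # e.g. remorse, meritorious conduct
    "settlement",                # e.g. settlement agreement reached
    "consequences",              # e.g. death, minor injury
)


def parse_case_elements(llm_output: str) -> Dict[str, List[str]]:
    """Normalize an LLM's JSON answer into the Tab.1 element categories.

    Assumed (hypothetical) contract: the LLM returns one JSON object.
    Missing or null categories become empty lists, single values are
    wrapped in a list, blank or null items are dropped, and unknown
    keys are ignored.
    """
    raw = json.loads(llm_output)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object from the LLM")
    elements: Dict[str, List[str]] = {}
    for key in ELEMENT_KEYS:
        value = raw.get(key)
        if value is None:
            value = []
        elif not isinstance(value, list):
            value = [value]
        cleaned = (str(v).strip() for v in value if v is not None)
        elements[key] = [v for v in cleaned if v]
    return elements
```

Returning every category, even when empty, gives downstream sub-fact construction a fixed schema regardless of how much the LLM actually extracted.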
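| Case sub-fact | Case elements | Offense description in the applicable legal provisions | Case behavior chain |
|---|---|---|---|
| … betting, sharing in the profits of a gambling website, and similar acts | | | |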
Tab.2 Example representations of sub-facts of cases
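Tab.2 combines three components into one case sub-fact. A minimal data-structure sketch follows, assuming (hypothetically) one sub-fact per charge and a plain labeled-text serialization fed to the encoder; the class name `SubFact`, the helper `build_sub_facts`, the field labels and the separators are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubFact:
    """One case sub-fact built from the three components of Tab.2.

    Hypothetical layout: one sub-fact per charge; field names and the
    text serialization below are assumptions for illustration.
    """
    charge: str
    elements: List[str] = field(default_factory=list)        # case elements (Tab.1)
    provision: str = ""                                       # offense description in the applicable provision
    behavior_chain: List[str] = field(default_factory=list)  # ordered acts of the case

    def to_text(self) -> str:
        """Serialize into one labeled string that an encoder could consume."""
        return "\n".join([
            f"Charge: {self.charge}",
            "Case elements: " + "; ".join(self.elements),
            f"Offense description: {self.provision}",
            "Behavior chain: " + " -> ".join(self.behavior_chain),
        ])


def build_sub_facts(records: List[dict]) -> List[str]:
    """Turn per-charge extraction records (hypothetical dicts) into texts."""
    return [SubFact(**r).to_text() for r in records]
```

Keeping the behavior chain as an ordered list, and joining it only at serialization time, preserves the sequence of acts that distinguishes a behavior chain from an unordered set of elements.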
| Key elements | Key circumstances | Relevance label |
|---|---|---|
| Similar | Similar | 3 |
| Similar | Dissimilar | 2 |
| Dissimilar | Similar | 1 |
| Dissimilar | Dissimilar | 0 |
Tab.3 Annotation instructions for LeCaRD dataset
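The labeling rule in Tab.3 is equivalent to a two-bit code in which the key-element judgment is the high bit and the key-circumstance judgment is the low bit, so similarity in key elements always outweighs similarity in circumstances. The function below encodes the table directly; its name and argument names are illustrative.

```python
def lecard_label(key_elements_similar: bool, key_circumstances_similar: bool) -> int:
    """Graded label of Tab.3: key-element similarity is the high bit (worth 2),
    key-circumstance similarity the low bit (worth 1)."""
    return 2 * int(key_elements_similar) + int(key_circumstances_similar)


# Each row of Tab.3 as (key elements similar, key circumstances similar, label).
TAB3 = [(True, True, 3), (True, False, 2), (False, True, 1), (False, False, 0)]
```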
| Model | MAP | P@3 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|
| TF-IDF | 45.05 | 35.36 | 63.77 | 64.70 | 66.81 |
| BM25 | 48.19 | 41.77 | 64.76 | 65.94 | 68.79 |
| BERT | 48.83 | 41.11 | 68.35 | 68.97 | 72.42 |
| RoBERTa | 53.83 | 47.62 | 74.40 | 74.33 | 76.70 |
| coCondenser | 52.13 | 47.11 | 67.22 | 66.86 | 69.21 |
| CoT-MAE | 56.35 | 49.42 | 69.45 | 67.13 | 70.72 |
| RetroMAE | 55.76 | 49.97 | 68.01 | 67.20 | 68.83 |
| SAILER | 55.92 | 51.67 | 78.97 | 79.33 | 80.16 |
| Lawformer | 54.58 | 50.79 | 73.19 | 73.43 | 75.54 |
| BERT-PLI (BERT) | 47.91 | 39.99 | 63.22 | 68.56 | 72.24 |
| PromptCase | 64.92 | 55.45 | 78.15 | 78.30 | 80.23 |
| KELLER | 64.77 | 55.87 | 79.79 | 81.62 | 84.34 |
| Proposed model | 67.45 | 60.95 | 83.01 | 83.44 | 85.37 |

Tab.4 Evaluation results of different models (%)
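The metrics in Tab.4 are computed per query and then averaged. The sketch below is an assumption-laden illustration rather than the paper's evaluation script: MAP and P@3 need a binary notion of relevance, so a label threshold on the 0 to 3 scale of Tab.3 is passed explicitly (the threshold used for LeCaRD is not stated in this section), NDCG uses the graded label as a linear gain with a log2 discount, and unjudged cases count as irrelevant. Function names are illustrative.

```python
import math
from typing import Callable, Dict, List

Ranking = List[str]       # candidate case ids, best first
Labels = Dict[str, int]   # graded labels 0-3 as in Tab.3


def precision_at_k(ranked: Ranking, labels: Labels, k: int, threshold: int) -> float:
    """Share of the top-k results whose label is >= threshold."""
    hits = sum(1 for d in ranked[:k] if labels.get(d, 0) >= threshold)
    return hits / k


def average_precision(ranked: Ranking, labels: Labels, threshold: int) -> float:
    """Sum of precision@i at each rank i holding a relevant case, divided by
    the number of relevant cases in the judged pool."""
    n_relevant = sum(1 for r in labels.values() if r >= threshold)
    if n_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if labels.get(d, 0) >= threshold:
            hits += 1
            total += hits / i
    return total / n_relevant


def ndcg_at_k(ranked: Ranking, labels: Labels, k: int) -> float:
    """NDCG@k with linear gain: sum of label / log2(rank + 1), normalized
    by the same sum over the ideal (label-sorted) ordering."""
    def dcg(gains: List[int]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    ideal = dcg(sorted(labels.values(), reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([labels.get(d, 0) for d in ranked]) / ideal


def mean_over_queries(metric: Callable[..., float], runs: Dict[str, Ranking],
                      qrels: Dict[str, Labels], **kwargs) -> float:
    """Average a per-query metric over all queries (MAP = mean of AP)."""
    scores = [metric(runs[q], qrels[q], **kwargs) for q in runs]
    return sum(scores) / len(scores)
```

With the ranking `d1, d2, d3, d4`, labels `{d1: 3, d2: 0, d3: 2, d4: 1}` and a hypothetical threshold of 2, P@3 is 2/3, AP is (1/1 + 2/3)/2 = 5/6, and NDCG@3 is 4 / (3 + 2/log2 3 + 1/2), roughly 0.84. Tab.4 reports such averages as percentages.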
| Removed component | MAP | P@3 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|
| Case behavior chain | 61.91 | 54.28 | 79.31 | 80.37 | 81.92 |
| Offense descriptions in legal provisions | 65.28 | 56.82 | 80.32 | 81.86 | 84.66 |
| Case elements | 66.44 | 59.68 | 81.95 | 82.21 | 84.04 |
| SFA mechanism | 64.89 | 55.55 | 79.12 | 80.71 | 83.82 |

Tab.5 Ablation experimental results (%)
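Reading Tab.5 against the last row of Tab.4 (the full model) gives the cost of removing each component. The snippet below uses only numbers copied from those two tables and assumes each row of Tab.5 removes exactly one component from the full model. It shows that removing the case behavior chain costs 5.54 points of MAP and 6.67 points of P@3, the largest drops on those two metrics, while removing the SFA mechanism causes the largest NDCG@3 drop (3.89 points, against 3.70 for the behavior chain). The dictionary keys are shortened, illustrative labels.

```python
# Scores (%) copied from Tab.4 (proposed model) and Tab.5 (ablations).
FULL = {"MAP": 67.45, "P@3": 60.95, "NDCG@3": 83.01, "NDCG@5": 83.44, "NDCG@10": 85.37}
ABLATED = {
    "case behavior chain": {"MAP": 61.91, "P@3": 54.28, "NDCG@3": 79.31, "NDCG@5": 80.37, "NDCG@10": 81.92},
    "offense descriptions": {"MAP": 65.28, "P@3": 56.82, "NDCG@3": 80.32, "NDCG@5": 81.86, "NDCG@10": 84.66},
    "case elements": {"MAP": 66.44, "P@3": 59.68, "NDCG@3": 81.95, "NDCG@5": 82.21, "NDCG@10": 84.04},
    "SFA mechanism": {"MAP": 64.89, "P@3": 55.55, "NDCG@3": 79.12, "NDCG@5": 80.71, "NDCG@10": 83.82},
}


def drops(full: dict, ablated: dict) -> dict:
    """Percentage-point drop of every metric when one component is removed."""
    return {m: round(full[m] - ablated[m], 2) for m in full}


def largest_drop(metric: str) -> str:
    """Component whose removal costs the most on the given metric."""
    return max(ABLATED, key=lambda name: FULL[metric] - ABLATED[name][metric])


if __name__ == "__main__":
    for name, scores in ABLATED.items():
        print(name, drops(FULL, scores))
```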