Legal case retrieval method via case information reformulation using large language model
Jintao WANG, Zhilin GAO, Qixiang MENG, Fanliang BU
Journal of Computer Applications    2026, 46 (6): 1785-1792.   DOI: 10.11772/j.issn.1001-9081.2025050662

With the advancement of intelligent judiciary construction, legal case retrieval technology has attracted significant attention because of its role in ensuring judicial fairness and efficiency. However, existing text retrieval methods still face three challenges: traditional models are easily misled by similarities in semantic structure, which makes it difficult to accurately capture the elements that influence judgments; pre-trained language models are constrained by input length, leading to insufficient global semantic modeling of lengthy legal texts; and existing aggregated similarity scoring mechanisms are prone to noise interference and lack strong interpretability. To address these challenges, a legal case retrieval method based on case information reformulation using a large language model (LLM) was proposed. Firstly, the LLM was employed to extract information from case texts and to combine case elements, descriptions of the legal provisions applicable to the crimes, and case behavior chains into sub-facts of cases, thereby reducing information redundancy. Secondly, for encoding, an SFA-SAILER (Selective Feature Attention & Structure-Aware pre-traIned language model for LEgal case Retrieval) architecture was designed. Thirdly, case information was encoded at two levels, word and feature, which strengthened the dependency between case information and the encoding dimensions. Finally, the MaxSim operator was used to aggregate similarity scores. Experimental results show that on LeCaRD (Legal Case Retrieval Dataset), the proposed model achieves a mean Average Precision (mAP) of 67.45% and a Top-3 Precision (P@3) of 60.95%, and that its Top-K Normalized Discounted Cumulative Gain (NDCG@K) is higher than that of the comparison models.
These results indicate that the proposed model offers a new approach to legal case retrieval that integrates legal logic with deep semantic understanding, and that it has practical value for intelligent judiciary applications.
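
The abstract does not say how the MaxSim operator is applied. The following is a minimal sketch, assuming a late-interaction form in which each query sub-fact vector is matched to its most similar candidate sub-fact vector and the maxima are summed. The function names, the toy two-dimensional vectors, and the choice of cosine similarity and summation are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def maxsim_score(query_vecs, cand_vecs):
    """Assumed MaxSim form: for each query sub-fact vector, take the best
    match among the candidate's sub-fact vectors, then sum those maxima."""
    if not query_vecs or not cand_vecs:
        return 0.0
    return sum(max(cosine(q, c) for c in cand_vecs) for q in query_vecs)


def rank_candidates(query_vecs, candidates):
    """Return (case_id, score) pairs sorted by descending MaxSim score."""
    scored = [(cid, maxsim_score(query_vecs, vecs)) for cid, vecs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): each case is a list of sub-fact embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "case_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query sub-facts
    "case_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first one
}

for case_id, score in rank_candidates(query, candidates):
    print(case_id, round(score, 3))
```

Because each query sub-fact contributes only its single best match, a candidate's unrelated sub-facts do not lower its score, and each per-sub-fact maximum can be traced back to the pair of sub-facts that produced it. This is one plausible reading of how such an operator would address the noise and interpretability concerns raised in the abstract.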
