Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (6): 1696-1700. DOI: 10.11772/j.issn.1001-9081.2018109193

• Data Science and Technology •

Semantic-driven learning and classification method of judicial documents

MA Jiangang1,2,3, MA Yinglong4

  1. Law School, Renmin University of China, Beijing 100872, China;
  2. National Prosecutors College of P. R. C., Beijing 102206, China;
  3. The People's Procuratorate of Henan Province, Zhengzhou, Henan 450004, China;
  4. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
  • Received: 2018-11-15 Revised: 2019-01-03 Online: 2019-06-17 Published: 2019-06-10
  • Corresponding author: MA Jiangang
  • About the authors: MA Jiangang (born 1977), male, from Zhengzhou, Henan; senior engineer, Ph.D., CCF senior member; research interests: big data, smart procuratorate, smart justice. MA Yinglong (born 1976), male, from Xianyang, Shaanxi; professor, Ph.D., CCF senior member; research interests: big data, knowledge engineering.
  • Supported by:
    This work is partially supported by the National Key R&D Program of China (2018YFC0831404, 2018YFC0830605) and the China Postdoctoral Science Foundation (2016M591317).

Abstract: Efficient classification of large-scale judicial documents supports current intelligent judicial applications such as similar-case pushing, legal document retrieval, judgment prediction and sentencing assistance. General-domain text classification methods perform poorly on judicial texts because they do not consider the complex structure and knowledge semantics of such texts. To solve this problem, a semantic-driven method was proposed to learn and classify judicial documents. Firstly, a domain knowledge model for the judicial domain was proposed and constructed to express document-level semantics clearly. Then, domain knowledge was extracted from the judicial documents based on this model. Finally, a Graph Long Short-Term Memory (Graph LSTM) model was trained on the judicial documents and used to classify them. The experimental results show that the proposed method clearly outperforms the commonly used Long Short-Term Memory (LSTM) model, Multinomial Logistic Regression (MLR) and Support Vector Machine (SVM) in accuracy and recall.

Key words: judicial big data, domain knowledge model, text categorization, smart procuratorate, Graph Long Short-Term Memory (Graph LSTM) model
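
The abstract compares classifiers by accuracy and recall without stating how recall is aggregated across classes. As a minimal sketch, assuming macro-averaged recall (an assumption, not something the abstract confirms), the two metrics could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted documents / documents of that class."""
    support = Counter(y_true)  # number of true documents per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / count for label, count in support.items()}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed aggregation scheme)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical case-type labels, for illustration only.
y_true = ["theft", "fraud", "theft", "assault", "fraud", "theft"]
y_pred = ["theft", "theft", "theft", "assault", "fraud", "fraud"]

print(accuracy(y_true, y_pred))
print(per_class_recall(y_true, y_pred))
print(macro_recall(y_true, y_pred))
```

Macro averaging weights every class equally, which matters when some case types are rare; a support-weighted average would instead coincide with accuracy in the single-label setting.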
