Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (10): 3011-3017. DOI: 10.11772/j.issn.1001-9081.2021091565
Special Issue: Artificial Intelligence
• Artificial intelligence •
Lanlan ZENG, Yisong WANG, Panfeng CHEN
Received: 2021-09-03
Revised: 2021-12-02
Accepted: 2022-01-04
Online: 2022-04-15
Published: 2022-10-10
Contact: Yisong WANG
About author: ZENG Lanlan, born in 1997 in Bijie, Guizhou, M. S. candidate. Her research interests include natural language processing, knowledge representation and reasoning.
Lanlan ZENG, Yisong WANG, Panfeng CHEN. Named entity recognition based on BERT and joint learning for judgment documents[J]. Journal of Computer Applications, 2022, 42(10): 3011-3017.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021091565
Entity category (abbreviation) | Training set | Validation set | Test set |
---|---|---|---|
Time of crime (TIME) | 1 203 | 166 | 180 |
Location of crime (LOC) | 1 079 | 156 | 135 |
Defendant (DEF) | 1 067 | 225 | 235 |
Victim (VIC) | 797 | 224 | 196 |
Cause of incident (MOT) | 297 | 54 | 46 |
Crime tool (TOOL) | 259 | 65 | 66 |
Lost item (OBJ) | 259 | 64 | 66 |
Loss amount (MON) | 731 | 114 | 109 |
Personal injury (INJ) | 52 | 25 | 20 |
Tab. 1 Number of entities in each category
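The counts in Tab. 1 can be tallied to see how unevenly the categories are represented. The following is a minimal Python sketch and not part of the original paper: the names `COUNTS`, `split_totals`, and `train_share` are hypothetical helpers introduced for illustration, and the numbers are copied from the table.

```python
# Entity counts copied from Tab. 1 as (training, validation, test).
# All names below are illustrative helpers, not from the paper.
COUNTS = {
    "TIME": (1203, 166, 180),
    "LOC": (1079, 156, 135),
    "DEF": (1067, 225, 235),
    "VIC": (797, 224, 196),
    "MOT": (297, 54, 46),
    "TOOL": (259, 65, 66),
    "OBJ": (259, 64, 66),
    "MON": (731, 114, 109),
    "INJ": (52, 25, 20),
}


def split_totals(counts):
    """Return the total number of entities in each split (train, val, test)."""
    return tuple(sum(column) for column in zip(*counts.values()))


def train_share(counts):
    """Return each category's fraction of all training-set entities."""
    train_total = split_totals(counts)[0]
    return {label: row[0] / train_total for label, row in counts.items()}


if __name__ == "__main__":
    print("totals (train, val, test):", split_totals(COUNTS))
    for label, share in sorted(train_share(COUNTS).items(), key=lambda kv: kv[1]):
        print(f"{label:5s} {share:6.2%}")
```

Sorting by share puts the rarest categories first, which makes it easy to see which labels have the fewest training examples.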
Model | Precision (%) | Recall (%) | F1 score (%) |
---|---|---|---|
BiLSTM-CRF | 90.45 | 91.34 | 90.90 |
BiLSTM-CRF(Word2vec) | 92.47 | 91.43 | 91.95 |
ID-CNN-CRF | 88.55 | 91.83 | 90.16 |
Lattice-LSTM | 91.32 | 91.51 | 91.42 |
BERT-CRF | 92.58 | 94.53 | 93.53 |
BERT-BiLSTM-CRF | 93.31 | 94.46 | 93.88 |
JLB-BiLSTM-CRF | 94.36 | 94.94 | 94.65 |
Tab. 2 Comparison of experimental results of different models
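The F1 values in Tab. 2 can be sanity-checked against the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, which this page does not state explicitly. `ROWS`, `f1_score`, and `max_deviation` are hypothetical names introduced for illustration, and the figures are copied from the table.

```python
# (precision, recall, reported F1) in percent, copied from Tab. 2.
# Names are illustrative; the F1 definition is an assumption (harmonic mean).
ROWS = {
    "BiLSTM-CRF": (90.45, 91.34, 90.90),
    "BiLSTM-CRF(Word2vec)": (92.47, 91.43, 91.95),
    "ID-CNN-CRF": (88.55, 91.83, 90.16),
    "Lattice-LSTM": (91.32, 91.51, 91.42),
    "BERT-CRF": (92.58, 94.53, 93.53),
    "BERT-BiLSTM-CRF": (93.31, 94.46, 93.88),
    "JLB-BiLSTM-CRF": (94.36, 94.94, 94.65),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_deviation(rows) -> float:
    """Largest absolute gap between recomputed and reported F1 across rows."""
    return max(abs(f1_score(p, r) - reported) for p, r, reported in rows.values())


if __name__ == "__main__":
    for name, (p, r, reported) in ROWS.items():
        print(f"{name:22s} recomputed={f1_score(p, r):.2f} reported={reported:.2f}")
    print(f"max deviation: {max_deviation(ROWS):.3f}")
```

Under that assumption the recomputed values stay within a few hundredths of a percentage point of the reported column, which is consistent with the table's precision and recall having been rounded before publication.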
Model | Example 1 | Example 2 | Example 3 |
---|---|---|---|
BiLSTM-CRF(Word2vec) | 被告人途经 施工中的工地,将12个方管托盘窃走。 | 被告人脚踢救护车后挡风玻璃,致使挡风 玻璃碎裂。救护车损坏修复费为 | 谭某1纠集严1、严2 进入 |
BERT-BiLSTM-CRF | 被告人途经 | 被告人脚踢救护车后挡风玻璃,致使挡风 玻璃碎裂。救护车损坏修复费为 | 谭某1纠集 进入 |
JLB-BiLSTM-CRF | 被告人途经 | 被告人脚踢救护车后挡风玻璃,致使挡风 玻璃碎裂。救护车损坏修复费为 | 谭某1纠集 进入 |
Tab. 3 Marking results of three models on examples
Entity category | Precision (%) | Recall (%) | F1 score (%) |
---|---|---|---|
Defendant | 95.75 | 96.65 | 96.20 |
Victim | 95.69 | 94.68 | 95.17 |
Cause of incident | 92.31 | 91.73 | 92.01 |
Time of crime | 99.57 | 97.24 | 98.39 |
Location of crime | 96.25 | 98.83 | 97.52 |
Crime tool | 83.38 | 91.11 | 86.99 |
Lost item | 89.94 | 82.55 | 85.92 |
Loss amount | 95.17 | 100.00 | 97.53 |
Personal injury | 99.16 | 100.00 | 99.58 |
Tab. 4 Recognition performance of the JLB-BiLSTM-CRF model on each entity category
1 | ZHONG H X, XIAO C J, TU C C, et al. How does NLP benefit legal system: a summary of legal artificial intelligence[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 5218-5230. 10.18653/v1/2020.acl-main.466 |
2 | BANSAL N, SHARMA A, SINGH R K. A review on the application of deep learning in legal domain[C]// Proceedings of the 15th IFIP International Conference on Artificial Intelligence Applications and Innovations, IFIPAICT 559. Cham: Springer, 2019: 374-381. |
3 | SHE G Q, ZHANG Y A. Study on the model of automatic extraction and annotation of trial cases[J]. New Technology of Library and Information Service, 2013(6): 23-29. 10.11925/infotech.1003-3513.2013.06.04 |
4 | SONG C B. Research on the method of information extraction based on GATE[D]. Tianjin: Tianjin University, 2016: 26-37. |
5 | LE Q, MIKOLOV T. Distributed representations of sentences and documents[C]// Proceedings of the 31st International Conference on Machine Learning. New York: JMLR.org, 2014: 1188-1196. |
6 | LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2016: 260-270. 10.18653/v1/n16-1030 |
7 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 4171-4186. 10.18653/v1/n18-2 |
8 | MORWAL S, JAHAN N, CHOPRA D. Named entity recognition using Hidden Markov Model (HMM)[J]. International Journal on Natural Language Computing, 2012, 1(4):15-23. 10.5121/ijnlc.2012.1402 |
9 | SONG S L, ZHANG N, HUANG H T. Named entity recognition based on conditional random fields[J]. Cluster Computing, 2019, 22(S3): 5195-5206. 10.1007/s10586-017-1146-3 |
10 | JU Z F, WANG J, ZHU F. Named entity recognition from biomedical text using SVM[C]// Proceedings of the 5th International Conference on Bioinformatics and Biomedical Engineering. Piscataway: IEEE, 2011: 1-4. 10.1109/icbbe.2011.5779984 |
11 | HAMMERTON J. Named entity recognition with long short-term memory[C/OL]// Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL 2003. [2021-07-21]. 10.3115/1119176.1119202 |
12 | ZHANG Y, YANG J. Chinese NER using lattice LSTM[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 1554-1564. 10.18653/v1/p18-1144 |
13 | DONG C H, ZHANG J J, ZONG C Q, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition[C]// Proceedings of the 2016 International Conference on Computer Processing of Oriental Languages and the 2016 National CCF Conference on Natural Language Processing and Chinese Computing, LNCS 10102. Cham: Springer, 2016: 239-250. |
14 | STRUBELL E, VERGA P, BELANGER D, et al. Fast and accurate entity recognition with iterated dilated convolutions[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2017: 2670-2680. 10.18653/v1/d17-1283 |
15 | MA X Z, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2016: 1064-1074. 10.18653/v1/p16-1101 |
16 | LIU S, YANG H, LI J Y, et al. Chinese named entity recognition method in history and culture field based on BERT[J]. International Journal of Computational Intelligence Systems, 2021, 14(1): No.163. 10.1007/s44196-021-00019-8 |
17 | LI X Y, ZHANG H, ZHOU X H. Chinese clinical named entity recognition with variant neural structures based on BERT methods[J]. Journal of Biomedical Informatics, 2020, 107: No.103422. 10.1016/j.jbi.2020.103422 |
18 | WANG X, ZHANG Y, REN X, et al. Cross-type biomedical named entity recognition with deep multi-task learning[J]. Bioinformatics, 2019, 35(10): 1745-1752. 10.1093/bioinformatics/bty869 |
19 | TONG Y Q, CHEN Y D, SHI X D. A multi-task approach for improving biomedical named entity recognition by incorporating multi-granularity information[C]// Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Stroudsburg, PA: Association for Computational Linguistics, 2021: 4804-4813. 10.18653/v1/2021.findings-acl.424 |
20 | HUANG W M, HU D R, DENG Z R, et al. Named entity recognition for Chinese judgment documents based on BiLSTM and CRF[J]. EURASIP Journal on Image and Video Processing, 2020, 2020: No.52. 10.1186/s13640-020-00539-x |
21 | WANG C, LI B, ZHANG W J. Attention-BiLSTM-CRF based model for named entity recognition in judicial domain[J]. Journal of Physics: Conference Series, 2020, 1616: No.012108. 10.1088/1742-6596/1616/1/012108 |
22 | WANG D X, WANG S G, PEI W S, et al. Named entity recognition based on JCWA-DLSTM for legal instruments[J]. Journal of Chinese Information Processing, 2020, 34(10): 51-58. 10.3969/j.issn.1003-0077.2020.10.007 |
23 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010. |
24 | LIU X D, HE P C, CHEN W Z, et al. Multi-Task deep neural networks for natural language understanding[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 4487-4496. 10.18653/v1/p19-1441 |
25 | ZHANG S X, ZHAO M. Chinese agricultural diseases named entity recognition based on BERT-CRF[C]// Proceedings of the 5th International Conference on Mechanical, Control and Computer Engineering. Piscataway: IEEE, 2020: 1148-1151. 10.1109/icmcce51767.2020.00252 |