Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2693-2700.DOI: 10.11772/j.issn.1001-9081.2021071356
Special Issue: Artificial Intelligence
• Artificial intelligence •
Received: 2021-07-30
Revised: 2021-11-03
Accepted: 2021-11-09
Online: 2022-09-19
Published: 2022-09-10
Contact: Weisen FENG
About author: XU Guanyou, born in 1997 in Luzhou, Sichuan, M. S. candidate. His research interests include natural language processing and knowledge graphs.
CLC Number:
Guanyou XU, Weisen FENG. Python named entity recognition model based on transformer[J]. Journal of Computer Applications, 2022, 42(9): 2693-2700.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071356
Dataset | Type | Training set | Dev set | Test set |
---|---|---|---|---|
python | Sentences | 6.1K | 0.7K | 0.6K |
 | Characters | 207.4K | 23.3K | 22.3K |
resume | Sentences | 3.8K | 0.5K | 0.5K |
 | Characters | 124.1K | 13.9K | 15.1K |
weibo | Sentences | 1.4K | 0.27K | 0.27K |
 | Characters | 73.8K | 14.5K | 14.8K |

Tab. 1 Statistics of datasets
Parameter | Value |
---|---|
hidden_size | [160,256,320,480] |
number of layers | [ |
number of heads | [ |
head dimension | [ |
max_len | [175,178,199] |
fc dropout | 0.4 |
transformer dropout | 0.15 |
optimizer | SGD |
learning rate | [1E-3,7E-4] |
clip | 5 |
batch_size | 10 |
epochs | [75,100] |

Tab. 2 Model parameters
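Several of the values in Tab. 2 are search ranges rather than single settings, and the hidden size and head dimension are not independent: in multi-head attention the hidden size must split evenly across the heads. A minimal consistency-check sketch, using an illustrative configuration of 320 hidden units over 5 heads (the exact head counts are truncated in the table above, so these numbers are assumptions, not the paper's setting):

```python
def per_head_dim(hidden_size: int, n_heads: int) -> int:
    """Split the model's hidden size evenly across attention heads."""
    if hidden_size % n_heads != 0:
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by n_heads={n_heads}"
        )
    return hidden_size // n_heads

# Hypothetical configuration: 320 hidden units over 5 heads -> 64 features per head
print(per_head_dim(320, 5))  # -> 64
```

A check like this catches invalid combinations early when sweeping over the hidden_size and head-count ranges listed in the table.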
Environment | Item | Configuration |
---|---|---|
Hardware | Operating system | Windows 10 |
 | CPU | AMD Ryzen 7 3700X |
 | GPU | GeForce RTX 3070 |
 | Memory | 32 GB |
Software | Programming environment | Anaconda |
 | Python | 3.6 |
 | PyTorch | 1.8.0 |
 | fastNLP | 0.5.0 |

Tab. 3 Experimental environment
Dataset | Model | P | R | F1 |
---|---|---|---|---|
python | Lattice-LSTM | 70.16 | 69.94 | 70.05 |
 | WC-LSTM | 72.23 | 72.02 | 72.11 |
 | LR-CNN | 71.05 | 73.67 | 72.34 |
 | BERT+CRF | 70.69 | 67.47 | 69.04 |
 | BERT+LSTM+CRF | 73.81 | 72.75 | 73.28 |
 | CW-TF + shortest strategy | 70.29 | 71.88 | 71.20 |
 | CW-TF + longest strategy | 68.38 | 75.64 | 71.82 |
 | CW-TF + average strategy | 71.66 | 73.75 | 72.69 |
 | CW-TF + longest strategy + pre-training | 71.11 | 77.20 | 74.03 |
resume | Lattice-LSTM | 94.81 | 94.11 | 94.46 |
 | WC-LSTM | 95.27 | 95.15 | 95.21 |
 | LR-CNN | 95.37 | 94.84 | 95.11 |
 | BERT+CRF | 94.87 | 96.50 | 95.68 |
 | BERT+LSTM+CRF | 95.75 | 95.28 | 95.51 |
 | CW-TF + shortest strategy | 94.62 | 95.25 | 94.94 |
 | CW-TF + longest strategy | 95.16 | 95.39 | 95.29 |
 | CW-TF + average strategy | 94.79 | 94.92 | 94.85 |
weibo | Lattice-LSTM | 53.04 | 62.25 | 58.79 |
 | WC-LSTM | 52.55 | 67.41 | 59.84 |
 | LR-CNN | 57.14 | 66.67 | 59.92 |
 | BERT+CRF | 65.77 | 62.05 | 63.80 |
 | BERT+LSTM+CRF | 69.65 | 64.62 | 67.33 |
 | CW-TF + shortest strategy | 70.18 | 50.49 | 58.73 |
 | CW-TF + longest strategy | 64.84 | 54.78 | 59.39 |
 | CW-TF + average strategy | 65.09 | 54.79 | 59.49 |

Tab. 4 Experimental results on python, resume and weibo datasets
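The P, R, and F1 columns above are the standard entity-level metrics: a predicted entity counts as correct only when both its span and its type match a gold annotation. A minimal sketch of the computation (the `(start, end, type)` tuple representation and the entity types are illustrative, not the paper's code):

```python
def entity_f1(pred: set, gold: set) -> tuple:
    """Entity-level precision, recall and F1, as percentages.

    Entities are compared as exact (start, end, type) tuples, so a
    prediction counts only when both boundaries and type match.
    """
    tp = len(pred & gold)                      # exactly matched entities
    p = 100 * tp / len(pred) if pred else 0.0  # precision
    r = 100 * tp / len(gold) if gold else 0.0  # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return round(p, 2), round(r, 2), round(f1, 2)

gold = {(0, 2, "LIB"), (5, 7, "FUNC"), (9, 11, "CLASS"), (13, 14, "LIB")}
pred = {(0, 2, "LIB"), (5, 7, "FUNC"), (9, 11, "FUNC")}  # last entity mistyped
print(entity_f1(pred, gold))  # -> (66.67, 50.0, 57.14)
```

As a sanity check, the F1 = 2PR/(P+R) relation reproduces the table rows: for the best python-dataset result, P = 71.11 and R = 77.20 give F1 ≈ 74.03.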
Model | python | resume |
---|---|---|
Lattice-LSTM | 1.00× | 1.00× |
WC-LSTM | 2.13× | 1.47× |
LR-CNN | 2.52× | 1.51× |
CW-TF + shortest strategy | 4.14× | 3.25× |
CW-TF + longest strategy | 3.35× | 3.12× |
CW-TF + average strategy | 3.40× | 3.16× |

Tab. 5 Training speed (relative to Lattice-LSTM)
Transformer multi-head attention feature dimension | P | R | F1 |
---|---|---|---|
32 | 72.21 | 71.62 | 71.92 |
64 | 71.66 | 73.75 | 72.69 |
96 | 68.40 | 75.21 | 71.65 |
256 | 69.12 | 73.73 | 72.10 |

Tab. 6 Result comparison of different transformer multi-head attention feature dimensions on python dataset
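The head dimension varied in Tab. 6 enters the model through the scaling factor √d_head of scaled dot-product attention, which keeps score magnitudes comparable as the head grows. A pure-Python single-head sketch of that computation (illustrative only, not the paper's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def scaled_dot_attention(Q, K, V, d_head):
    """One attention head: each query gets a softmax-weighted mix of values."""
    out = []
    for q in Q:
        # Dot products scaled by sqrt(d_head): larger heads get smaller scores
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_head)
                  for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value positions (d_head = 2)
print(scaled_dot_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 0.0], [0.0, 1.0]], 2))
```

With one-hot values, each output row is a convex combination of the value vectors, so its entries sum to 1 and weight the position whose key best matches the query.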