Official website of Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2693-2700. DOI: 10.11772/j.issn.1001-9081.2021071356
• Artificial Intelligence •
Received: 2021-07-30
Revised: 2021-11-03
Accepted: 2021-11-09
Online: 2022-09-19
Published: 2022-09-10
Contact: FENG Weisen
About author: XU Guanyou, born in 1997 in Luzhou, Sichuan, M. S. candidate. His research interests include natural language processing and knowledge graphs.
Abstract:
Recent character-based named entity recognition (NER) models cannot fully exploit word information, while lattice-structure models that do use word information may degenerate into word-based models and suffer from word segmentation errors. To address these problems, a transformer-based Python NER model was proposed to encode character-word information. First, word information was bound to the characters corresponding to the start or end of each word; then, the word information was encoded by the transformer into a fixed-size representation under three different strategies; finally, a conditional random field (CRF) was used for decoding, thereby avoiding the segmentation errors introduced by extracting word boundaries and increasing batch-training speed. Experimental results on the python dataset show that the F1 score of the proposed model is 2.64 percentage points higher than that of the Lattice-LSTM model, while the training time is about 1/4 of that of the compared model, indicating that the proposed model prevents model degeneration, speeds up batch training, and better recognizes Python named entities.
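The abstract binds the lexicon words matched at a character's start or end position to that character, then collapses them into one fixed-size vector under three strategies. The exact selection rules are not given on this page; a minimal NumPy sketch of one plausible reading (shortest matched word / longest matched word / average over all matched words), where `encode_word_info` and its arguments are hypothetical names:

```python
import numpy as np

def encode_word_info(matched_words, embeddings, dim, strategy="average"):
    """Collapse the lexicon words bound to one character into a fixed-size vector.

    matched_words: lexicon words whose start or end aligns with this character
    embeddings: dict mapping word -> np.ndarray of shape (dim,)
    strategy: "shortest", "longest", or "average" (an assumed interpretation
    of the paper's three strategies)
    """
    if not matched_words:
        return np.zeros(dim)          # no word bound to this character
    if strategy == "shortest":        # keep only the shortest matched word
        return embeddings[min(matched_words, key=len)]
    if strategy == "longest":         # keep only the longest matched word
        return embeddings[max(matched_words, key=len)]
    # average: mean over all matched words' embeddings
    return np.mean([embeddings[w] for w in matched_words], axis=0)
```

Because every character ends up with one fixed-size word vector regardless of how many lexicon words matched, sentences can be padded and batched normally, which is where the batch-training speedup over lattice models comes from.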
Guanyou XU, Weisen FENG. Python named entity recognition model based on transformer[J]. Journal of Computer Applications, 2022, 42(9): 2693-2700.
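The CRF decoding step named in the abstract amounts to a Viterbi search over tag sequences. A self-contained sketch follows; the emission and transition scores are placeholders here (in the paper they are produced by the transformer encoder and learned CRF parameters):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions: (seq_len, num_tags) per-character tag scores
    transitions: (num_tags, num_tags), score of tag i followed by tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()                    # best score ending in each tag
    back = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # total[i, j]: best path ending in tag i, then moving to tag j at step t
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = np.argmax(total, axis=0)
        score = np.max(total, axis=0)
    best = [int(np.argmax(score))]                 # backtrack from the best final tag
    for t in range(seq_len - 1, 0, -1):
        best.append(int(back[t, best[-1]]))
    return best[::-1]
```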
Tab. 1 Statistics of datasets

| Dataset | Type | Train | Dev | Test |
|---|---|---|---|---|
| python | Sentences | 6.1K | 0.7K | 0.6K |
| python | Characters | 207.4K | 23.3K | 22.3K |
| resume | Sentences | 3.8K | 0.5K | 0.5K |
| resume | Characters | 124.1K | 13.9K | 15.1K |
| weibo | Sentences | 1.4K | 0.27K | 0.27K |
| weibo | Characters | 73.8K | 14.5K | 14.8K |
Tab. 2 Model parameters

| Parameter | Value |
|---|---|
| hidden_size | [160,256,320,480] |
| number of layers | [ |
| number of head | [ |
| head dimension | [ |
| max_len | [175,178,199] |
| fc dropout | 0.4 |
| transformer dropout | 0.15 |
| optimizer | SGD |
| learning rate | [1E-3,7E-4] |
| clip | 5 |
| batch_size | 10 |
| epochs | [75,100] |
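The optimizer settings in Table 2 (SGD, gradient clipping at 5, a learning rate around 1E-3) can be sketched as a PyTorch training step. The `nn.Linear` stand-in, input width, and tag count of 4 are placeholders, not the paper's CW-TF model:

```python
import torch
import torch.nn as nn

model = nn.Linear(160, 4)  # placeholder model; hidden_size 160 from Table 2
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # SGD, lr from Table 2

def train_step(batch_x, batch_y):
    """One batch update with the clipping setting from Table 2."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(batch_x), batch_y)
    loss.backward()
    # clip = 5 in Table 2: cap the global gradient norm before the step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5)
    optimizer.step()
    return loss.item()
```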
Tab. 3 Experimental environment

| Environment | Item | Configuration |
|---|---|---|
| Hardware | Operating system | Windows 10 |
| Hardware | CPU | AMD Ryzen 7 3700X |
| Hardware | GPU | GeForce RTX 3070 |
| Hardware | Memory | 32 GB |
| Software | Programming environment | Anaconda |
| Software | Python | Python 3.6 |
| Software | PyTorch | 1.8.0 |
| Software | FastNLP | 0.5.0 |
Tab. 4 Experimental results on python, resume, weibo datasets (%)

| Dataset | Model | P | R | F1 |
|---|---|---|---|---|
| python | Lattice-LSTM | 70.16 | 69.94 | 70.05 |
| python | WC-LSTM | 72.23 | 72.02 | 72.11 |
| python | LR-CNN | 71.05 | 73.67 | 72.34 |
| python | BERT+CRF | 70.69 | 67.47 | 69.04 |
| python | BERT+LSTM+CRF | 73.81 | 72.75 | 73.28 |
| python | CW-TF + shortest strategy | 70.29 | 71.88 | 71.20 |
| python | CW-TF + longest strategy | 68.38 | 75.64 | 71.82 |
| python | CW-TF + average strategy | 71.66 | 73.75 | 72.69 |
| python | CW-TF + longest strategy + pre-training | 71.11 | 77.20 | 74.03 |
| resume | Lattice-LSTM | 94.81 | 94.11 | 94.46 |
| resume | WC-LSTM | 95.27 | 95.15 | 95.21 |
| resume | LR-CNN | 95.37 | 94.84 | 95.11 |
| resume | BERT+CRF | 94.87 | 96.50 | 95.68 |
| resume | BERT+LSTM+CRF | 95.75 | 95.28 | 95.51 |
| resume | CW-TF + shortest strategy | 94.62 | 95.25 | 94.94 |
| resume | CW-TF + longest strategy | 95.16 | 95.39 | 95.29 |
| resume | CW-TF + average strategy | 94.79 | 94.92 | 94.85 |
| weibo | Lattice-LSTM | 53.04 | 62.25 | 58.79 |
| weibo | WC-LSTM | 52.55 | 67.41 | 59.84 |
| weibo | LR-CNN | 57.14 | 66.67 | 59.92 |
| weibo | BERT+CRF | 65.77 | 62.05 | 63.80 |
| weibo | BERT+LSTM+CRF | 69.65 | 64.62 | 67.33 |
| weibo | CW-TF + shortest strategy | 70.18 | 50.49 | 58.73 |
| weibo | CW-TF + longest strategy | 64.84 | 54.78 | 59.39 |
| weibo | CW-TF + average strategy | 65.09 | 54.79 | 59.49 |
Tab. 5 Training speed (relative to Lattice-LSTM)

| Model | python | resume |
|---|---|---|
| Lattice-LSTM | 1.00× | 1.00× |
| WC-LSTM | 2.13× | 1.47× |
| LR-CNN | 2.52× | 1.51× |
| CW-TF + shortest strategy | 4.14× | 3.25× |
| CW-TF + longest strategy | 3.35× | 3.12× |
| CW-TF + average strategy | 3.40× | 3.16× |
Tab. 6 Result comparison of different transformer multi-head attention feature dimensions on python dataset (%)

| Transformer multi-head attention feature dimension | P | R | F1 |
|---|---|---|---|
| 32 | 72.21 | 71.62 | 71.92 |
| 64 | 71.66 | 73.75 | 72.69 |
| 96 | 68.40 | 75.21 | 71.65 |
| 256 | 69.12 | 73.73 | 72.10 |
1 | DIEFENBACH D, LOPEZ V, SINGH K, et al. Core techniques of question answering systems over knowledge bases: a survey[J]. Knowledge and Information Systems, 2018, 55(3): 529-569. 10.1007/s10115-017-1100-y |
2 | VEALE T. Creative language retrieval: a robust hybrid of information retrieval and linguistic creativity[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2011: 278-287. |
3 | HAN X, GAO T Y, LIN Y K, et al. More data, more relations, more context and more openness: a review and outlook for relation extraction[C]// Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 745-758. |
4 | SAITO K, NAGATA M. Multi-language named-entity recognition system based on HMM[C]// Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-Language Named Entity Recognition. Stroudsburg, PA: Association for Computational Linguistics, 2003: 41-48. 10.3115/1119384.1119390 |
5 | FENG Y Y, SUN L, LV Y H. Chinese word segmentation and named entity recognition based on conditional random fields models[C]// Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2006: 181-184. |
6 | EKBAL A, BANDYOPADHYAY S. Named entity recognition using support vector machine: a language independent approach[J]. International Journal of Electrical, Computer, and Systems Engineering, 2010, 4(2): 155-170. |
7 | LI X N, YAN H, QIU X P, et al. FLAT: Chinese NER using flat-lattice transformer[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 6836-6842. 10.18653/v1/2020.acl-main.611 |
8 | HE H F, SUN X. A unified model for cross-domain and semi-supervised named entity recognition in Chinese social media[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2017: 3216-3222. 10.1609/aaai.v31i1.10977 |
9 | CAO P F, CHEN Y B, LIU K, et al. Adversarial transfer learning for Chinese named entity recognition with self-attention mechanism[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 182-192. 10.18653/v1/d18-1017 |
10 | LI H B, HAGIWARA M, LI Q, et al. Comparison of the impact of word segmentation on name tagging for Chinese and Japanese[C]// Proceedings of the 9th International Conference on Language Resources and Evaluation. Paris: European Language Resources Association, 2014: 2532-2536. |
11 | ZHANG Y, YANG J. Chinese NER using lattice LSTM[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 1554-1564. 10.18653/v1/p18-1144 |
12 | MA R T, PENG M L, ZHANG Q, et al. Simplify the usage of lexicon in Chinese NER[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 5951-5960. 10.18653/v1/2020.acl-main.528 |
13 | GUI T, MA R T, ZHANG Q, et al. CNN-based Chinese NER with lexicon rethinking[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 4982-4988. 10.24963/ijcai.2019/692 |
14 | GUI T, ZOU Y C, ZHANG Q, et al. A lexicon-based graph neural network for Chinese NER[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1040-1050. 10.18653/v1/d19-1096 |
15 | LI J, SUN A X, HAN J L, et al. A survey on deep learning for named entity recognition[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(1): 50-70. 10.1109/tkde.2020.2981314 |
16 | XU C W, WANG F Y, HAN J L, et al. Exploiting multiple embeddings for Chinese named entity recognition[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. New York: ACM, 2019: 2269-2272. 10.1145/3357384.3358117 |
17 | SUN Y, WANG S H, LI Y K, et al. ERNIE 2.0: a continual pre-training framework for language understanding[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 8968-8975. 10.1609/aaai.v34i05.6428 |
18 | LIU W, XU T G, XU Q H, et al. An encoding strategy based word-character LSTM for Chinese NER[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 2379-2389. 10.18653/v1/n18-2 |
19 | MENG Y X, WU W, WANG F, et al. Glyce: glyph-vectors for Chinese character representations[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2021-03-15]. |
20 | XUAN Z Y, BAO R, JIANG S Y. FGN: fusion glyph network for Chinese named entity recognition[C]// Proceedings of the 2020 China Conference on Knowledge Graph and Semantic Computing, CCIS 1356. Singapore: Springer, 2021: 28-40. |
21 | YAN H, DENG B C, LI X N, et al. TENER: adapting transformer encoder for named entity recognition[EB/OL]. (2019-12-10) [2020-10-13]. |
22 | ZHU Y Y, WANG G X. CAN-NER: convolutional attention network for Chinese named entity recognition[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 3384-3393. 10.18653/v1/N19-1342 |
23 | DING R X, XIE P J, ZHANG X Y, et al. A neural multi-digraph model for Chinese NER with gazetteers[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1462-1467. 10.18653/v1/p19-1141 |
24 | SUI D B, CHEN Y B, LIU K, et al. Leverage lexical knowledge for Chinese named entity recognition via collaborative graph network[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 3830-3840. 10.18653/v1/d19-1396 |
25 | WU F Z, LIU J X, WU C H, et al. Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation[C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 3342-3348. 10.1145/3308558.3313743 |
26 | XUE M G, YU B W, LIU T W, et al. Porous lattice transformer encoder for Chinese NER[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 3831-3841. 10.18653/v1/2020.coling-main.340 |
27 | ZHAO H S, YANG Y, ZHANG Q, et al. Improve neural entity recognition via multi-task data selection and constrained decoding[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 346-351. 10.18653/v1/n18-2056 |
28 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010. |
29 | HETLAND M L. Beginning Python: From Novice to Professional[M]. 3rd ed. YUAN G Z, translated. Beijing: People's Posts and Telecommunications Press, 2008: 1-458. 10.1007/978-1-4842-0055-1_1 |
30 | YANG J, ZHANG Y, LI L W, et al. YEDDA: a lightweight collaborative text span annotation tool[C]// Proceedings of ACL 2018, System Demonstrations. Stroudsburg, PA: Association for Computational Linguistics, 2018: 31-36. 10.18653/v1/p18-4006 |
31 | YANG Y J, XU B, HU J W, et al. Accurate and efficient method for constructing domain knowledge graph[J]. Journal of Software, 2018, 29(10): 2931-2947. 10.13328/j.cnki.jos.005552 |
32 | LI Z, ZHOU D D. Research on conceptual model and construction method of educational knowledge graph[J]. e-Education Research, 2019, 40(8): 78-86, 113. |