Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2693-2700. DOI: 10.11772/j.issn.1001-9081.2021071356
Special topic: Artificial Intelligence
Received: 2021-07-30
Revised: 2021-11-03
Accepted: 2021-11-09
Online: 2022-09-19
Published: 2022-09-10
Contact: Weisen FENG
About author: XU Guanyou, born in 1997, M.S. candidate. His research interests include natural language processing and knowledge graphs.
Abstract: Some recent character-based Named Entity Recognition (NER) models cannot make full use of word information, while lattice-structured models that do exploit word information may degenerate into word-based models and suffer from word segmentation errors. To address these problems, a transformer-based Python NER model was proposed to encode character-word information. First, word information was bound to the characters at the beginning or end of the corresponding word; then, three different strategies were used to encode the word information into a fixed-size representation through the transformer; finally, a Conditional Random Field (CRF) was used for decoding, thereby avoiding the segmentation errors introduced when obtaining word boundary information and increasing the batch training speed. Experimental results on the python dataset show that the F1 score of the proposed model is 2.64 percentage points higher than that of the Lattice-LSTM model, while its training time is about 1/4 of the comparison model's, indicating that the proposed model can prevent model degradation, speed up batch training, and better recognize Python named entities.
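As a rough illustration of the pipeline the abstract describes, the following is a minimal PyTorch sketch written against the PyTorch 1.8 API used in the experiments. The class name, tensor layout, and the way candidate words are pre-bound to begin/end characters (`bound_words`, `word_lens`) are assumptions for illustration, not the authors' released code; the CRF layer is left out for brevity.

```python
import torch
import torch.nn as nn

class CharWordTransformerNER(nn.Module):
    """Sketch of the character-word idea from the abstract: word information
    is bound to a word's begin/end characters, reduced to one fixed-size
    vector per character by a shortest/longest/average strategy, encoded by
    a transformer, and turned into emission scores for a CRF decoder."""

    def __init__(self, n_chars, n_words, n_tags, d=160, n_heads=8,
                 n_layers=2, strategy="average"):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d)
        self.word_emb = nn.Embedding(n_words, d, padding_idx=0)
        self.strategy = strategy  # "shortest" | "longest" | "average"
        layer = nn.TransformerEncoderLayer(d_model=2 * d, nhead=n_heads, dropout=0.15)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.fc = nn.Linear(2 * d, n_tags)

    def forward(self, chars, bound_words, word_lens):
        # chars:       (B, L)     character ids
        # bound_words: (B, L, k)  ids of words whose begin/end falls on each character
        # word_lens:   (B, L, k)  lengths of those words, 0 for padding slots
        c = self.char_emb(chars)                              # (B, L, d)
        w = self.word_emb(bound_words)                        # (B, L, k, d)
        if self.strategy == "average":                        # mean over real words
            mask = (word_lens > 0).unsqueeze(-1).float()
            w = (w * mask).sum(2) / mask.sum(2).clamp(min=1.0)
        else:                                                 # pick one word per char
            big = torch.iinfo(word_lens.dtype).max
            lens = word_lens.masked_fill(word_lens == 0,
                                         big if self.strategy == "shortest" else -1)
            idx = lens.argmin(-1) if self.strategy == "shortest" else lens.argmax(-1)
            idx = idx[..., None, None].expand(-1, -1, 1, w.size(-1))
            w = w.gather(2, idx).squeeze(2)                   # (B, L, d)
        h = torch.cat([c, w], dim=-1).transpose(0, 1)         # (L, B, 2d) for PyTorch 1.8
        h = self.encoder(h).transpose(0, 1)                   # back to (B, L, 2d)
        return self.fc(h)  # emission scores; feed to a CRF for final decoding
```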
Guanyou XU, Weisen FENG. Python named entity recognition model based on transformer[J]. Journal of Computer Applications, 2022, 42(9): 2693-2700.
| Dataset | Type | Train | Dev | Test |
|---|---|---|---|---|
| python | Sentences | 6.1K | 0.7K | 0.6K |
| | Characters | 207.4K | 23.3K | 22.3K |
| resume | Sentences | 3.8K | 0.5K | 0.5K |
| | Characters | 124.1K | 13.9K | 15.1K |
| weibo | Sentences | 1.4K | 0.27K | 0.27K |
| | Characters | 73.8K | 14.5K | 14.8K |
Tab. 1 Statistics of datasets
| Parameter | Value |
|---|---|
| hidden_size | [160, 256, 320, 480] |
| number of layers | [ |
| number of heads | [ |
| head dimension | [ |
| max_len | [175, 178, 199] |
| fc dropout | 0.4 |
| transformer dropout | 0.15 |
| optimizer | SGD |
| learning rate | [1E-3, 7E-4] |
| clip | 5 |
| batch_size | 10 |
| epochs | [75, 100] |
Tab. 2 Model parameters
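Read as a recipe, Table 2 pins down the optimizer loop: SGD, learning rate 1e-3 or 7e-4, gradient-norm clipping at 5, batches of 10. A hedged sketch of how those settings wire into a PyTorch training step follows; the model and data loader are hypothetical, and a cross-entropy loss stands in for the paper's CRF loss.

```python
import torch
from torch import nn, optim

def train_epoch(model, loader, n_tags):
    """One epoch with the Table 2 settings: SGD, lr 1e-3, clip 5, batch size 10.
    The loader and model are hypothetical; cross-entropy stands in for the
    CRF negative log-likelihood used in the paper."""
    criterion = nn.CrossEntropyLoss(ignore_index=-100)   # -100 marks padded tags
    optimizer = optim.SGD(model.parameters(), lr=1e-3)
    model.train()
    for chars, bound_words, word_lens, tags in loader:   # batch_size = 10
        optimizer.zero_grad()
        emissions = model(chars, bound_words, word_lens) # (B, L, n_tags)
        loss = criterion(emissions.reshape(-1, n_tags), tags.reshape(-1))
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=5)  # "clip" = 5
        optimizer.step()
```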
| Environment | Item | Configuration |
|---|---|---|
| Hardware | Operating system | Windows 10 |
| | CPU | AMD Ryzen 7 3700X |
| | GPU | GeForce RTX 3070 |
| | Memory | 32 GB |
| Software | Programming environment | Anaconda |
| | Python | Python 3.6 |
| | PyTorch | 1.8.0 |
| | FastNLP | 0.5.0 |
Tab. 3 Experimental environment
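The Table 3 software stack can be sanity-checked from Python itself. A small snippet follows; `pkg_resources` ships with setuptools in an Anaconda install, and the FastNLP distribution name is assumed to match its PyPI name.

```python
import sys
import pkg_resources
import torch

# Compare the local environment against Table 3:
# Python 3.6, PyTorch 1.8.0, FastNLP 0.5.0, GeForce RTX 3070.
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("FastNLP:", pkg_resources.get_distribution("FastNLP").version)
print("GPU    :", torch.cuda.get_device_name(0)
      if torch.cuda.is_available() else "no CUDA device")
```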
| Dataset | Model | P | R | F1 |
|---|---|---|---|---|
| python | Lattice-LSTM | 70.16 | 69.94 | 70.05 |
| | WC-LSTM | 72.23 | 72.02 | 72.11 |
| | LR-CNN | 71.05 | 73.67 | 72.34 |
| | BERT+CRF | 70.69 | 67.47 | 69.04 |
| | BERT+LSTM+CRF | 73.81 | 72.75 | 73.28 |
| | CW-TF + shortest strategy | 70.29 | 71.88 | 71.20 |
| | CW-TF + longest strategy | 68.38 | 75.64 | 71.82 |
| | CW-TF + average strategy | 71.66 | 73.75 | 72.69 |
| | CW-TF + longest strategy + pretraining | 71.11 | 77.20 | 74.03 |
| resume | Lattice-LSTM | 94.81 | 94.11 | 94.46 |
| | WC-LSTM | 95.27 | 95.15 | 95.21 |
| | LR-CNN | 95.37 | 94.84 | 95.11 |
| | BERT+CRF | 94.87 | 96.50 | 95.68 |
| | BERT+LSTM+CRF | 95.75 | 95.28 | 95.51 |
| | CW-TF + shortest strategy | 94.62 | 95.25 | 94.94 |
| | CW-TF + longest strategy | 95.16 | 95.39 | 95.29 |
| | CW-TF + average strategy | 94.79 | 94.92 | 94.85 |
| weibo | Lattice-LSTM | 53.04 | 62.25 | 58.79 |
| | WC-LSTM | 52.55 | 67.41 | 59.84 |
| | LR-CNN | 57.14 | 66.67 | 59.92 |
| | BERT+CRF | 65.77 | 62.05 | 63.80 |
| | BERT+LSTM+CRF | 69.65 | 64.62 | 67.33 |
| | CW-TF + shortest strategy | 70.18 | 50.49 | 58.73 |
| | CW-TF + longest strategy | 64.84 | 54.78 | 59.39 |
| | CW-TF + average strategy | 65.09 | 54.79 | 59.49 |
Tab. 4 Experimental results on python, resume and weibo datasets (%)
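The F1 column in Table 4 is the harmonic mean of the precision (P) and recall (R) columns, so individual rows can be re-derived directly; for instance, the Lattice-LSTM and best CW-TF rows on the python dataset:

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (all values in %)."""
    return 2 * p * r / (p + r)

# Re-derive two python-dataset rows of Table 4.
assert round(f1(70.16, 69.94), 2) == 70.05  # Lattice-LSTM
assert round(f1(71.11, 77.20), 2) == 74.03  # CW-TF + longest strategy + pretraining
```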
| Model | python | resume |
|---|---|---|
| Lattice-LSTM | 1.00× | 1.00× |
| WC-LSTM | 2.13× | 1.47× |
| LR-CNN | 2.52× | 1.51× |
| CW-TF + shortest strategy | 4.14× | 3.25× |
| CW-TF + longest strategy | 3.35× | 3.12× |
| CW-TF + average strategy | 3.40× | 3.16× |
Tab. 5 Training speed (relative to Lattice-LSTM)
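The speedups in Table 5 are relative to Lattice-LSTM (1.00×). Inverting them recovers relative training time and reproduces the abstract's claim that training takes about a quarter of the comparison model's time:

```python
# python-dataset speedups from Table 5, relative to Lattice-LSTM.
speedups = {
    "WC-LSTM": 2.13,
    "LR-CNN": 2.52,
    "CW-TF + shortest strategy": 4.14,
    "CW-TF + longest strategy": 3.35,
    "CW-TF + average strategy": 3.40,
}
for model, s in speedups.items():
    print(f"{model}: {1 / s:.2f}x of Lattice-LSTM's training time")
# CW-TF + shortest strategy -> 0.24x, i.e. roughly 1/4 of the baseline
```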
| Transformer multi-head attention feature dimension | P | R | F1 |
|---|---|---|---|
| 32 | 72.21 | 71.62 | 71.92 |
| 64 | 71.66 | 73.75 | 72.69 |
| 96 | 68.40 | 75.21 | 71.65 |
| 256 | 69.12 | 73.73 | 72.10 |
Tab. 6 Comparison of results with different transformer multi-head attention feature dimensions on python dataset (%)
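In standard multi-head attention (Vaswani et al., reference 28), the model dimension is split evenly across heads, so the per-head feature dimension varied in Table 6 and the head count jointly fix the attention width. A small PyTorch illustration follows; the head count of 8 is hypothetical, since the corresponding Table 2 value was lost in extraction.

```python
import torch
from torch import nn

num_heads, head_dim = 8, 64           # 64 is the best-scoring row of Table 6
embed_dim = num_heads * head_dim      # heads split the model dimension evenly

attn = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=num_heads)
x = torch.randn(20, 2, embed_dim)     # (seq_len, batch, embed_dim) layout
out, weights = attn(x, x, x)          # self-attention over the sequence
print(out.shape, weights.shape)       # torch.Size([20, 2, 512]) torch.Size([2, 20, 20])
```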
References:
1. DIEFENBACH D, LOPEZ V, SINGH K, et al. Core techniques of question answering systems over knowledge bases: a survey[J]. Knowledge and Information Systems, 2018, 55(3): 529-569. 10.1007/s10115-017-1100-y
2. VEALE T. Creative language retrieval: a robust hybrid of information retrieval and linguistic creativity[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2011: 278-287.
3. HAN X, GAO T Y, LIN Y K, et al. More data, more relations, more context and more openness: a review and outlook for relation extraction[C]// Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 745-758.
4. SAITO K, NAGATA M. Multi-language named-entity recognition system based on HMM[C]// Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-Language Named Entity Recognition. Stroudsburg, PA: Association for Computational Linguistics, 2003: 41-48. 10.3115/1119384.1119390
5. FENG Y Y, SUN L, LV Y H. Chinese word segmentation and named entity recognition based on conditional random fields models[C]// Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2006: 181-184.
6. EKBAL A, BANDYOPADHYAY S. Named entity recognition using support vector machine: a language independent approach[J]. International Journal of Electrical, Computer, and Systems Engineering, 2010, 4(2): 155-170.
7. LI X N, YAN H, QIU X P, et al. FLAT: Chinese NER using flat-lattice transformer[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 6836-6842. 10.18653/v1/2020.acl-main.611
8. HE H F, SUN X. A unified model for cross-domain and semi-supervised named entity recognition in Chinese social media[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2017: 3216-3222. 10.1609/aaai.v31i1.10977
9. CAO P F, CHEN Y B, LIU K, et al. Adversarial transfer learning for Chinese named entity recognition with self-attention mechanism[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 182-192. 10.18653/v1/d18-1017
10. LI H B, HAGIWARA M, LI Q, et al. Comparison of the impact of word segmentation on name tagging for Chinese and Japanese[C]// Proceedings of the 9th International Conference on Language Resources and Evaluation. Paris: European Language Resources Association, 2014: 2532-2536.
11. ZHANG Y, YANG J. Chinese NER using lattice LSTM[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 1554-1564. 10.18653/v1/p18-1144
12. MA R T, PENG M L, ZHANG Q, et al. Simplify the usage of lexicon in Chinese NER[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 5951-5960. 10.18653/v1/2020.acl-main.528
13. GUI T, MA R T, ZHANG Q, et al. CNN-based Chinese NER with lexicon rethinking[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 4982-4988. 10.24963/ijcai.2019/692
14. GUI T, ZOU Y C, ZHANG Q, et al. A lexicon-based graph neural network for Chinese NER[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1040-1050. 10.18653/v1/d19-1096
15. LI J, SUN A X, HAN J L, et al. A survey on deep learning for named entity recognition[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(1): 50-70. 10.1109/tkde.2020.2981314
16. XU C W, WANG F Y, HAN J L, et al. Exploiting multiple embeddings for Chinese named entity recognition[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. New York: ACM, 2019: 2269-2272. 10.1145/3357384.3358117
17. SUN Y, WANG S H, LI Y K, et al. ERNIE 2.0: a continual pre-training framework for language understanding[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 8968-8975. 10.1609/aaai.v34i05.6428
18. LIU W, XU T G, XU Q H, et al. An encoding strategy based word-character LSTM for Chinese NER[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 2379-2389.
19. MENG Y X, WU W, WANG F, et al. Glyce: glyph-vectors for Chinese character representations[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2021-03-15].
20. XUAN Z Y, BAO R, JIANG S Y. FGN: fusion glyph network for Chinese named entity recognition[C]// Proceedings of the 2020 China Conference on Knowledge Graph and Semantic Computing, CCIS 1356. Singapore: Springer, 2021: 28-40.
21. YAN H, DENG B C, LI X N, et al. TENER: adapting transformer encoder for named entity recognition[EB/OL]. (2019-12-10) [2020-10-13].
22. ZHU Y Y, WANG G X. CAN-NER: convolutional attention network for Chinese named entity recognition[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 3384-3393. 10.18653/v1/N19-1342
23. DING R X, XIE P J, ZHANG X Y, et al. A neural multi-digraph model for Chinese NER with gazetteers[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1462-1467. 10.18653/v1/p19-1141
24. SUI D B, CHEN Y B, LIU K, et al. Leverage lexical knowledge for Chinese named entity recognition via collaborative graph network[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 3830-3840. 10.18653/v1/d19-1396
25. WU F Z, LIU J X, WU C H, et al. Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation[C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 3342-3348. 10.1145/3308558.3313743
26. XUE M G, YU B W, LIU T W, et al. Porous lattice transformer encoder for Chinese NER[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 3831-3841. 10.18653/v1/2020.coling-main.340
27. ZHAO H S, YANG Y, ZHANG Q, et al. Improve neural entity recognition via multi-task data selection and constrained decoding[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 346-351. 10.18653/v1/n18-2056
28. VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
29. HETLAND M L. Beginning Python: From Novice to Professional[M]. 3rd ed. YUAN G Z, translated. Beijing: People's Posts and Telecommunications Press, 2014: 1-458. 10.1007/978-1-4842-0055-1_1
30. YANG J, ZHANG Y, LI L W, et al. YEDDA: a lightweight collaborative text span annotation tool[C]// Proceedings of ACL 2018, System Demonstrations. Stroudsburg, PA: Association for Computational Linguistics, 2018: 31-36. 10.18653/v1/p18-4006
31. YANG Y J, XU B, HU J W, et al. Accurate and efficient method for constructing domain knowledge graph[J]. Journal of Software, 2018, 29(10): 2931-2947. 10.13328/j.cnki.jos.005552
32. LI Z, ZHOU D D. Research on conceptual model and construction method of educational knowledge graph[J]. e-Education Research, 2019, 40(8): 78-86, 113.