Sequence labeling optimization method combined with entity boundary offset

Journal of Computer Applications, 2025, Vol. 45, Issue (8): 2522-2529. DOI: 10.11772/j.issn.1001-9081.2024071036

Jing YU^1,2,3, Yanping CHEN^1,2,3, Ying HU^1,2,3, Ruizhang HUANG^1,2,3, Yongbin QIN^1,2,3

^1. Engineering Research Center of Ministry of Education for Text Computing and Cognitive Intelligence, Guizhou University, Guiyang Guizhou 550025, China
^2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang Guizhou 550025, China
^3. College of Computer Science and Technology, Guizhou University, Guiyang Guizhou 550025, China

Received: 2024-07-23  Revised: 2024-10-12  Accepted: 2024-10-16  Online: 2024-11-19  Published: 2025-08-10
Contact: Yanping CHEN
About author: YU Jing, born in 1999, native of Guiyang, Guizhou, M.S. candidate, CCF member. Her research interests include natural language processing and named entity recognition.
HU Ying, born in 1996, native of Chongqing, Ph.D. candidate. His research interests include natural language processing.
HUANG Ruizhang, born in 1979, native of Tianjin, Ph.D., professor, CCF member. Her research interests include data fusion analysis, text mining, network mining, and knowledge discovery.
QIN Yongbin, born in 1980, native of Yantai, Shandong, Ph.D., professor, CCF senior member. His research interests include big data management and application, and multi-source data fusion.
Supported by: Major Science and Technology Foundation of Guizhou Province ([2024]003); National Key Research and Development Program of China (2023YFC3304500); National Natural Science Foundation of China (62166007)

Abstract:

To address the positional deviation between predicted and true entity boundaries in sequence labeling models for Named Entity Recognition (NER), a sequence labeling optimization method combined with entity boundary offset was proposed. Firstly, the concept of boundary offset was introduced to quantify the positional relationship between each word and the entity boundaries: the relative offset between each word and its nearest entity boundary was calculated, and these offsets were used to generate candidate spans for the entity boundaries. Secondly, Intersection-over-Union (IoU) was used as the criterion to filter out low-quality candidate spans, retaining the spans most likely to represent the entity boundaries. Finally, a boundary adjustment module updated the positions of the entity boundaries in the label sequence according to the retained candidate spans, thereby optimizing the entity boundaries across the entire label sequence and improving entity recognition performance. Experimental results show that the proposed method achieves F1-scores of 80.48%, 96.42%, and 94.80% on the CLUENER2020, Resume-zh, and MSRA datasets, respectively, validating its effectiveness on the NER task.

Key words: Named Entity Recognition (NER), sequence labeling, boundary offset, Intersection-over-Union (IoU), boundary adjustment
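
The three steps described in the abstract (offset-based candidate spans, IoU filtering, boundary adjustment) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how offsets are predicted, which spans the IoU is computed against, or what threshold is used. The sketch assumes that each token carries a predicted offset to the start and to the end of its nearest entity, that candidates are compared with the spans decoded from a BIO label sequence, and that a threshold of 0.5 applies. All function names (`decode_bio`, `candidate_spans`, `iou`, `adjust_boundaries`) are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def decode_bio(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BIO labels into (start, end, type) spans with inclusive ends."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(labels + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.append((start, i - 1, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def candidate_spans(start_off: List[int], end_off: List[int]) -> List[Span]:
    """Assumption: token i proposes the span (i + start_off[i], i + end_off[i])."""
    n = len(start_off)
    spans = []
    for i, (ds, de) in enumerate(zip(start_off, end_off)):
        s, e = i + ds, i + de
        if 0 <= s <= e < n:  # drop proposals that fall outside the sentence
            spans.append((s, e))
    return spans


def iou(a: Span, b: Span) -> float:
    """Intersection-over-Union of two inclusive token spans."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def adjust_boundaries(labels: List[str], start_off: List[int],
                      end_off: List[int], threshold: float = 0.5) -> List[str]:
    """Move each decoded entity to its best-supported candidate span, if any."""
    votes = Counter(candidate_spans(start_off, end_off))
    adjusted = ["O"] * len(labels)
    for s, e, etype in decode_bio(labels):
        scored = [(count, iou((s, e), c), c) for c, count in votes.items()]
        kept = [t for t in scored if t[1] >= threshold]  # IoU filtering
        # Prefer the candidate with the most votes, then the highest IoU.
        ns, ne = max(kept)[2] if kept else (s, e)
        adjusted[ns] = "B-" + etype
        for k in range(ns + 1, ne + 1):
            adjusted[k] = "I-" + etype  # later entities overwrite on overlap
    return adjusted


# Toy example: the tagger stops one token short of the assumed true span (0, 3).
labels = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O"]
start_off = [0, -1, -2, -3, -4, 0]  # hypothetical per-token offsets to the start
end_off = [3, 2, 1, 0, -1, 0]       # hypothetical per-token offsets to the end
adjusted = adjust_boundaries(labels, start_off, end_off)
print(adjusted)
```

In this toy input, five tokens propose the span (0, 3), which has an IoU of 0.75 with the decoded span (0, 2), so the end boundary is moved one token to the right. The stray proposal (5, 5) has an IoU of 0 and is filtered out.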

CLC Number: TP391.1

Jing YU, Yanping CHEN, Ying HU, Ruizhang HUANG, Yongbin QIN. Sequence labeling optimization method combined with entity boundary offset[J]. Journal of Computer Applications, 2025, 45(8): 2522-2529.

References 39

[1]	WANG Y J， ZHANG C Y， BAI F B， et al. Review of Chinese named entity recognition research［J］. Journal of Frontiers of Computer Science and Technology， 2023， 17（2）： 324-341.
[2]	ZHANG J， XIE J， HOU W， et al. Mapping the knowledge structure of research on patient adherence： knowledge domain visualization based co-word analysis and social network analysis［J］. PLoS ONE， 2012， 7（4）： No.e34497.
[3]	BABYCH B， HARTLEY A. Improving machine translation quality with automatic named entity recognition［C］// Proceedings of the 7th International EAMT Workshop on MT and Other Language Technology Tools， Improving MT Through Other Language Technology Tools， Resource and Tools for Building MT at EACL 2003. Stroudsburg： ACL， 2003： 1-8.
[4]	MOLLÁ D， VAN ZAANEN M， SMITH D. Named entity recognition for question answering［C］// Proceedings of the Australasian Language Technology Association Workshop 2006. ［S.l.］： Australasian Language Technology Association， 2006： 51-58.
[5]	HOPFIELD J J. Neural networks and physical systems with emergent collective computational abilities［J］. Proceedings of the National Academy of Sciences of the United States of America， 1982， 79（8）： 2554-2558.
[6]	HOCHREITER S， SCHMIDHUBER J. Long short-term memory［J］. Neural Computation， 1997， 9（8）： 1735-1780.
[7]	VASWANI A， SHAZEER N， PARMAR N， et al. Attention is all you need［C］// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook： Curran Associates Inc.， 2017： 6000-6010.
[8]	GUO J， LIU J， YANG C， et al. BERT-LSTM network prediction model based on Transformer［C］// Proceedings of the 36th Chinese Control and Decision Conference. Piscataway： IEEE， 2024： 3098-3103.
[9]	CAI Y， LIU Q， GAN Y， et al. DiFiNet： boundary-aware semantic differentiation and filtration network for nested named entity recognition［C］// Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg： ACL， 2024： 6455-6471.
[10]	MO Y， YANG J， LIU J， et al. MCL-NER： cross-lingual named entity recognition via multi-view contrastive learning［C］// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto： AAAI Press， 2024： 18789-18797.
[11]	WANG Y， LU L， YANG W， et al. Chinese named entity recognition based on heterogeneous graph and dynamic attention network［C］// Proceedings of the IEEE 15th International Symposium on Autonomous Decentralized System. Piscataway： IEEE， 2023： 1-8.
[12]	WU S， SONG X， FENG Z， et al. NFLAT： non-flat-lattice Transformer for Chinese named entity recognition［EB/OL］. ［2024-10-10］..
[13]	JU M， MIWA M， ANANIADOU S. A neural layered model for nested named entity recognition［C］// Proceedings of the Conference of the 2018 North American Chapter of the Association for Computational Linguistics： Human Language Technologies， Volume 1 （Long Papers）. Stroudsburg： ACL， 2018： 1446-1459.
[14]	RAMSHAW L A， MARCUS M P. Text chunking using transformation-based learning［M］// ARMSTRONG S， CHURCH K， ISABELLE P， et al. Natural language processing using very large corpora. Dordrecht： Springer， 1999： 157-176.
[15]	CHEN C， KONG F. Enhancing entity boundary detection for better Chinese named entity recognition［C］// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing （Volume 2： Short Papers）. Stroudsburg： ACL， 2021： 20-25.
[16]	TAN C， QIU W， CHEN M， et al. Boundary enhanced neural span classification for nested named entity recognition［C］// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto： AAAI Press， 2020： 9016-9023.
[17]	COLLOBERT R， WESTON J， BOTTOU L， et al. Natural language processing （almost） from scratch［J］. Journal of Machine Learning Research， 2011， 12： 2493-2537.
[18]	HUANG Z， XU W， YU K. Bidirectional LSTM-CRF models for sequence tagging［EB/OL］. ［2024-10-10］..
[19]	SCHUSTER M， PALIWAL K K. Bidirectional recurrent neural networks［J］. IEEE Transactions on Signal Processing， 1997， 45（11）： 2673-2681.
[20]	LAMPLE G， BALLESTEROS M， SUBRAMANIAN S， et al. Neural architectures for named entity recognition［C］// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies. Stroudsburg： ACL， 2016： 260-270.
[21]	LI Z， SONG M， ZHU Y， et al. Chinese nested named entity recognition based on boundary prompt［C］// Proceedings of the 2023 International Conference on Web Information Systems and Applications， LNCS 14094. Singapore： Springer， 2023： 331-343.
[22]	WANG C， XIONG X， WANG L， et al. A lexicon enhanced Chinese long named entity recognition using word-aware attention［C］// Proceedings of the 6th International Conference on Machine Learning and Natural Language Processing. New York： ACM， 2023： 234-242.
[23]	PETERS M E， NEUMANN M， IYYER M， et al. Deep contextualized word representations［C］// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies， Volume 1 （Long Papers）. Stroudsburg： ACL， 2018： 2227-2237.
[24]	DEVLIN J， CHANG M W， LEE K， et al. BERT： pre-training of deep bidirectional Transformers for language understanding［C］// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies， Volume 1 （Long and Short Papers）. Stroudsburg： ACL， 2019： 4171-4186.
[25]	LI D， YAN L， YANG J， et al. Dependency syntax guided BERT-BILSTM-GAM-CRF for Chinese NER［J］. Expert Systems with Applications， 2022， 196： No.116682.
[26]	ZHU P， CHENG D， YANG F， et al. Improving Chinese named entity recognition by large-scale syntactic dependency graph［J］. IEEE/ACM Transactions on Audio， Speech， and Language Processing， 2022， 30： 979-991.
[27]	ZHANG Y， YANG J. Chinese NER using lattice LSTM［C］// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg： ACL， 2018： 1554-1564.
[28]	MA R， PENG M， ZHANG Q， et al. Simplify the usage of lexicon in Chinese NER［C］// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg： ACL， 2020： 5951-5960.
[29]	LI X， YAN H， QIU X， et al. FLAT： Chinese NER using flat-lattice Transformer［C］// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg： ACL， 2020： 6836-6842.
[30]	XU L， TONG Y， DONG Q， et al. CLUENER2020： fine-grained named entity recognition dataset and benchmark for Chinese［EB/OL］. ［2024-10-10］..
[31]	LEVOW G A. The third international Chinese language processing bakeoff： word segmentation and named entity recognition［C］// Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing. Stroudsburg： ACL， 2006： 108-117.
[32]	XUAN Z， BAO R， JIANG S. FGN： fusion glyph network for Chinese named entity recognition［C］// Proceedings of the 2020 China Conference on Knowledge Graph and Semantic Computing， CCIS 1356. Singapore： Springer， 2021： 28-40.
[33]	WU S， SONG X， FENG Z. MECT： multi-metadata embedding based cross-Transformer for Chinese named entity recognition［C］// Proceedings of the 59th Joint Conference of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing （Volume 1： Long Papers）. Stroudsburg： ACL， 2021： 1529-1539.
[34]	WANG X， XU X， HUANG D， et al. Multi-task label-wise transformer for Chinese named entity recognition［J］. ACM Transactions on Asian and Low-Resource Language Information Processing， 2023， 22（4）： No.118.
[35]	LONG K， ZHAO H， SHAO Z， et al. Deep neural network with embedding fusion for Chinese named entity recognition［J］. ACM Transactions on Asian and Low-Resource Language Information Processing， 2023， 22（3）： No.91.
[36]	LI X， FENG J， MENG Y， et al. A unified MRC framework for named entity recognition［C］// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg： ACL， 2020： 5849-5859.
[37]	LI X， SUN X， MENG Y， et al. Dice loss for data-imbalanced NLP tasks［C］// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg： ACL， 2020： 465-476.
[38]	TAN Z， SHEN Y， ZHANG S， et al. A sequence-to-set network for nested named entity recognition［C］// Proceedings of the 30th International Joint Conference on Artificial Intelligence. California： ijcai.org， 2021： 3936-3942.
[39]	SHEN Y， WANG X， TAN Z， et al. Parallel instance query network for named entity recognition［C］// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg： ACL， 2022： 947-961.

Dataset	Number of samples
	Training set	Validation set	Test set
CLUENER2020	10 748	—	1 342
Resume-zh	3 821	463	477
MSRA	46 364	—	4 365

Parameter	Value	Parameter	Value
LSTM hidden size	1 024	Word embedding dimension	768
Batch size	16	Weight decay	0.01
Epoch	50	Learning rate	3×10^-5
Dropout	0.5	Optimizer	Adam
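
For readers who want to reuse these settings, the table can be written down as a small configuration object. This is only a sketch: the class name `TrainConfig` and its field names are hypothetical, and only the values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameter values from the table; field names are illustrative."""
    lstm_hidden_size: int = 1024
    embedding_dim: int = 768
    batch_size: int = 16
    weight_decay: float = 0.01
    epochs: int = 50
    learning_rate: float = 3e-5
    dropout: float = 0.5
    optimizer: str = "Adam"


config = TrainConfig()
print(config)
```

Freezing the dataclass keeps the settings immutable, so a run cannot change them by accident after start-up.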

Dataset	Method	P (%)	R (%)	F1 (%)
CLUENER2020	BiLSTM-CRF [30]	71.06	68.97	70.00
	Bert-CRF [30]	77.24	80.46	78.82
	RoBERTa-CRF [30]	79.26	81.69	80.42
	Proposed method	79.89	81.08	80.48
Resume-zh	Lattice-LSTM [27]	94.81	94.11	94.46
	SoftLexicon [28]	96.08	96.13	96.11
	FLAT [29]	—	—	95.86
	FGN [32]	96.49	97.08	96.79
	MECT [33]	—	—	95.98
	MTLWT [34]	—	—	96.33
	EF-DNN [35]	95.47	95.64	95.56
	Proposed method	96.84	96.01	96.42
MSRA	MRC [36]	90.39	89.00	89.68
	MRC+DSC [37]	96.67	96.77	96.72
	Seq-to-set [38]	93.21	91.97	92.58
	PIQN [39]	93.61	93.35	93.48
	EF-DNN [35]	94.13	92.65	93.39
	Proposed method	95.49	94.13	94.80
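
As a quick consistency check, F1 can be recomputed from P and R as their harmonic mean, F1 = 2PR / (P + R). The sketch below does this for the last row of each dataset block (the proposed method). It assumes the standard F1 definition, and the helper name `f1_score` is illustrative. Because P and R are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit, so the comparison uses a small tolerance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for the proposed method, copied from the results table.
reported = {
    "CLUENER2020": (79.89, 81.08, 80.48),
    "Resume-zh": (96.84, 96.01, 96.42),
    "MSRA": (95.49, 94.13, 94.80),
}

for name, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    # Tolerance absorbs the rounding of P and R to two decimals.
    assert abs(recomputed - f1) < 0.02, name
    print(f"{name}: reported {f1:.2f}, recomputed {recomputed:.4f}")
```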

Boundary position	CLUENER2020			Resume-zh			MSRA
	P (%)	R (%)	F1 (%)	P (%)	R (%)	F1 (%)	P (%)	R (%)	F1 (%)
Start boundary	89.33	92.38	90.83	97.31	97.61	97.48	97.69	97.57	97.63
End boundary	86.69	91.15	88.86	97.93	98.40	98.16	96.35	97.11	96.73