Named entity recognition for sensitive information based on data augmentation and residual networks

Li LI, Han SONG, Peihe LIU, Hanlin CHEN

Journal of Computer Applications 2025, 45 (9): 2790-2797. DOI: 10.11772/j.issn.1001-9081.2024081143


Named Entity Recognition (NER) for sensitive information is a key technology for privacy protection. However, existing NER methods struggle in the sensitive information domain because relevant datasets are scarce, and traditional techniques suffer from low accuracy and poor portability. To address these issues, firstly, a sensitive information NER dataset, SenResume, was constructed by crawling text corpora containing sensitive information from the Internet and annotating them manually. Secondly, a data augmentation model, Entity-based Masked Language Modeling (E-MLM), was proposed, which uses the whole-word masking technique to generate new data samples and expand the dataset, thereby enhancing data diversity. Thirdly, a RoBERTa-ResBiLSTM-CRF model was introduced. It uses the Robustly optimized Bidirectional Encoder Representations from Transformers approach with Whole Word Masking (RoBERTa-WWM) to extract contextual features and generate high-quality word vector representations, and a Residual Bidirectional Long Short-Term Memory (ResBiLSTM) network to enhance text features. Finally, a multi-layer residual network was applied to improve training efficiency and model stability, and a Conditional Random Field (CRF) was used for global decoding to enhance the accuracy of sequence labeling. Experimental results demonstrate that E-MLM improves dataset quality significantly, and that the proposed NER model achieves the best performance on both the original and 1x augmented datasets, with F1 scores of 96.16% and 97.84%, respectively. These results indicate that the introduction of E-MLM and residual networks contributes to improved accuracy in sensitive information NER.
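The masking step behind an entity-based augmentation scheme can be illustrated with a minimal sketch. This is not the authors' E-MLM implementation: the abstract does not say which spans are masked or how replacements are generated, so the sketch assumes BIO-tagged tokens and assumes that a whole entity span is masked as one unit. The names `entity_spans`, `mask_entity`, and the `[MASK]` placeholder are hypothetical; in a full pipeline a masked language model would then fill the masked positions to produce a new sample.

```python
from typing import List, Tuple

# Placeholder mask token (assumption); the real token depends on the tokenizer.
MASK = "[MASK]"


def entity_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each BIO-tagged entity."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that was still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag.startswith("I-") and start is not None:
            # Continuation of the currently open entity.
            continue
        else:
            # "O" tag (or a stray "I-") ends the currently open entity, if any.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def mask_entity(tokens: List[str], span: Tuple[int, int]) -> List[str]:
    """Replace every token of one entity span with the mask token."""
    start, end = span
    return tokens[:start] + [MASK] * (end - start) + tokens[end:]


tokens = ["Li", "Lei", "lives", "in", "Beijing"]
tags = ["B-NAME", "I-NAME", "O", "O", "B-LOC"]
spans = entity_spans(tags)
masked = mask_entity(tokens, spans[0])
```

Masking the whole span at once, rather than individual tokens, keeps the entity boundary intact, so the original labels can be reused for whatever the language model generates in those positions.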
