
Semi-Supervised Fake Job Advertisements Detection Model Based on Consistency Training

Wang Ruiqi (王瑞琪), Ji Shujuan (纪淑娟), Cao Ning (曹宁), Guo Yajie (郭亚杰)

  1. Shandong University of Science and Technology
  • Received: 2022-08-08 Revised: 2022-12-13 Online: 2023-02-28
  • Corresponding author: Ji Shujuan
  • Funding:
    Research on Methods for Discovering Credit-Attack Groups and Key Figures in Big Data Streams

Abstract: The spread of fake job advertisements harms the legitimate rights and interests of job seekers, disrupts the normal employment order, and gives job seekers a very poor user experience. To detect fake job advertisements effectively, this paper proposes a semi-supervised fake job advertisement detection model based on consistency training. The model applies a consistency regularization term to all data to improve its performance, combines the supervised loss and the unsupervised loss through joint training into a semi-supervised loss, and is then optimized with that semi-supervised loss. Experiments on two real-world datasets, EMSCAD and IMDB, show that with only 20 labeled samples the proposed model achieves the best detection performance, with accuracies of 70% and 71.7%, respectively. Compared with the existing advanced semi-supervised learning model UDA, accuracy increases by 2.2% and 2.8%, respectively; compared with the deep learning model BERT, it increases by 3.4% and 11.7%, respectively. The proposed model also shows good scalability.

Key words: online recruitment, fake job advertisements, fake job advertisement detection, semi-supervised learning, consistency training

CLC number:
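The abstract describes a semi-supervised loss built by jointly combining a supervised loss with a consistency regularization term. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a cross-entropy supervised loss, a KL-divergence consistency term between predictions on an original text and an augmented copy of it, and a weighting factor `lam`. The function names, the input layout, and `lam` are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Supervised loss for one sample: negative log-probability of the true class."""
    return -math.log(max(probs[label], 1e-12))


def kl_divergence(p, q):
    """KL(p || q); zero when both distributions agree, positive otherwise."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def semi_supervised_loss(labeled, unlabeled, lam=1.0):
    """Combine a supervised loss with a consistency term (illustrative assumption).

    labeled:   list of (logits_original, logits_augmented, label)
    unlabeled: list of (logits_original, logits_augmented)
    lam:       hypothetical weight of the consistency term
    """
    # Supervised part: mean cross-entropy over the labeled samples only.
    sup_terms = [cross_entropy(softmax(orig), y) for orig, _, y in labeled]
    sup = sum(sup_terms) / len(sup_terms) if sup_terms else 0.0

    # Consistency part: computed over ALL data, labeled and unlabeled alike,
    # penalizing disagreement between original and augmented predictions.
    pairs = [(o, a) for o, a, _ in labeled] + list(unlabeled)
    cons_terms = [kl_divergence(softmax(o), softmax(a)) for o, a in pairs]
    cons = sum(cons_terms) / len(cons_terms) if cons_terms else 0.0

    return sup + lam * cons


# Usage: one labeled ad and one unlabeled ad, two classes (real, fake).
labeled = [([0.0, 0.0], [0.0, 0.0], 1)]
unlabeled = [([2.0, 0.0], [0.0, 2.0])]
loss_consistent = semi_supervised_loss(labeled, [])
loss_inconsistent = semi_supervised_loss(labeled, unlabeled)
```

In this sketch the loss grows when the predictions on an advertisement and its augmented copy disagree, which is the pressure a consistency term is meant to apply. In practice the logits would come from a text classifier, and the augmentation method is a separate design choice that the abstract does not specify.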