Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2932-2939. DOI: 10.11772/j.issn.1001-9081.2022081163

• Multimedia computing and computer simulation •

Semi-supervised fake job advertisement detection model based on consistency training

Ruiqi WANG, Shujuan JI, Ning CAO, Yajie GUO

  1. Shandong Provincial Key Laboratory of Wisdom Mine Information Technology (Shandong University of Science and Technology), Qingdao, Shandong 266590, China
  • Received: 2022-08-08 Revised: 2023-01-07 Accepted: 2023-01-16 Online: 2023-09-10 Published: 2023-09-10
  • Contact: Shujuan JI
  • About the authors: WANG Ruiqi, born in 1997 in Heze, Shandong, M. S. candidate. Her research interests include artificial intelligence.
    CAO Ning, born in 1997 in Heze, Shandong, Ph. D. candidate. His research interests include artificial intelligence.
    GUO Yajie, born in 1996 in Dongying, Shandong, M. S. His research interests include artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (71772107)

Abstract:

The spread of fake job advertisements not only damages the legitimate rights and interests of job seekers but also disrupts the normal employment order and gives job seekers a very poor user experience. To detect fake job advertisements effectively, a Semi-Supervised fake job advertisement detection model based on Consistency training (SSC) was proposed. Firstly, a consistency regularization term was applied to all the data to improve the performance of the model. Then, the supervised loss and the unsupervised loss were combined through joint training to obtain the semi-supervised loss. Finally, the semi-supervised loss was used to optimize the model. Experimental results on two real datasets, EMSCAD (EMployment SCam Aegean Dataset) and IMDB (Internet Movie DataBase), show that SSC achieves the best detection performance when only 20 labeled examples are available: its accuracy is 2.2 and 2.8 percentage points higher, respectively, than that of the existing advanced semi-supervised learning model UDA (Unsupervised Data Augmentation), and 3.4 and 11.7 percentage points higher, respectively, than that of the deep learning model BERT (Bidirectional Encoder Representations from Transformers). SSC also shows good scalability.

Key words: false information detection, semi-supervised learning, online recruitment, fake job advertisement, consistency training
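
The three training steps named in the abstract (a consistency term on all data, a joint supervised plus unsupervised loss, and optimization of that joint loss) can be illustrated with a minimal sketch. The abstract does not give the exact form of either loss, so everything specific here is an assumption: cross-entropy for the supervised term, KL divergence between predictions on an original input and an augmented copy for the consistency term, and a hypothetical `weight` parameter that balances the two. The function names are illustrative and are not taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Supervised term (assumed): negative log-probability of the true class.
    return -math.log(max(probs[label], 1e-12))

def kl_divergence(p, q):
    # Consistency term (assumed): KL(p || q) between two predicted distributions.
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)

def semi_supervised_loss(labeled, all_pairs, weight=1.0):
    """Combine the supervised and consistency terms into one loss.

    labeled:   list of (logits, label) for the labeled examples.
    all_pairs: list of (logits_original, logits_augmented) for all examples,
               labeled and unlabeled alike.
    weight:    hypothetical balancing coefficient for the consistency term.
    """
    supervised = sum(cross_entropy(softmax(z), y) for z, y in labeled) / len(labeled)
    consistency = sum(
        kl_divergence(softmax(a), softmax(b)) for a, b in all_pairs
    ) / len(all_pairs)
    return supervised + weight * consistency
```

In this sketch the consistency term is zero when the model predicts the same distribution for an input and its augmented copy, and it grows as the two predictions diverge. A real training loop would compute the logits with the model and minimize the returned value by gradient descent.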

CLC number: