Journal of Computer Applications, 2020, Vol. 40, Issue (8): 2262-2267. DOI: 10.11772/j.issn.1001-9081.2019122141

• Data Science and Technology •

Terrorist attack organization prediction method based on feature selection and hyperparameter optimization

XIAO Yuelei(1,2), ZHANG Yunjiao(1)

  1. School of Modern Posts, Xi'an University of Posts & Telecommunications, Xi'an Shaanxi 710061, China;
  2. Shaanxi Information Engineering Research Institute, Xi'an Shaanxi 710075, China
  • Received: 2019-12-23 Revised: 2020-03-23 Published: 2020-08-10 Online: 2020-04-23
  • Corresponding author: XIAO Yuelei (1979-), male, born in Ji'an, Jiangxi, Ph. D., associate professor. His research interests include information security and big data. E-mail: xiao_yuelei@163.com
  • About the authors: ZHANG Yunjiao (1993-), female, born in Hancheng, Shaanxi, M. S. candidate. Her research interests include data analysis and mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61741216), the Shaanxi Science & Technology Co-ordination & Innovation Program (2016KTTSGY01-03), and the New Star Team Project of Xi'an University of Posts & Telecommunications (401-205010001).



Abstract: To address the difficulty of identifying the organizations behind terrorist attacks and the class imbalance of terrorist attack data, a terrorist attack organization prediction method based on feature selection and hyperparameter optimization was proposed. First, exploiting the advantage of Random Forest (RF) in handling imbalanced data, backward feature selection was carried out through iterative RF training. Second, four mainstream classifiers, namely Decision Tree (DT), RF, Bagging and XGBoost, were used to classify and predict terrorist attack organizations, and Bayesian optimization was used to tune the hyperparameters of these classifiers. Finally, the Global Terrorism Database (GTD) was used to evaluate the classification and prediction performance of these classifiers on both majority-class and minority-class samples. Experimental results show that the proposed method improves the classification and prediction performance for terrorist attack organizations. RF and Bagging perform best, with accuracies of 0.8239 and 0.8316 respectively, and the improvement is especially notable on minority-class samples.

Key words: random forest iteration, backward feature selection, Bayesian optimization, classifier, terrorist attack organization

CLC number:
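The first step of the method, backward feature selection by repeatedly training an RF and discarding features, can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule of dropping one feature per round by lowest impurity-based importance, 3-fold cross-validated accuracy as the selection criterion, `class_weight="balanced"` as the imbalance handling, and the helper name `rf_backward_selection` are all hypothetical choices, and synthetic data from `make_classification` stands in for the GTD features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rf_backward_selection(X, y, feature_names, min_features=1, n_estimators=50, random_state=0):
    """Hypothetical helper: backward elimination driven by repeated RF fits.

    Each round scores the current feature subset with cross-validated accuracy,
    then refits a forest and drops the feature with the lowest impurity-based
    importance. Returns the best subset, its score, and the full history.
    """
    if not 1 <= min_features <= X.shape[1]:
        raise ValueError("min_features must be between 1 and the number of features")
    remaining = list(range(X.shape[1]))  # column indices still in play
    history = []  # list of (feature names, mean CV accuracy)
    while True:
        rf = RandomForestClassifier(
            n_estimators=n_estimators,
            class_weight="balanced",  # assumption: reweight classes to counter imbalance
            random_state=random_state,
        )
        # Score the current subset (cross_val_score fits clones, rf stays unfitted)
        score = cross_val_score(rf, X[:, remaining], y, cv=3, scoring="accuracy").mean()
        history.append(([feature_names[i] for i in remaining], float(score)))
        if len(remaining) == min_features:
            break
        # Refit on all rows only to read importances; they align with `remaining`
        rf.fit(X[:, remaining], y)
        weakest = int(np.argmin(rf.feature_importances_))
        del remaining[weakest]
    best_names, best_score = max(history, key=lambda item: item[1])
    return best_names, best_score, history


# Toy imbalanced 3-class data (synthetic stand-in, not GTD)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, weights=[0.7, 0.2, 0.1], random_state=0,
)
names = [f"f{i}" for i in range(X.shape[1])]
best, best_acc, hist = rf_backward_selection(X, y, names, min_features=2)
for subset, acc in hist:
    print(len(subset), round(acc, 4))
print("selected:", best)
```

Because every round refits the forest, the cost grows roughly linearly with the number of features; removing a fixed fraction per round, as scikit-learn's `RFE` allows through its `step` parameter, is a common way to reduce it.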