Journal of Computer Applications, 2016, Vol. 36, Issue (1): 133-137. DOI: 10.11772/j.issn.1001-9081.2016.01.0133

• Advanced computing •

Support vector machine situation assessment algorithm based on MapReduce

CHEN Zhen1, XIA Jingbo1, YANG Juan2, WEI Zekun1

  1. College of Information and Navigation, Air Force Engineering University, Xi'an, Shaanxi 710077, China;
  2. Unit 89, 93010 Troops, Shenyang, Liaoning 110015, China
  • Received: 2015-07-14 Revised: 2015-09-29 Online: 2016-01-10 Published: 2016-01-09
  • Corresponding author: CHEN Zhen (born 1990), female, from Bayannur, Inner Mongolia; M.S. candidate; research interests: network situation awareness, big data.
  • About the authors: XIA Jingbo (born 1963), male, from Qinhuangdao, Hebei; professor, doctoral supervisor, Ph.D., CCF member; research interests: communication networks, network management. YANG Juan (born 1990), female, from Dangyang, Hubei; assistant engineer; research interests: network management. WEI Zekun (born 1992), male, from Xi'an, Shaanxi; M.S. candidate; research interests: network situation awareness, cloud computing.
  • Supported by:
    This work is partially supported by the Key Program of the Natural Science Foundation under the Science and Technology Plan of Shaanxi Province (2012JZ8005).

Abstract: Support Vector Machine (SVM) copes with the "curse of dimensionality", overfitting, and nonlinearity, problems that traditional situation assessment algorithms cannot handle at the same time. However, SVM scales poorly to large sample sets. To meet the challenge of big data processing in situation assessment, a MapReduce-based SVM (MR-SVM) situation assessment algorithm was proposed. The algorithm uses the MapReduce parallel computing model and exploits the parallelizable nature of SVM: by designing the main map and reduce functions, it parallelizes SVM training and the selection of the main parameters. MR-SVM and the original SVM were compared on a Hadoop platform built for the experiments. For small-scale samples, MR-SVM only adds overhead and is no more efficient than the original algorithm. For large-scale samples, the training time of the original SVM grew exponentially with sample size, whereas the training time of MR-SVM showed no pronounced increase, giving it a clear time advantage. The experimental results show that MR-SVM compensates for the original algorithm's weakness with respect to sample size and is better suited to network situation assessment in big data environments.

Key words: Support Vector Machine (SVM), situation assessment, MapReduce, Hadoop, parallelization
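The abstract does not spell out the map and reduce functions, so the following is only a rough illustration of the idea. It assumes one common pattern for parallel SVM training: each map task trains a sub-SVM on one data partition and emits the points on or inside the margin as candidate support vectors, and the reduce task retrains a final SVM on the merged candidates. The linear subgradient trainer, the function names (`train_linear_svm`, `map_partition`, `reduce_candidates`, `mr_svm`), the synthetic data, and the parameters `lam`, `iters`, and `n_parts` are all assumptions of this sketch rather than the paper's design, and the map calls run sequentially here instead of on Hadoop.

```python
import random

def decision(w, x):
    """Linear decision value w.x + b; the bias b is stored as w[-1]."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def train_linear_svm(samples, lam=0.1, iters=200):
    """Batch subgradient descent on (lam/2)*||w||^2 + mean hinge loss.

    Stand-in trainer (assumption): a linear SVM in pure Python, with the
    bias regularized together with the weights for simplicity.
    """
    dim = len(samples[0][0])
    n = len(samples)
    w = [0.0] * (dim + 1)
    for t in range(1, iters + 1):
        eta = 1.0 / (lam * t)                      # Pegasos-style step size
        violators = [(x, y) for x, y in samples if y * decision(w, x) < 1.0]
        w = [(1.0 - eta * lam) * wi for wi in w]   # shrink step from the L2 penalty
        for x, y in violators:                     # hinge-loss subgradient step
            for j, xj in enumerate(x):
                w[j] += eta * y * xj / n
            w[-1] += eta * y / n
    return w

def map_partition(partition):
    """Map (assumed design): train a sub-SVM, emit candidate support vectors."""
    w = train_linear_svm(partition)
    candidates = [(x, y) for x, y in partition if y * decision(w, x) <= 1.0]
    return candidates or list(partition)           # fall back if nothing qualifies

def reduce_candidates(candidate_lists):
    """Reduce (assumed design): merge candidates and train the final SVM."""
    merged = [s for cands in candidate_lists for s in cands]
    return train_linear_svm(merged)

def mr_svm(data, n_parts=4):
    """Sequential simulation of the map and reduce phases."""
    partitions = [data[i::n_parts] for i in range(n_parts)]
    return reduce_candidates(map(map_partition, partitions))

def accuracy(w, data):
    """Fraction of samples whose predicted sign matches the label."""
    return sum(1 for x, y in data if y * decision(w, x) > 0) / len(data)

def make_data(n=400, seed=0):
    """Synthetic two-class data (hypothetical): Gaussian blobs at (2,2) and (-2,-2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice((1, -1))
        data.append(((rng.gauss(2 * y, 1.0), rng.gauss(2 * y, 1.0)), y))
    return data

if __name__ == "__main__":
    data = make_data()
    w = mr_svm(data)
    print("training accuracy:", accuracy(w, data))
```

In this sketch the reduce step sees only the candidates emitted by the map tasks, not the full data set. That is one plausible reason a scheme of this kind can scale better on large samples, and the fixed cost of partitioning and retraining is consistent with the reported lack of benefit on small samples.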
