Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (6): 1600-1603. DOI: 10.11772/j.issn.1001-9081.2014.06.1600

• Advanced Computing •

MapReduce Based Image Classification Approach

HAN Wei, ZHANG Xueqing, CHEN Yang

  1. The 54th Research Institute, China Electronics Technology Group Corporation, Shijiazhuang, Hebei 050081, China
  • Received: 2013-12-03 Revised: 2014-02-19 Online: 2014-06-01 Published: 2014-07-02
  • Contact: HAN Wei
  • About the authors: HAN Wei (born 1977), male, from Huangdao, Shandong, Ph.D.; main research interests: cloud computing and big data analysis. ZHANG Xueqing (born 1968), male, from Dingxing, Hebei, research fellow; main research interests: aerospace ground applications and cloud computing. CHEN Yang (born 1971), male, from Shijiazhuang, Hebei, senior engineer; main research interest: integrated electronic information systems.
  • Supported by:

    High-Level Talent Funding Project of Hebei Province

Abstract:

Existing image classification methods cannot be applied effectively to big image data. To address this problem, an image classification approach based on the MapReduce programming model was proposed, in which every stage of the classification process was restructured to run under MapReduce. First, Scale Invariant Feature Transform (SIFT) features were extracted from the images in a distributed manner using MapReduce and converted into sparse vectors by sparse coding, yielding the sparse feature of each image. Then, MapReduce was used for the distributed training of a random forest. On this basis, the image set was classified in parallel with the random forest method, again using MapReduce. Experimental results on a Hadoop platform show that the proposed approach makes full use of the distributed nature of the MapReduce framework and classifies large-scale image data quickly and accurately, with a good speedup rate.
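The training and classification stages described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it simulates MapReduce in a single Python process rather than on Hadoop, assumes the sparse feature vectors already exist (SIFT extraction and sparse coding are not reproduced), and replaces full random-forest trees with one-split decision stumps that each pick a random feature. Training one stump per data partition is also an assumption, since the abstract does not say how training is partitioned. All function names here (run_mapreduce, train_stump, train_forest, classify) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def run_mapreduce(records, mapper, reducer):
    """In-process stand-in for a MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def train_stump(samples, rng):
    """Fit a one-split tree on one randomly chosen feature (a simplification)."""
    feature = rng.randrange(len(samples[0][0]))
    best = None  # (errors, feature, threshold, left_label, right_label)
    for threshold in sorted({x[feature] for x, _ in samples}):
        left = Counter(y for x, y in samples if x[feature] <= threshold)
        right = Counter(y for x, y in samples if x[feature] > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = len(samples) - left[left_label] - right[right_label]
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict(forest, x):
    """Majority vote over all stumps in the forest."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


def train_forest(partitions, seed=0):
    """Map: each partition trains one stump. Reduce: collect stumps into a forest."""
    def mapper(record):
        pid, samples = record
        yield "forest", train_stump(samples, random.Random(seed + pid))

    def reducer(_key, stumps):
        return list(stumps)

    return run_mapreduce(enumerate(partitions), mapper, reducer)["forest"]


def classify(forest, images):
    """Map: label each image independently. Reduce: group image ids by label."""
    def mapper(record):
        image_id, x = record
        yield predict(forest, x), image_id

    def reducer(_label, image_ids):
        return sorted(image_ids)

    return run_mapreduce(images, mapper, reducer)


if __name__ == "__main__":
    # Toy 2-D "sparse features"; real inputs would be sparse-coded SIFT vectors.
    toy_partitions = [
        [([0.1, 0.3], "a"), ([0.4, 0.2], "a"), ([5.1, 5.6], "b"), ([5.8, 5.2], "b")],
        [([0.2, 0.5], "a"), ([0.6, 0.1], "a"), ([5.3, 5.9], "b"), ([5.0, 5.4], "b")],
    ]
    toy_forest = train_forest(toy_partitions)
    print(classify(toy_forest, [("img1", [0.0, 0.0]), ("img2", [5.5, 5.9])]))
```

Because each image is labeled independently in the map step, the classification stage parallelizes trivially across mappers, which is consistent with the speedup the abstract reports; on a real cluster the trained forest would have to be shipped to every mapper, a detail this sketch leaves out.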

CLC Number: