Journal of Computer Applications, 2014, Vol. 34, Issue (3): 695-699. DOI: 10.11772/j.issn.1001-9081.2014.03.0695

• Advanced Computing •

Large-scale image retrieval solution based on Hadoop cloud computing platform

ZHU Weisheng1,2, WANG Peng3

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China;
  3. Parallel Computing Laboratory, Chengdu University of Information Technology, Chengdu Sichuan 610225, China
  • Received: 2013-08-23 Revised: 2013-10-20 Online: 2014-03-01 Published: 2014-04-01
  • Contact: ZHU Weisheng
  • About the authors: ZHU Weisheng (1986-), male, born in Wenzhou, Zhejiang, M.S. candidate; research interests: cloud computing, multimedia content analysis. WANG Peng (1975-), male, born in Leshan, Sichuan, Ph.D., professor, doctoral supervisor, senior member of CCF; research interests: cloud computing, parallel computing.
  • Supported by:

    National Natural Science Foundation of China; Sichuan Provincial Youth Science Foundation (pre-research funding); Key Natural Science Project of the Sichuan Provincial Department of Education

Abstract:

Traditional image retrieval methods struggle to cope with massive volumes of image data. To address this problem, a large-scale image retrieval solution named MR-BoVW was proposed, based on the traditional Bag of Visual Words (BoVW) model and the MapReduce computing model, which takes full advantage of the massive storage capacity and powerful parallel computing ability of the Hadoop cloud computing platform. To handle image data better, an improved method for processing image data on Hadoop was first introduced; on this basis, the retrieval pipeline was parallelized with MapReduce in three stages: feature vector generation, feature clustering, and image vector representation with inverted index construction. Multiple groups of experiments show that MR-BoVW achieves good speedup, scaleup, and sizeup: the efficiency is greater than 0.62 in all cases, and the scaleup and sizeup curves change only gently, which makes the solution suitable for large-scale image retrieval.

Key words: cloud computing, Hadoop, MapReduce, image retrieval, Bag of Visual Words (BoVW)
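The MapReduce stages named in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' Hadoop implementation: `run_mapreduce`, `nearest`, `kmeans_iteration`, and `build_inverted_index` are hypothetical names, the shuffle phase is simulated with a Python dictionary, and stage 1 (feature vector generation) is assumed to have already produced `(image_id, descriptor)` pairs. Real systems would use local descriptors such as SIFT; toy 2-D vectors are used here. The abstract does not name the clustering algorithm, so k-means (the usual choice for building a BoVW vocabulary) is assumed, with one iteration per MapReduce job.

```python
from collections import defaultdict


# Hypothetical in-memory stand-in for one Hadoop job: map, shuffle (group by key), reduce.
def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle": group values by key
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}


def nearest(vec, centroids):
    """Index of the closest centroid (visual word) by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])),
    )


def kmeans_iteration(features, centroids):
    """Stage 2 sketch (assumes k-means): one iteration expressed as one MapReduce job."""
    def mapper(image_id, vec):
        yield nearest(vec, centroids), vec  # emit (cluster id, descriptor)

    def reducer(cluster_id, vecs):
        n = len(vecs)
        return tuple(sum(col) / n for col in zip(*vecs))  # new centroid = mean

    new = run_mapreduce(features, mapper, reducer)
    # A cluster that received no descriptors keeps its previous centroid.
    return [new.get(i, c) for i, c in enumerate(centroids)]


def build_inverted_index(features, centroids):
    """Stage 3 sketch: quantize descriptors, build visual word -> {image: count}."""
    def mapper(image_id, vec):
        yield nearest(vec, centroids), image_id  # emit (visual word, image)

    def reducer(word, image_ids):
        postings = defaultdict(int)
        for img in image_ids:
            postings[img] += 1  # count = this image's BoVW histogram entry for `word`
        return dict(sorted(postings.items()))

    return run_mapreduce(features, mapper, reducer)


if __name__ == "__main__":
    # Toy 2-D "descriptors"; stage 1 would normally emit SIFT-like vectors.
    features = [
        ("img1", (0.0, 0.0)), ("img1", (0.2, 0.1)),
        ("img2", (5.0, 5.0)), ("img2", (5.2, 4.9)),
        ("img3", (0.1, 0.3)), ("img3", (4.8, 5.1)),
    ]
    centroids = [(0.0, 0.0), (5.0, 5.0)]
    for _ in range(3):
        centroids = kmeans_iteration(features, centroids)
    print(build_inverted_index(features, centroids))
    # → {0: {'img1': 2, 'img3': 1}, 1: {'img2': 2, 'img3': 1}}
```

Because each k-means iteration and the indexing pass only need per-descriptor work in the mapper and per-key aggregation in the reducer, both map naturally onto MapReduce; on a real cluster the centroids would be broadcast to every mapper (for example through the distributed cache) rather than captured in a closure.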

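The abstract evaluates MR-BoVW by speedup, scaleup, sizeup, and efficiency. A small sketch of how these are usually computed is given below, assuming the definitions commonly used for parallel data-mining algorithms on Hadoop (the full paper may define them slightly differently): Speedup(m) = T_1 / T_m on the same dataset, Efficiency = Speedup(m) / m, Scaleup = T(D on 1 node) / T(m·D on m nodes), and Sizeup = T(m·D) / T(D) on a fixed cluster. The function names are hypothetical and the timings are invented for illustration, not measurements from the paper. Under these definitions, an efficiency above 0.62 on m nodes means the speedup exceeds 0.62·m.

```python
def speedup(t_1: float, t_m: float) -> float:
    """Speedup(m) = T_1 / T_m for the same dataset on 1 node vs m nodes."""
    return t_1 / t_m


def efficiency(t_1: float, t_m: float, m: int) -> float:
    """Efficiency = Speedup(m) / m; 1.0 would be perfectly linear speedup."""
    return speedup(t_1, t_m) / m


def scaleup(t_d_on_1: float, t_md_on_m: float) -> float:
    """Scaleup = T(D on 1 node) / T(m*D on m nodes); values near 1.0 are ideal."""
    return t_d_on_1 / t_md_on_m


def sizeup(t_d: float, t_md: float) -> float:
    """Sizeup = T(m*D) / T(D) on a fixed cluster; ideally it grows no faster than m."""
    return t_md / t_d


if __name__ == "__main__":
    # Hypothetical timings in seconds (not from the paper), with m = 4.
    print(speedup(1200.0, 400.0))        # 3.0
    print(efficiency(1200.0, 400.0, 4))  # 0.75
    print(scaleup(1200.0, 1500.0))       # 0.8
    print(sizeup(400.0, 1500.0))         # 3.75
```

A "gentle" scaleup curve in this sense means the ratio stays close to 1.0 as data and nodes grow together, while a gentle sizeup curve means runtime grows roughly in proportion to data size on a fixed cluster.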