Large-scale image retrieval solution based on Hadoop cloud computing platform
ZHU Weisheng1,2,WANG Peng3
1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Parallel Computing Laboratory, Chengdu University of Information Technology, Chengdu Sichuan 610225, China
Concerning that the traditional image retrieval methods are confronted with massive image data processing problems, a new solution for large-scale image retrieval, named MR-BoVW, was proposed, which was based on the traditional Bag of Visual Words (BVW) approach and MapReduce model to take advantage of the massive storage capacity and powerful parallel computing ability of Hadoop. To handle image data well, firstly an improved method for Hadoop image processing was introduced, and then, the MapReduce layout was divided into three stages: feature vector generation, feature clustering, image representation and inverted index construction. The experimental results demonstrate that the MR-BoVW solution shows good performance on speedup, scaleup, and sizeup. In fact, the efficiency results are all greater than 0.62, and the curve of scaleup and sizeup is gentle. Thus it is suitable for large-scale image retrieval.