Journal of Computer Applications, 2013, Vol. 33, Issue (04): 1026-1030. DOI: 10.3724/SP.J.1087.2013.01026

• Advanced Computing •

Design and implementation of distributed retrieval system for electronic products information

ZHANG Yuanyuan1, ZHANG Qinyan2, JIANG Guanfu3

  1. College of Information Technology, Zhejiang Chinese Medical University, Hangzhou Zhejiang 310053, China
  2. Computer Center, Zhejiang University, Hangzhou Zhejiang 310058, China
  3. School of Computer Science and Technology, Zhejiang University, Hangzhou Zhejiang 315100, China
  • Received: 2012-09-06 Revised: 2012-10-28 Online: 2013-04-01 Published: 2013-04-23
  • Contact: ZHANG Qinyan
  • About the authors: ZHANG Yuanyuan (张渊源, born 1980), female, from Hangzhou, Zhejiang; lecturer, M.S.; research interests: information systems, medical software middleware. ZHANG Qinyan (张琴燕, born 1976), female, from Hangzhou, Zhejiang; engineer, M.S.; research interests: semantic Web, middleware. JIANG Guanfu (蒋关富, born 1986), male, from Hangzhou, Zhejiang; master's student; research interest: information fusion.
  • Supported by:

    Scientific Research Project of the Zhejiang Provincial Department of Education (Y201225127); Natural Science Foundation of Zhejiang Province (LY12F02029); National Key Technology R&D Program of China (2011BAH16B04)

Abstract: To extract useful information that meets user needs from massive amounts of Web information, this paper proposes a distributed retrieval system architecture based on Hadoop and Lucene for retrieving Web electronic product information. Hadoop's Map and Reduce operations are used to store the index files in a distributed manner, and Lucene is used to access those index files, which improves retrieval efficiency. To address the coarse-grained retrieval problem of the Lucene_Hadoop architecture, a fine-grained retrieval method is also proposed, which reduces the time the system needs to build its index. Experiments show that the distributed retrieval system based on Hadoop and Lucene achieves high retrieval performance on Web electronic product information.

Key words: distributed retrieval system, Web electronic product information, Hadoop, Lucene, fine-grained retrieval
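
The general idea of building an index with Map and Reduce steps and then querying it can be sketched as follows. This is a minimal illustration in plain Python, not the system described in the paper: it uses neither Hadoop nor Lucene, the product records are made up, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search`) are hypothetical stand-ins for the corresponding stages.

```python
from collections import defaultdict

# Hypothetical product records; the paper's actual data set is not shown here.
PRODUCTS = {
    1: "android smartphone with dual camera",
    2: "digital camera with zoom lens",
    3: "android tablet",
}


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term, as a mapper would."""
    return [(term, doc_id) for term in sorted(set(text.lower().split()))]


def shuffle(pairs):
    """Group emitted pairs by term, standing in for the shuffle step."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Merge all document ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in sequence to get an inverted index."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(PRODUCTS)
print(search(index, "android camera"))
```

In a real deployment the map and reduce functions would run on separate nodes and the posting lists would be written to distributed storage; here everything runs in one process purely to show the data flow.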

CLC number: