Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (11): 2989-2993. DOI: 10.3724/SP.J.1087.2012.02989

• Advanced computing •

User ranking algorithm for microblog search based on MapReduce

LIANG Qiu-shi, WU Yi-lei, FENG Lei

  1. Beijing Computing Center, Beijing Academy of Science and Technology, Beijing 100012, China
  • Received: 2012-05-13 Revised: 2012-06-28 Online: 2012-11-12 Published: 2012-11-01
  • Contact: LIANG Qiu-shi
  • About the authors: LIANG Qiu-shi (born 1982), male, from Huangchuan, Henan; senior engineer; main research interest: grid computing. WU Yi-lei (born 1980), male, from Wuhu, Anhui; researcher, Ph.D.; main research interests: artificial intelligence and data mining. FENG Lei (born 1987), male, from Beijing; senior engineer; main research interest: high-performance computing.
  • Supported by:
    Mengya (Sprout) Program fund of the Beijing Academy of Science and Technology

Abstract: In microblog user search, most service providers rank results purely by follower count, which leaves the ranking open to manipulation by accounts that inflate their follower numbers. This paper treats microblog users as Web pages and the "follow" relationships between users as links between pages, so that the basic idea of PageRank can be applied to ranking microblog users. A state-transition matrix and an automatically iterating MapReduce workflow are introduced to parallelize the computation, yielding a MapReduce-based user ranking algorithm for microblog search. Experiments on the Hadoop platform show that the algorithm no longer ties a user's rank solely to follower count, which makes the search engine harder to cheat, raises the rank of more "important" users in the search results, and improves the relevance and quality of the results.

Key words: microblog search, cloud computing, MapReduce programming model, Hadoop platform/system, PageRank algorithm

CLC number:
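
The idea summarized in the abstract (users as pages, "follow" relationships as links, PageRank computed by repeated map and reduce passes) can be illustrated with a small single-machine sketch. This is not the authors' implementation: it runs in plain Python rather than on Hadoop, it uses adjacency lists instead of an explicit state-transition matrix, and the damping factor 0.85, the even redistribution of rank from users who follow nobody, the convergence tolerance, and all function and account names are assumptions made for illustration only.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor; assumed, not from the paper


def map_phase(ranks, follows):
    """Map step: each user splits its current rank among the accounts it follows."""
    for user, rank in ranks.items():
        followees = follows.get(user, [])
        if followees:
            share = rank / len(followees)
            for followee in followees:
                yield followee, share
        else:
            # Assumed handling of users who follow nobody:
            # spread their rank evenly over all users.
            share = rank / len(ranks)
            for other in ranks:
                yield other, share


def reduce_phase(pairs, users, damping=DAMPING):
    """Reduce step: sum the shares each user received and apply damping."""
    totals = defaultdict(float)
    for user, share in pairs:
        totals[user] += share
    n = len(users)
    return {u: (1 - damping) / n + damping * totals[u] for u in users}


def rank_users(follows, tol=1e-9, max_iter=100):
    """Repeat map and reduce passes until the ranks stop changing."""
    users = set(follows) | {v for vs in follows.values() for v in vs}
    ranks = {u: 1.0 / len(users) for u in users}
    for _ in range(max_iter):
        new_ranks = reduce_phase(map_phase(ranks, follows), users)
        delta = sum(abs(new_ranks[u] - ranks[u]) for u in users)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Hypothetical follow graph: follower -> list of accounts followed.
# "bought" has two followers, both accounts that nobody follows;
# "expert" has one follower, "hub", which is itself followed by four users.
follows = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
    "hub": ["expert"],
    "s1": ["bought"], "s2": ["bought"],
}

ranks = rank_users(follows)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 4))
```

In this toy graph "expert" ends up ranked above "bought" even though it has fewer followers, because its single follower is itself well followed. That is the effect the abstract describes: rank depends on who follows a user, not only on how many do.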