Journal of Computer Applications ›› 2009, Vol. 29 ›› Issue (11): 3100-3102.

• Database and Data Mining •

Study on theme-drift of hyperlink-induced topic search algorithm

Qi GAO1, Yong-ping ZHANG2

  1. School of Computer Science and Technology, China University of Mining and Technology (computer science master's class of 2007)
  2. School of Computer Science and Technology, China University of Mining and Technology
  • Received: 2009-04-29 Revised: 2009-06-14 Online: 2009-11-01 Published: 2009-11-26
  • Contact: Qi GAO

Abstract: The Hyperlink-Induced Topic Search (HITS) algorithm is a classic hyperlink-based algorithm. Because it relies purely on link structure, it ignores the text content of the linked pages and does not distinguish the importance of different hyperlinks, which inevitably leads to theme drift. To address this problem, the improved algorithm builds on the original HITS algorithm and introduces the classic tf-idf algorithm: it computes the relevance between each linked page and the query topic and uses that relevance to differentiate the importance of links. With the improved algorithm, the search engine's ranking results are more in line with the query, and the corresponding precision rate is also greatly improved.

Key words: theme-drift, page ranking, search engine
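
The idea described in the abstract, weighting each link by the relevance of the linked page to the query, can be illustrated with a minimal sketch. The abstract does not give the paper's exact weighting formula, so everything below is an assumption made for illustration: the smoothed idf, the use of cosine similarity as the relevance score, the function names (tfidf_vectors, relevance, weighted_hits) and the toy pages are all hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each page id to a sparse tf-idf vector {term: weight}."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for page, tokens in docs.items():
        tf = Counter(tokens)
        # Assumed idf variant: log(n / df) + 1, so shared terms keep some weight.
        vectors[page] = {
            term: (count / len(tokens)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        }
    return vectors


def relevance(query_terms, vector):
    """Cosine similarity between a binary query vector and a tf-idf vector."""
    query = set(query_terms)
    norm = math.sqrt(sum(w * w for w in vector.values())) * math.sqrt(len(query))
    if norm == 0.0:
        return 0.0
    return sum(vector.get(term, 0.0) for term in query) / norm


def _normalize(scores):
    """Scale scores to unit L2 norm; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()} if norm else dict(scores)


def weighted_hits(links, weight, iterations=50):
    """HITS where each link (src, dst) is scaled by weight[dst].

    With every weight equal to 1.0 this reduces to plain HITS.
    """
    auth = {p: 1.0 for p in weight}
    hub = {p: 1.0 for p in weight}
    for _ in range(iterations):
        new_auth = dict.fromkeys(weight, 0.0)
        for src, dst in links:
            new_auth[dst] += weight[dst] * hub[src]
        auth = _normalize(new_auth)
        new_hub = dict.fromkeys(weight, 0.0)
        for src, dst in links:
            new_hub[src] += weight[dst] * auth[dst]
        hub = _normalize(new_hub)
    return auth, hub


# Hypothetical toy web: page "c" is off topic but has the most in-links.
docs = {
    "h1": "search engine ranking links".split(),
    "h2": "car dealers links".split(),
    "h3": "car prices links".split(),
    "b": "hits search algorithm ranking".split(),
    "c": "car sales car dealers".split(),
}
links = [("h1", "b"), ("h1", "c"), ("h2", "c"), ("h3", "c")]
query = ["search", "ranking"]

vectors = tfidf_vectors(docs)
link_weight = {page: relevance(query, vec) for page, vec in vectors.items()}

plain_auth, _ = weighted_hits(links, {page: 1.0 for page in docs})
weighted_auth, _ = weighted_hits(links, link_weight)

# Plain HITS drifts toward "c"; the weighted variant ranks "b" above "c".
print(sorted(plain_auth, key=plain_auth.get, reverse=True)[:2])
print(sorted(weighted_auth, key=weighted_auth.get, reverse=True)[:2])
```

In this toy graph the off-topic page collects authority purely from its in-link count under plain HITS, while the relevance weights suppress links that point to it. A real system would need proper tokenization and a weighting scheme taken from the full paper.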