Journal of Computer Applications (计算机应用)

• Typical application systems

Design and implementation of an adaptive best-first Web spider

WEI Wen-guo (魏文国), XIE Gui-yuan (谢桂园)

  1. Guangdong Polytechnic Normal University (广东技术师范学院)
  • Received: 2007-05-23 Revised: 2007-07-01 Online: 2007-11-01 Published: 2007-11-01
  • Corresponding author: WEI Wen-guo

Abstract: NonHogSearch is a topic-specific search engine whose Web spider improves on the best-first search algorithm by controlling how greedy the search is. It introduces a signal-to-noise ratio for Web pages, which is used to judge whether a page belongs to the topic being searched. The spider also learns online and incrementally: it updates link weights automatically while crawling. Whenever it reaches an on-topic page, a reward is generated and fed back in reverse along the link chain, updating the Q value of every link on that chain. This keeps the spider from being trapped too early in a locally optimal subspace of the Web search space. Several link chains are searched in parallel, which further improves the performance of the search engine. Experiments confirm that the algorithm has some advantage in both recall and precision.

Key words: topic-specific (personalized) Web spider, best-first search algorithm, online incremental adaptive learning, signal-to-noise ratio of Web page
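
The reward feedback described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract gives no update rule, so the learning rate `alpha`, the discount factor `gamma`, and the function names `propagate_reward` and `pick_link` are all assumptions made for illustration. The abstract also does not say how greed is controlled; the sketch substitutes a standard epsilon-greedy choice as one possible way to do it.

```python
import random


def propagate_reward(q, chain, reward, alpha=0.5, gamma=0.8):
    """Feed a reward backward along a link chain and update each link's Q value.

    `chain` lists the (source, target) links that were followed, in crawl
    order, to reach an on-topic page. The discounted update used here is an
    assumption, not the rule from the paper.
    """
    for link in reversed(chain):
        old = q.get(link, 0.0)
        # Move the stored Q value part of the way toward the current reward.
        q[link] = old + alpha * (reward - old)
        # Links farther from the on-topic page receive less credit.
        reward *= gamma
    return q


def pick_link(q, frontier, epsilon=0.2, rng=random):
    """Pick the next link to follow from the frontier.

    With probability `epsilon` a random link is chosen, otherwise the link
    with the highest Q value. Epsilon-greedy selection is an assumed stand-in
    for the paper's way of limiting greed.
    """
    if rng.random() < epsilon:
        return rng.choice(frontier)
    return max(frontier, key=lambda link: q.get(link, 0.0))


if __name__ == "__main__":
    # Hypothetical chain A -> B -> C -> D, where page D is on topic.
    chain = [("A", "B"), ("B", "C"), ("C", "D")]
    q = propagate_reward({}, chain, reward=1.0)
    for link in chain:
        print(link, round(q[link], 3))
    # With epsilon=0 the choice is purely greedy.
    print(pick_link(q, [("A", "B"), ("A", "X")], epsilon=0.0))
```

In this sketch the link closest to the on-topic page receives the largest update, and links earlier in the chain receive progressively smaller ones, so a later crawl that consults the Q values is drawn toward chains that have already led to relevant pages.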