Journal of Computer Applications, 2005, Vol. 25, Issue (07): 1580-1583.
• Database Technology •
GUO Wei-gang (郭伟刚)1,2, JU Shi-guang (鞠时光)2
Abstract:
Analysis of the navigational patterns of Web robots shows that a robot's access sequence usually does not form a path connected by hyperlinks. The concept of a user episode is defined, and a new detection algorithm based on episode analysis is proposed. Experiments show that the algorithm can effectively detect both unknown robots and unfriendly robots that do not obey the Standard for Robot Exclusion.
Key words: search engine, Web robot, user episode, detection, Web log
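The abstract states the detection idea only at a high level, so the following is a minimal Python sketch of episode-based detection under loudly stated assumptions, not the authors' algorithm. It assumes the site's hyperlink graph is known, that the Web log has already been grouped into per-visitor sessions containing only page requests (no images or stylesheets), and it simplifies an "episode" to a run of requests in which each page is linked from the page requested just before it. The names `split_episodes` and `looks_like_robot` and the thresholds `max_avg_len` and `min_requests` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical site link graph: page -> set of pages it links to.
# Assumed to be known in advance (e.g. extracted from the site itself).
LinkGraph = Dict[str, Set[str]]


def split_episodes(requests: List[str], links: LinkGraph) -> List[List[str]]:
    """Split one session's page requests (in time order) into episodes.

    Simplifying assumption: a request extends the current episode only if
    the previous page in that episode links to it; otherwise it starts a
    new episode.
    """
    episodes: List[List[str]] = []
    for page in requests:
        if episodes and page in links.get(episodes[-1][-1], set()):
            episodes[-1].append(page)  # follows a hyperlink: same episode
        else:
            episodes.append([page])    # no link from previous page: new episode
    return episodes


def looks_like_robot(requests: List[str], links: LinkGraph,
                     max_avg_len: float = 2.0, min_requests: int = 5) -> bool:
    """Flag a session whose episodes are short on average.

    max_avg_len and min_requests are illustrative thresholds only,
    not values taken from the paper.
    """
    if len(requests) < min_requests:
        return False  # too little evidence to decide
    episodes = split_episodes(requests, links)
    avg_len = len(requests) / len(episodes)
    return avg_len < max_avg_len


if __name__ == "__main__":
    site: LinkGraph = {
        "/": {"/news", "/about"},
        "/news": {"/news/1", "/news/2"},
        "/news/1": {"/news/2"},
        "/news/2": {"/"},
        "/about": set(),
    }
    human = ["/", "/news", "/news/1", "/news/2", "/"]   # follows links in depth
    robot = ["/", "/news", "/about", "/news/1", "/news/2"]  # breadth-first order
    print(split_episodes(robot, site))
    # [['/', '/news'], ['/about'], ['/news/1', '/news/2']]
    print(looks_like_robot(human, site))  # False
    print(looks_like_robot(robot, site))  # True
```

The rule is deliberately crude: a breadth-first crawler tends to fetch sibling pages one after another, so its session breaks into many short episodes, while a human following links tends to produce long ones. A practical detector would also have to tolerate back-button navigation, which this simplification counts as an episode break.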
GUO Wei-gang, JU Shi-guang. Web robot detection algorithm based on episode analysis[J]. Journal of Computer Applications, 2005, 25(07): 1580-1583.
Link to this article: http://www.joca.cn/CN/Y2005/V25/I07/1580