Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (07): 1782-1784.

• Information Security •

Web crawler detection based on trap techniques

Fan Chunlong (范纯龙)1, Yuan Bin (袁滨)2, Yu Zhouhua (余周华)3, Xu Lei (徐蕾)2

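    1. Shenyang Institute of Aeronautical Engineering (new campus), Shenyang, Liaoning
    2. Shenyang Institute of Aeronautical Engineering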
    3.
  • Received: 2010-01-26 Revised: 2010-02-24 Online: 2010-07-01 Published: 2010-07-01
  • Corresponding author: Fan Chunlong

Spider detection based on trap techniques

  • Received: 2010-01-26 Revised: 2010-02-24 Online: 2010-07-01 Published: 2010-07-01
  • Contact: Fan Chunlong

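Abstract: Web crawlers (spiders) are programs that retrieve Web resources. They are widely used in search engines and related fields, but they also cause problems such as privacy leakage and copyright disputes, so spider behavior needs to be detected and constrained. This paper summarizes existing spider detection methods and describes the current use of trap techniques in spider detection. It proposes using structured traps to build a website model for spider detection together with a corresponding detection algorithm, and it analyzes and evaluates the detection capability of this method. Finally, after verifying on an experimental system that the results of trap-based detection agree with those of manual analysis, it further analyzes the causes of the detection results.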

Key words: trap technique, coverage rate, crawler detection, recall rate

Abstract: A spider, also known as a Web crawler, is a program for retrieving network resources and is widely used in search engines. However, it also raises problems such as privacy leakage and copyright disputes, so measures are needed to detect and restrict crawler behavior. This paper first briefly reviews current work on Web crawler detection and the use of trap techniques for this purpose. It then proposes a structured trap technique for constructing website models and the corresponding detection algorithms. Finally, the authors analyze the detection capability of the model and summarize its performance. The results indicate that detection with the structured trap technique is generally consistent with manual analysis.

Key words: trap techniques, coverage rate, spider detection, recall rate