Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (05): 1387-1390. DOI: 10.3724/SP.J.1087.2011.01387

• Database Technology •

Web clustering algorithm based on relative Hamming distance

LI Bin1, WANG Tian-fei2, LIU Cai-ming1, ZHANG Jian-dong1

  1. Laboratory of Intelligent Information Processing and Application, Leshan Normal University, Leshan, Sichuan 614004, China
  2. College of Mathematics and Information Science, Leshan Normal University, Leshan, Sichuan 614004, China
  • Received: 2010-11-02 Revised: 2011-01-10 Online: 2011-05-01 Published: 2011-05-01
  • Corresponding author: LI Bin
  • About the authors: LI Bin (1979-), female, born in Leshan, Sichuan, lecturer, M.S., research interests: data mining, network security; WANG Tian-fei (1973-), male, born in Leshan, Sichuan, associate professor, M.S., research interests: combinatorics and graph theory, mathematical modeling; LIU Cai-ming (1979-), male, born in Wusheng, Sichuan, associate research fellow, Ph.D., research interests: network security, artificial intelligence; ZHANG Jian-dong (1980-), male, born in Ziyang, Sichuan, lecturer, M.S., research interests: network security.
  • Supported by: Sichuan Provincial Education Department fund projects (07ZB031; 10ZC106).

Abstract: To address the low accuracy of clustering results in Web usage mining, an improved clustering algorithm based on relative Hamming distance and conflicting degree is proposed. The algorithm first builds a URL-UserID association matrix in which the rows are the URLs of a Web site, the columns are UserIDs, and each element is the number of times that user visited that URL. It then measures the similarity between column vectors or between row vectors of this matrix to obtain similar customer groups or related Web pages, respectively. Experiments show that the algorithm achieves relatively high accuracy.

Key words: clustering algorithm, relative Hamming distance, conflicting degree, Web usage mining, network security
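
The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formula for relative Hamming distance or for conflicting degree, so the sketch rests on an assumption: each hit count is reduced to visited / not visited, and the distance is the fraction of positions where two vectors disagree. The function names, the log format, and the sample records are hypothetical, and the conflicting-degree step is omitted.

```python
from collections import Counter


def build_matrix(log):
    """Build a URL x UserID hit-count matrix from (user_id, url) records."""
    users = sorted({user for user, _ in log})
    urls = sorted({url for _, url in log})
    hits = Counter(log)  # (user_id, url) -> number of visits
    # Rows are URLs and columns are users, as described in the abstract.
    matrix = [[hits[(user, url)] for user in users] for url in urls]
    return urls, users, matrix


def relative_hamming(a, b):
    """Assumed definition: share of positions whose visited status differs."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        return 0.0
    differing = sum((x > 0) != (y > 0) for x, y in zip(a, b))
    return differing / len(a)


def column(matrix, j):
    """Return column j of the matrix, i.e. one user's access vector."""
    return [row[j] for row in matrix]


# Hypothetical access log: one (user_id, url) pair per page request.
log = [
    ("u1", "/a"), ("u1", "/a"), ("u1", "/b"),
    ("u2", "/a"), ("u2", "/b"),
    ("u3", "/c"), ("u3", "/d"),
]
urls, users, matrix = build_matrix(log)

# Column vectors compare users; row vectors compare pages.
d12 = relative_hamming(column(matrix, 0), column(matrix, 1))
d13 = relative_hamming(column(matrix, 0), column(matrix, 2))
print(d12, d13)
```

Under this assumed definition, users who visited the same set of pages are at distance 0 and users with no page in common are at distance 1. Applying `relative_hamming` to rows of `matrix` instead of columns compares pages in the same way.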