Journal of Computer Applications, 2016, Vol. 36, Issue (12): 3476-3480. DOI: 10.11772/j.issn.1001-9081.2016.12.3476

Simulation generating algorithm of Web log based on user interest migration

PENG Xingxiong1,2, XIAO Ruliang1,2   

  1. Faculty of Software, Fujian Normal University, Fuzhou Fujian 350117, China;
  2. Fujian Provincial Engineering Research Center of Public Service Big Data Analysis and Application, Fuzhou Fujian 350117, China
  • Received: 2016-04-23 Revised: 2016-06-23 Online: 2016-12-10 Published: 2016-12-08
  • Supported by:
    This work is partially supported by the Fujian Provincial Great Plan Project (2016H6007).

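  • Corresponding author: XIAO Ruliang
  • About the authors: PENG Xingxiong (彭行雄, born 1991), male, from Xiaogan, Hubei, master's student; main research interest: machine learning. XIAO Ruliang (肖如良, born 1966), male, from Loudi, Hunan, professor, Ph.D., CCF senior member; main research interests: software engineering and big data software.
  • Funding:
    Fujian Provincial University-Industry Cooperation Project (2016H6007).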

Abstract: Existing simulation generation algorithms produce Web logs from static distribution models, and the resulting logs differ considerably from real data. To address this problem, a Web Log Simulation Generation (WLSG) algorithm based on user interest migration was proposed. First, the relationship between Web log entries and time was modeled. Second, the migration of user interest as a user accessed files at different times was simulated. Finally, the user's adaptive access to the file of greatest interest at the current moment was simulated. Compared with existing simulation algorithms that use static distribution models, the proposed algorithm improved the self-similarity measure by about 2.86% on average. The experimental results show that, by using user interest migration to change the user access sequence, the proposed algorithm simulates real Web logs well and can be applied effectively to Web log simulation generation.
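
The three steps named in the abstract can be illustrated with a toy sketch. It is not the WLSG algorithm: the paper's time model and interest-update rule are not given on this page, so the random-walk drift, the halving of interest after an access, and all names (`simulate_log`, `drift`, `num_files`, `num_steps`) are assumptions chosen only for illustration.

```python
import random


def simulate_log(num_files=5, num_steps=20, drift=0.3, seed=0):
    """Toy Web log generator with drifting user interest (illustrative only)."""
    rng = random.Random(seed)
    # Assumed initial interest score for each file, drawn uniformly from [0, 1).
    interest = [rng.random() for _ in range(num_files)]
    log = []
    for t in range(num_steps):
        # Interest migration (assumption): each score takes a small random
        # step at every time step and is clamped at zero.
        interest = [max(0.0, w + rng.uniform(-drift, drift)) for w in interest]
        # Adaptive access: request the file with the highest current interest.
        file_id = max(range(num_files), key=lambda i: interest[i])
        # Satiation (assumption): interest in the file just accessed is halved.
        interest[file_id] *= 0.5
        log.append((t, file_id))
    return log


if __name__ == "__main__":
    # Print the first few (time step, file id) entries of one simulated log.
    for entry in simulate_log()[:5]:
        print(entry)
```

Because the interest scores change over time, the sequence of accessed files changes with them, whereas a static distribution would draw every access from the same fixed probabilities.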

Key words: interest migration, time series, log analysis, self-similarity, simulation generation
