Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (04): 1086-1089. DOI: 10.3724/SP.J.1087.2012.01086

• Database Technology •

Clustering user behavior patterns of E-commerce search engine based on mixture of Markov models

QIN Jun1, XIAO Rong2

  1. School of Computer Science, South-Central University of Nationalities, Wuhan, Hubei 430074, China
  2. Taobao (China) Software Company Limited, Hangzhou, Zhejiang 310099, China
  • Received: 2011-09-13 Revised: 2011-11-23 Online: 2012-04-20 Published: 2012-04-01
  • Contact: QIN Jun
  • About the authors: QIN Jun (born 1968), female (Tujia ethnicity), from Changde, Hunan; professor, Ph.D.; research interests: data mining, pattern recognition, intelligent optimization. XIAO Rong (born 1983), male, from Jingmen, Hubei; engineer, M.S.; research interests: pattern recognition.
  • Supported by:
    National Natural Science Foundation of China

Abstract: Clustering the behavior of search engine users helps provide personalized services. To capture the dynamics of user behavior accurately, a mixture of Markov models is proposed for clustering user behavior patterns on an e-commerce search engine. The model assumes that each class of user behavior can be represented by one Markov model: each user belongs to a given cluster with a certain probability, and that user's behavior sequence is generated by the corresponding Markov model. To address parameter estimation and automatic model selection, Bayesian Ying-Yang (BYY) harmony learning theory is applied to the mixture model, and a harmony function and an adaptive gradient algorithm are proposed for it. Simulation results show that, compared with the conventional Expectation-Maximization (EM) algorithm, the BYY-based adaptive gradient algorithm performs parameter learning and model selection simultaneously, and does so more efficiently and accurately. Finally, the proposed clustering method is applied to real-world click-through logs of the search engine on www.taobao.com, and the results provide preliminary evidence that the model is effective.

Key words: Markov model, Expectation-Maximization (EM) algorithm, model-based clustering, Bayesian Ying-Yang (BYY), harmony function

CLC Number:
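
The generative assumption stated in the abstract (a user belongs to a cluster with some probability, and the behavior sequence is then produced by that cluster's Markov model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the three action names, the two clusters, and all probabilities are invented, and the chains are taken to be first-order. The sketch covers only sampling and the posterior cluster membership of a sequence; it does not implement the BYY harmony learning algorithm or EM parameter estimation.

```python
import math
import random

# Hypothetical action alphabet for a search session (not from the paper).
STATES = ["search", "click", "buy"]

# Each cluster is one first-order Markov chain with a mixing weight.
# All numbers are invented for illustration; every row sums to 1.
CLUSTERS = [
    {   # cluster 0: users who mostly keep searching
        "weight": 0.6,
        "init": [0.8, 0.15, 0.05],
        "trans": [[0.7, 0.25, 0.05],
                  [0.6, 0.3, 0.1],
                  [0.7, 0.2, 0.1]],
    },
    {   # cluster 1: users who move quickly from click to buy
        "weight": 0.4,
        "init": [0.7, 0.25, 0.05],
        "trans": [[0.2, 0.7, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.3, 0.2]],
    },
]


def sample_sequence(rng, length):
    """Pick a cluster by its mixing weight, then sample a sequence from its chain."""
    k = rng.choices(range(len(CLUSTERS)),
                    weights=[c["weight"] for c in CLUSTERS])[0]
    chain = CLUSTERS[k]
    state = rng.choices(range(len(STATES)), weights=chain["init"])[0]
    seq = [state]
    for _ in range(length - 1):
        state = rng.choices(range(len(STATES)),
                            weights=chain["trans"][state])[0]
        seq.append(state)
    return k, seq


def log_likelihood(seq, chain):
    """Log-probability of a state-index sequence under one Markov chain."""
    ll = math.log(chain["init"][seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(chain["trans"][prev][cur])
    return ll


def posterior(seq):
    """P(cluster | seq) by Bayes' rule, normalized in log space for stability."""
    logs = [math.log(c["weight"]) + log_likelihood(seq, c) for c in CLUSTERS]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    true_cluster, seq = sample_sequence(rng, 8)
    print("sampled from cluster", true_cluster, [STATES[s] for s in seq])
    print("posterior over clusters:", posterior(seq))
```

In this toy setup, a sequence such as search, click, buy is attributed mostly to cluster 1, while a run of repeated searches is attributed mostly to cluster 0. A learning procedure such as EM or the adaptive gradient algorithm mentioned in the abstract would estimate the weights and transition matrices from data instead of fixing them by hand.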