Journal of Computer Applications, 2015, Vol. 35, Issue (3): 802-806. DOI: 10.11772/j.issn.1001-9081.2015.03.802

• Artificial Intelligence •

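Search data clustering based on wavelet and its application in variable selection

YUAN Ming

  1. Department of Statistics, Tianjin University of Finance and Economics, Tianjin 300222, China
  • Received: 2014-10-10  Revised: 2014-11-16  Online: 2015-03-10  Published: 2015-03-13
  • Corresponding author: YUAN Ming
  • About the author: YUAN Ming (born 1982), male, from Tianjin, lecturer, Ph.D.; research interests: data mining, artificial intelligence, and applications of computer technology in economic research
  • Funding: Tianjin Philosophy and Social Science Planning Project (TJTJ13-002)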



Abstract:

A clustering method for online shopping search data, based on the Continuous Wavelet Transformation (CWT) and its inverse transformation, was proposed for variable selection in predictive models. Taking full account of the special characteristics of search data, the method decomposed the original series into periodic components at different time scales and reconstructed these components into input vectors. Clustering was then performed with a weighted Fuzzy C-Means (FCM) algorithm, and the variables (keywords) were selected according to their membership function values in each group. The effectiveness of the variable selection was evaluated through a prediction model for China's monthly Consumer Price Index (CPI). The experimental results indicate that search volume series contain periodic components of different lengths and that keywords within the same group are highly consistent in commodity type. Compared with other variable selection methods, the prediction model based on clustering of wavelet-reconstructed series achieves higher prediction accuracy, with one-step and three-step relative prediction errors of only 0.3891% and 0.5437% respectively, and the selected variables have clear economic meaning. The proposed method is therefore particularly suitable for addressing the variable selection problem of high-dimensional predictive models in the big data era.

Key words: online shopping search volume, predictive model, variable selection, Continuous Wavelet Transformation (CWT), fuzzy clustering
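The abstract outlines a four-step pipeline (CWT decomposition, inverse-transform reconstruction into input vectors, weighted FCM clustering, membership-based keyword selection) but does not specify its parameters. The Python sketch below illustrates that pipeline under stated assumptions; it is not the author's implementation. It assumes a Morlet wavelet (omega0 = 6) computed in the Fourier domain with the delta-function reconstruction of Torrence and Compo (1998), hypothetical period bands in months, input vectors formed by concatenating the reconstructed band components of each z-scored series, a "weighted" FCM interpreted as fixed per-feature weights in the squared distance, and selection of the highest-membership keyword(s) in each cluster. The names BANDS, morlet_cwt, band_components, input_vector, weighted_fcm and select_keywords are hypothetical, and the demo uses synthetic series rather than real search data.

```python
import numpy as np

# Morlet-wavelet constants for omega0 = 6 (Torrence & Compo, 1998)
OMEGA0 = 6.0
C_DELTA = 0.776
PSI0_AT_0 = np.pi ** -0.25
# Hypothetical period bands (in months) that group wavelet scales into components
BANDS = [(2, 4), (4, 8), (8, 16), (16, 96)]

def morlet_cwt(x, dt=1.0, dj=0.25, s0=2.0):
    """Morlet CWT of a 1-D series, computed in the Fourier domain."""
    x = np.asarray(x, dtype=float)
    n = x.size
    n_scales = int(np.log2(n * dt / s0) / dj) + 1
    scales = s0 * 2.0 ** (dj * np.arange(n_scales))
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)      # angular frequency of each FFT bin
    x_hat = np.fft.fft(x - x.mean())
    W = np.empty((n_scales, n), dtype=complex)
    for j, s in enumerate(scales):
        # Normalised daughter wavelet; the (omega > 0) mask is the Heaviside step
        psi_hat = (np.sqrt(2.0 * np.pi * s / dt) * PSI0_AT_0
                   * np.exp(-0.5 * (s * omega - OMEGA0) ** 2) * (omega > 0))
        W[j] = np.fft.ifft(x_hat * psi_hat)
    periods = 4.0 * np.pi * scales / (OMEGA0 + np.sqrt(2.0 + OMEGA0 ** 2))
    return W, scales, periods

def band_components(x, bands=BANDS, dt=1.0, dj=0.25, s0=2.0):
    """Inverse-transform the scales in each period band into one time-domain component."""
    W, scales, periods = morlet_cwt(x, dt, dj, s0)
    factor = dj * np.sqrt(dt) / (C_DELTA * PSI0_AT_0)   # delta-function reconstruction
    comps = []
    for lo, hi in bands:
        sel = (periods >= lo) & (periods < hi)
        comps.append(factor * (W[sel].real / np.sqrt(scales[sel])[:, None]).sum(axis=0))
    return np.array(comps)                              # shape (len(bands), len(x))

def input_vector(x, bands=BANDS):
    """Assumed input vector: concatenated band components of the z-scored series."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()                        # compare shape, not search level
    return band_components(z, bands).ravel()

def weighted_fcm(X, c, weights=None, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-Means with fixed per-feature weights in the squared distance."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = np.ones(d) if weights is None else np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                   # each row of memberships sums to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]        # cluster centres, shape (c, d)
        D2 = (((X[:, None, :] - V[None, :, :]) ** 2) * w).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)                         # guard against zero distance
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)    # standard FCM membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

def select_keywords(U, names, top_k=1):
    """Pick the top_k highest-membership keywords of each (hard-assigned) cluster."""
    labels = U.argmax(axis=1)
    chosen = []
    for j in range(U.shape[1]):
        members = np.flatnonzero(labels == j)
        ranked = members[np.argsort(-U[members, j])]
        chosen.extend(names[i] for i in ranked[:top_k])
    return chosen

if __name__ == "__main__":
    # Synthetic stand-in for 96 months of keyword search volumes (not real data):
    # three hypothetical groups with dominant 6-, 12- and 24-month cycles.
    rng = np.random.default_rng(42)
    t = np.arange(96)
    names, rows = [], []
    for period in (6, 12, 24):
        for k in range(3):
            names.append(f"kw_p{period}_{k}")
            series = 10 + np.sin(2 * np.pi * t / period) + 0.3 * rng.standard_normal(t.size)
            rows.append(input_vector(series))
    X = np.vstack(rows)
    # Hypothetical weights: down-weight the shortest-period (noisiest) band
    weights = np.repeat([0.5, 1.0, 1.0, 1.0], t.size)
    U, _ = weighted_fcm(X, c=3, weights=weights)
    print(select_keywords(U, names, top_k=1))
```

Because the FFT-based transform treats each series as periodic, components near the start and end of short monthly series are affected by edge effects; zero-padding the series before the FFT, as Torrence and Compo recommend, is a common mitigation.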
