Journal of Computer Applications, 2019, Vol. 39, Issue (12): 3434-3439. DOI: 10.11772/j.issn.1001-9081.2019071305

• Paper from the 17th China Conference on Machine Learning (CCML 2019)

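Customer purchasing power prediction of Google Store based on deep LightGBM ensemble learning model

YE Zhiyu, FENG Aimin, GAO Hang

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing Jiangsu 211100, China
  • Received: 2019-04-29 Revised: 2019-07-25 Online: 2019-12-17 Published: 2019-12-10
  • About the authors: YE Zhiyu (1994-), male, born in Sanming, Fujian, master's student; research interests: machine learning, data mining, tree models, deep models. FENG Aimin (1971-), female, born in Nanjing, Jiangsu, associate professor, Ph.D.; research interests: machine learning, data mining. GAO Hang (1964-), male, born in Nanjing, Jiangsu, associate professor, Ph.D.; research interests: multimedia technology, embedded systems.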

  • Contact: FENG Aimin

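Abstract: Ensemble learning models such as the Light Gradient Boosting Machine (LightGBM) mine the data only once. They cannot automatically refine the granularity of data mining, nor can they dig deeper to uncover more of the latent internal correlations in the data. To address this problem, a deep LightGBM ensemble learning model is proposed, consisting of two parts: a sliding window and deepening. First, the sliding window lets the ensemble learning model automatically refine the granularity of data mining, so that latent internal correlations in the data are mined more deeply; it also gives the model a degree of representation learning ability. Second, building on the sliding window, a deepening step further strengthens the model's representation learning ability. Finally, feature engineering is applied to process the dataset. Experimental results on the Google Store dataset show that the prediction accuracy of the proposed deep ensemble learning model is 6.16 percentage points higher than that of the original ensemble learning model. The proposed method automatically refines the granularity of data mining and thereby extracts more latent information from the dataset. Moreover, unlike traditional deep neural networks, the deep LightGBM ensemble learning model is a non-neural-network deep model, with fewer parameters and stronger interpretability.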

Key words: machine learning, Light Gradient Boosting Machine (LightGBM), data mining, deep model, ensemble learning, feature engineering
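The abstract outlines the two steps only at a high level. Below is a minimal Python sketch of one plausible reading. It is modeled, as an assumption, on the multi-grained scanning and cascade structure of deep forest, not on the authors' code.

In this sketch, each window position of the feature vector gets its own learner, and that learner's predictions are appended to the features. Each deepening level then appends its own prediction before the next level is trained.

The following details are all hypothetical:
- the window size, stride, and number of levels;
- the 3-fold out-of-fold scheme;
- every function name.

A small k-nearest-neighbour regressor (`knn_learner`) stands in for LightGBM, so the sketch runs on the Python standard library alone.

```python
"""Minimal sketch: sliding-window features + cascade "deepening" around a base learner.

Hypothetical throughout: window size, stride, number of levels, the 3-fold
out-of-fold scheme and all names are illustrative, not from the paper. A tiny
k-nearest-neighbour regressor stands in for LightGBM so this runs on the
standard library alone.
"""
from statistics import mean
from typing import Callable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], float]
# A learner factory trains on (rows, targets) and returns a predict function.
Learner = Callable[[List[Vector], List[float]], Predictor]


def sliding_windows(x: Sequence[float], size: int, stride: int = 1) -> List[Vector]:
    """Cut one feature vector into (possibly overlapping) sub-vectors."""
    if not 0 < size <= len(x):
        raise ValueError("window size must be between 1 and len(x)")
    return [list(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def knn_learner(k: int = 3) -> Learner:
    """Stand-in for LightGBM (assumption): mean target of the k nearest rows."""
    def fit(rows: List[Vector], y: List[float]) -> Predictor:
        def predict(v: Vector) -> float:
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(r, v)), t) for r, t in zip(rows, y)
            )
            return mean([t for _, t in dists[:k]])
        return predict
    return fit


def out_of_fold(rows: List[Vector], y: List[float], make_learner: Learner,
                folds: int = 3) -> List[float]:
    """Predict each training row with a model that was not trained on it."""
    preds = [0.0] * len(rows)
    for f in range(folds):
        train = [i for i in range(len(rows)) if i % folds != f]
        model = make_learner([rows[i] for i in train], [y[i] for i in train])
        for i in range(f, len(rows), folds):
            preds[i] = model(rows[i])
    return preds


def fit_deep_model(X: List[Vector], y: List[float], size: int, stride: int,
                   levels: int, make_learner: Learner) -> Predictor:
    # Step 1 (sliding window): one learner per window position, so each model
    # mines a finer-grained slice of the features.
    windowed = [sliding_windows(r, size, stride) for r in X]  # windowed[i][j]
    n_win = len(windowed[0])
    win_rows = [[w[j] for w in windowed] for j in range(n_win)]
    win_models = [make_learner(rows, y) for rows in win_rows]
    win_oof = [out_of_fold(rows, y, make_learner) for rows in win_rows]
    feats = [list(X[i]) + [win_oof[j][i] for j in range(n_win)] for i in range(len(X))]

    # Step 2 (deepening): each level appends its prediction as a new feature.
    level_models = []
    for _ in range(levels):
        level_models.append(make_learner(feats, y))
        oof = out_of_fold(feats, y, make_learner)
        feats = [f + [p] for f, p in zip(feats, oof)]
    final = make_learner(feats, y)

    def predict(row: Vector) -> float:
        wins = sliding_windows(row, size, stride)
        f = list(row) + [m(w) for m, w in zip(win_models, wins)]
        for m in level_models:
            f = f + [m(f)]
        return final(f)
    return predict


if __name__ == "__main__":
    X = [[float(i), float(i % 3), float(i % 5), float(i % 7)] for i in range(30)]
    y = [sum(r) for r in X]
    model = fit_deep_model(X, y, size=2, stride=1, levels=2, make_learner=knn_learner(3))
    print(model([5.0, 2.0, 0.0, 5.0]))
```

To move closer to the setup in the abstract, `knn_learner` could be replaced by a factory that trains a LightGBM regressor on the given rows and returns its predict method.

When building each next level's training features, the sketch uses out-of-fold predictions. This is a common stacking precaution, also used in deep forest. It keeps a level from learning from predictions that were made on rows its predecessor was trained on.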

