Abstract: Ensemble learning models such as the Light Gradient Boosting Machine (LightGBM) mine the data only once and cannot automatically refine the granularity of data mining or dig deeper to obtain the potential internal correlation information in the data. To solve these problems, a deep LightGBM ensemble learning model composed of a sliding window and a deepening step was proposed. Firstly, the sliding window enabled the ensemble learning model to automatically refine the granularity of data mining, so as to further mine the potential internal correlation information in the data and give the model a certain representation learning ability. Secondly, on the basis of the sliding window, the deepening step was used to further improve the representation learning ability of the model. Finally, feature engineering was applied to the dataset. Experimental results on the Google store dataset show that the prediction accuracy of the proposed deep ensemble learning model is 6.16 percentage points higher than that of the original ensemble learning model. The proposed method can automatically refine the granularity of data mining and thereby obtain more potential information in the dataset. Moreover, compared with traditional deep neural networks, the deep LightGBM ensemble learning model, as a non-neural network, has fewer parameters and better interpretability.
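The abstract only describes the two components at a high level. The following Python snippet is a minimal illustrative sketch, under my own assumptions rather than the authors' published algorithm, of how a sliding window over the feature set can feed finer-grained views to separate LightGBM base learners, and how a deepening step can stack layers by appending each layer's predictions as new features. The function names, window size, stride, depth, and hyperparameters are hypothetical placeholders.

```python
# Sketch (assumption, not the paper's exact method) of a sliding-window,
# layer-wise deepened LightGBM ensemble.
import numpy as np
from lightgbm import LGBMRegressor


def sliding_window_features(X, window, stride):
    """Yield overlapping column slices of X: the refined-granularity views."""
    n_features = X.shape[1]
    for start in range(0, n_features - window + 1, stride):
        yield X[:, start:start + window]


def deep_lightgbm_fit_predict(X_train, y_train, X_test, window=4, stride=2, depth=3):
    """Train a layered (deepened) LightGBM ensemble and return test predictions."""
    train_aug, test_aug = X_train, X_test
    for _ in range(depth):
        layer_train_preds, layer_test_preds = [], []
        for Xw_train, Xw_test in zip(sliding_window_features(train_aug, window, stride),
                                     sliding_window_features(test_aug, window, stride)):
            # One base learner per window; hyperparameters are placeholders.
            model = LGBMRegressor(n_estimators=100)
            model.fit(Xw_train, y_train)
            # NOTE: for simplicity these are in-sample predictions; a real
            # implementation would use out-of-fold predictions to avoid leakage.
            layer_train_preds.append(model.predict(Xw_train))
            layer_test_preds.append(model.predict(Xw_test))
        # Deepening: append this layer's window-level predictions as new features.
        train_aug = np.hstack([X_train, np.column_stack(layer_train_preds)])
        test_aug = np.hstack([X_test, np.column_stack(layer_test_preds)])
    # Final layer: a LightGBM model trained on the augmented representation.
    final = LGBMRegressor(n_estimators=200)
    final.fit(train_aug, y_train)
    return final.predict(test_aug)
```

In this sketch each additional layer re-slices the augmented feature matrix, so later layers see both the raw features and the previous layer's predictions, which is how the deepening step is intended to strengthen the representation learning ability of the ensemble.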
YE Zhiyu, FENG Aimin, GAO Hang. Customer purchasing power prediction of Google store based on deep LightGBM ensemble learning model. Journal of Computer Applications, 2019, 39(12): 3434-3439.