Journal of Computer Applications

Two-stage energy consumption feature selection method based on tree-based models

  

  • Received: 2025-07-29 Revised: 2025-09-22 Online: 2025-11-05 Published: 2025-11-05

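TIAN Boran (田博然)1, ZHOU Jiantao (周建涛)1, ZHAO Daming (赵大明)2

  1. College of Computer Science, Inner Mongolia University
  2. Department of Computer Science, Tsinghua University
  • Corresponding author: TIAN Boran (田博然)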

Abstract: In cloud computing platforms, the multidimensional resource characteristics of application services significantly influence data center energy consumption, and extracting key energy consumption indicators through feature selection is an effective way to build accurate prediction models. Existing studies do not adequately integrate feature interpretability with global optimum search capability, so the selected features suffer from high redundancy, low prediction accuracy, or a lack of interpretability. To address these problems, a two-stage energy consumption feature selection method based on tree-based models is proposed. In the first stage, TreeSHAP, a game theory-based interpretability technique for tree-based models, quantifies the marginal contribution of each feature and enables highly transparent elimination of redundant features. In the second stage, the rapid convergence of the Ant Colony Optimization (ACO) algorithm is combined with the global search capability of the Gravitational Search Algorithm (GSA) to identify the optimal feature combination within the reduced feature space. The two stages work together: the low-dimensional, interpretable space filtered by TreeSHAP provides the foundation for ACO-GSA optimization and significantly reduces computational complexity, while ACO-GSA uses a hybrid search strategy to identify key feature combinations efficiently. The method is evaluated by comparing it with other feature selection methods and validating the resulting feature subsets on multiple prediction models, in terms of feature dimensionality, prediction accuracy, and generalization performance. Experimental results on the University of Melbourne cloud dataset show that the proposed method reduces feature subset dimensionality by 58.3% compared with the Least Absolute Shrinkage and Selection Operator (Lasso) and improves the prediction accuracy of the feature subset by 9.1% compared with the ACO algorithm. The proposed method outperforms the comparison methods in feature compactness, prediction accuracy, and generalization reliability, which verifies its effectiveness.
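
The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the feature names, the toy `value` function standing in for a trained model's predictive score, the threshold, and the size penalty are all hypothetical. Stage one computes exact Shapley values by brute-force enumeration (TreeSHAP computes the same kind of quantity efficiently for tree models), and stage two uses exhaustive search as a plain stand-in for the ACO-GSA hybrid search, which is feasible here only because the reduced space is tiny.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy setup (not from the paper): each feature carries one
# "information group"; features in the same group are redundant.
GROUP_GAIN = {"cpu": 0.50, "mem": 0.30, "net": 0.15}
FEATURE_GROUP = {
    "cpu_util": "cpu", "cpu_freq": "cpu", "mem_used": "mem",
    "net_rx": "net", "job_id": None, "uptime": None,
}
FEATURES = list(FEATURE_GROUP)


def value(subset):
    """Toy stand-in for the predictive score of a model trained on `subset`."""
    covered = {FEATURE_GROUP[f] for f in subset}
    return sum(gain for group, gain in GROUP_GAIN.items() if group in covered)


def shapley_values(features, v):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (v(s + (i,)) - v(s))
        phi[i] = total
    return phi


def stage_one(features, v, threshold=1e-9):
    """Drop features whose average marginal contribution is (near) zero."""
    phi = shapley_values(features, v)
    kept = [f for f in features if phi[f] > threshold]
    return kept, phi


def stage_two(kept, v, size_penalty=0.05):
    """Exhaustive search over the reduced space; a stand-in for ACO-GSA."""
    candidates = (c for k in range(1, len(kept) + 1) for c in combinations(kept, k))
    # Score trades predictive value against subset size (hypothetical penalty).
    return max(candidates, key=lambda c: v(c) - size_penalty * len(c))


kept, phi = stage_one(FEATURES, value)
best = stage_two(kept, value)
print("kept after stage 1:", kept)
print("selected in stage 2:", best)
```

In this toy setting the two uninformative features are removed in stage one, and the search in stage two keeps only one of the two redundant CPU features. With a real tree model, the contributions would more plausibly come from a TreeSHAP implementation such as `shap.TreeExplainer`, and the exhaustive search would be replaced by a metaheuristic, since enumeration does not scale beyond a handful of features.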

Key words: Gravitational Search Algorithm (GSA)
