Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (08): 2202-2274. DOI: 10.3724/SP.J.1087.2012.02202

• Database technology •

Multivariate regression analytical method based on heuristic constructed variable under condition of incomplete data

ZHANG Xi-xiang, LI Tao-shen

  1. College of Computer Science and Electronic Information, Guangxi University, Nanning, Guangxi 530004, China
  • Received: 2012-02-06 Revised: 2012-03-06 Online: 2012-08-28 Published: 2012-08-01
  • Corresponding author: LI Tao-shen
  • About the authors: ZHANG Xi-xiang (born 1986), male, from Nanning, Guangxi, master's student, CCF member; research interests: data mining and cloud computing.
    LI Tao-shen (born 1957), male, from Nanning, Guangxi, professor, Ph.D., CCF member; research interests: distributed databases, wireless Mesh networks, network and information security, and cloud computing.
  • Funding:
    National Natural Science Foundation of China (61174175); Guangxi Graduate Research and Innovation Project (GXU11T32590)

Abstract: Traditional multivariate regression analysis can be used to predict and fill in missing data, but the independent variables in its regression equation take a relatively fixed, uniform form. To address this limitation, this paper proposes a multivariate regression analysis method based on heuristic variable construction. A greedy algorithm first searches for optimized combinations of the existing variables; several of the newly constructed variables are then selected for regression analysis to obtain a better goodness of fit. Simulation and evaluation on a case with missing data for the mechanical strength of wheat stalks confirm that the method is feasible and effective, and the results indicate that it predicts missing values with good accuracy.

Key words: incomplete data, greedy algorithm, multivariate regression analysis, correlation coefficient, goodness of fit
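
The general idea in the abstract (build new variables from combinations of existing ones, pick them greedily, then regress) can be illustrated with a minimal Python sketch. The abstract does not state which combination operators, selection criterion, or stopping rule the paper uses, so everything specific here is an assumption made for illustration: pairwise products and ratios as the constructed variables, the gain in R^2 as the greedy criterion, the `max_vars` and `min_gain` parameters, and the function names themselves. The data is synthetic and is not the wheat stalk data set.

```python
import itertools
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def candidate_variables(X):
    """Original columns plus pairwise products and ratios (assumed operator set)."""
    cands = {f"x{i}": X[:, i] for i in range(X.shape[1])}
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cands[f"x{i}*x{j}"] = X[:, i] * X[:, j]
        if np.all(X[:, j] != 0):  # skip ratios that would divide by zero
            cands[f"x{i}/x{j}"] = X[:, i] / X[:, j]
    return cands


def greedy_select(cands, y, max_vars=3, min_gain=1e-4):
    """Repeatedly add the candidate that raises R^2 most; stop on a small gain."""
    chosen, best = [], 0.0
    while len(chosen) < max_vars:
        scores = {
            name: r_squared(np.column_stack([cands[n] for n in chosen + [name]]), y)
            for name in cands
            if name not in chosen
        }
        if not scores:
            break
        name, score = max(scores.items(), key=lambda kv: kv[1])
        if score - best < min_gain:
            break
        chosen.append(name)
        best = score
    return chosen, best


def fill_missing(X, y, max_vars=3, min_gain=1e-4):
    """Fit on rows where y is observed, then predict the rows where y is NaN."""
    cands = candidate_variables(X)  # built on all rows so prediction can reuse them
    obs = ~np.isnan(y)
    observed_cands = {k: v[obs] for k, v in cands.items()}
    chosen, r2 = greedy_select(observed_cands, y[obs], max_vars, min_gain)
    design = np.column_stack([np.ones(len(y))] + [cands[n] for n in chosen])
    coef, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
    filled = y.copy()
    filled[~obs] = design[~obs] @ coef
    return filled, chosen, r2


# Synthetic example: the response depends on a product of two inputs,
# which a regression on the raw columns alone cannot represent exactly.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(200, 3))
y = 3.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.01, 200)
y_missing = y.copy()
y_missing[:20] = np.nan  # pretend the first 20 responses are missing

filled, chosen, r2 = fill_missing(X, y_missing)
print("selected variables:", chosen)
print("R^2 on observed rows:", round(r2, 4))
```

Building the candidate set on all rows before splitting into observed and missing rows keeps the selected variable names valid at prediction time. A greedy search of this kind is not guaranteed to find the best subset of constructed variables; it trades optimality for a search cost that grows linearly with the number of candidates per step.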
