Journal of Computer Applications, 2017, Vol. 37, Issue (6): 1674-1679. DOI: 10.11772/j.issn.1001-9081.2017.06.1674


Method for solving Lasso problem by utilizing multi-dimensional weight

CHEN Shanxiong1,2, LIU Xiaojuan1,2, CHEN Chunrong1, ZHENG Fangyuan1

  1. College of Computer and Information Science, Southwest University, Chongqing 400715, China;
    2. School of Information Engineering, Guizhou University of Engineering Science, Bijie Guizhou 551700, China
  • Received: 2016-11-07; Revised: 2017-01-12; Online: 2017-06-10; Published: 2017-06-14
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61303227), the Plan of Guizhou Provincial Science and Technology Talents in Universities (KEHE KY[2016]098), and the Joint Fund of Guizhou Science and Technology Agency (KEHE LH[2016]7053).

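  • Corresponding author: CHEN Shanxiong
  • About the authors: CHEN Shanxiong (1981-), male, born in Chongqing, associate professor, Ph.D.; research interests: compressed sensing, anomaly detection, pattern recognition. LIU Xiaojuan (1990-), female, born in Guang'an, Sichuan, teaching assistant, M.S.; research interests: pattern recognition, neural networks. CHEN Chunrong (1995-), female, born in Chongqing, master's student; research interests: data mining, intelligent information processing. ZHENG Fangyuan (1994-), male, born in Jiaozuo, Henan, master's student; research interests: anomaly detection, network security.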

Abstract: The Least absolute shrinkage and selection operator (Lasso) offers strong computational advantages in data dimensionality reduction and anomaly detection. To address the low accuracy of Lasso-based anomaly detection, a Least Angle Regression (LARS) algorithm based on multi-dimensional weights was proposed for solving the Lasso problem. First, each regression variable was considered to carry a different weight in the regression model, i.e., each attribute variable differs in its relative importance to the overall evaluation. Therefore, when computing the angular bisector (equiangular direction) in the LARS algorithm, the joint correlation between each regression variable and the residual vector was introduced to distinguish the effects of different attribute variables on the detection results. Then, three weight estimation methods, namely Principal Component Analysis (PCA), the independent weight method and CRiteria Importance Through Intercriteria Correlation (CRITIC), were each incorporated into the LARS algorithm, further optimizing both the advance direction and the choice of the entering variable in the LARS solution. Finally, the Pima Indians Diabetes dataset was used to verify the effectiveness of the proposed algorithm. The experimental results show that, under a constraint with a smaller threshold value, the LARS algorithm with multi-dimensional weights solves the Lasso problem more accurately than the traditional LARS algorithm, and is therefore better suited to anomaly detection.
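The abstract does not give the exact form of the weighted joint correlation, so the following Python/NumPy code is only a minimal sketch of one plausible reading, under stated assumptions. `lars_steps` is plain LARS (the "lar" variant of Efron et al., with no Lasso drop step), and the hypothetical `weighted_lars` wrapper lets variable j compete for entry with w_j·|x_j^T r| instead of |x_j^T r|. The weighting is realised by rescaling column j by w_j > 0 before running LARS, so the equiangular (bisector) direction is also computed in the weighted space; this is the column-rescaling trick known from the adaptive Lasso, not necessarily the authors' formulation. Both function names and the weight values are illustrative assumptions; the columns of X and y are assumed centred, with X's columns scaled to unit length.

```python
import numpy as np


def lars_steps(X, y, max_steps=None):
    """Plain LARS ('lar' variant, no Lasso drop step) run for max_steps steps.

    Assumes the columns of X and y are centred. Returns (beta, entry_order).
    """
    n, p = X.shape
    limit = min(p, n - 1)
    steps = limit if max_steps is None else min(max_steps, limit)
    beta = np.zeros(p)
    mu = np.zeros(n)                  # current fit, always equal to X @ beta
    active = []
    for _ in range(steps):
        c = X.T @ (y - mu)            # correlations with the current residual
        inactive = [k for k in range(p) if k not in active]
        active.append(inactive[int(np.argmax(np.abs(c[inactive])))])
        C = np.abs(c[active]).max()   # common (maximal) absolute correlation
        s = np.sign(c[active])
        XA = X[:, active] * s         # sign-adjusted active columns
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        AA = 1.0 / np.sqrt(g.sum())
        wA = AA * g
        u = XA @ wA                   # equiangular (bisector) unit direction
        a = X.T @ u
        gamma = C / AA                # full step: every active correlation hits 0
        for k in range(p):
            if k in active:
                continue
            # smallest positive step at which inactive variable k ties the active set
            for num, den in ((C - c[k], AA - a[k]), (C + c[k], AA + a[k])):
                if abs(den) > 1e-12 and 1e-12 < num / den < gamma:
                    gamma = num / den
        mu = mu + gamma * u
        beta[active] += gamma * s * wA
    return beta, active


def weighted_lars(X, y, weights, max_steps=None):
    """Hypothetical weighted LARS: variable j enters according to w_j * |x_j^T r|.

    Column j is rescaled by w_j > 0, plain LARS is run on the rescaled
    matrix, and the coefficients are mapped back to the original scale.
    """
    w = np.asarray(weights, dtype=float)
    theta, order = lars_steps(X * w, y, max_steps)
    return w * theta, order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 4))
    X -= X.mean(axis=0)
    X /= np.linalg.norm(X, axis=0)           # unit-length columns
    y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)
    y -= y.mean()
    w = np.array([1.0, 3.0, 1.0, 1.0])       # illustrative weights only
    print(weighted_lars(X, y, w, max_steps=2))
```

Because rescaling by positive weights does not change the column space, running every step (with more samples than variables) still ends at the ordinary least-squares fit. The weights only change the entry order and the intermediate coefficients, i.e. the part of the path that matters when the solution is truncated at a constraint threshold.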

Key words: Least absolute shrinkage and selection operator (Lasso), variable selection, Least Angle Regression (LARS), Multiple Linear Regression (MLR), weighting
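Of the three weight estimators named in the abstract, CRITIC has a compact standard definition (Diakoulaki et al., 1995): after min-max normalisation, criterion j receives the information C_j = σ_j Σ_k (1 - r_jk), where σ_j is its standard deviation and r_jk its correlation with criterion k, and its weight is C_j / Σ_k C_k. The pure-Python sketch below computes these weights for the columns of a data table. The function name `critic_weights` is hypothetical, and how the paper feeds such weights into LARS is not specified here.

```python
import math


def critic_weights(rows):
    """CRITIC weights for the columns (criteria) of `rows`.

    rows: list of samples, each a list of criterion values.
    Returns one weight per column; the weights sum to 1.
    Assumes at least one column is non-constant.
    """
    cols = [list(c) for c in zip(*rows)]
    n, p = len(rows), len(cols)
    # Min-max normalise every criterion so contrasts are comparable.
    norm = []
    for col in cols:
        lo, hi = min(col), max(col)
        norm.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    means = [sum(c) / n for c in norm]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(norm, means)]

    def corr(j, k):
        # Pearson correlation; a constant column is treated as uncorrelated.
        if stds[j] == 0 or stds[k] == 0:
            return 0.0
        cov = sum((a - means[j]) * (b - means[k]) for a, b in zip(norm[j], norm[k])) / n
        return cov / (stds[j] * stds[k])

    # Information content: contrast (std) times conflict with the other criteria.
    info = [stds[j] * sum(1 - corr(j, k) for k in range(p)) for j in range(p)]
    total = sum(info)
    return [c / total for c in info]
```

For example, `critic_weights([[1, 2, 4], [2, 4, 1], [3, 6, 3], [4, 8, 2]])` returns approximately `[0.25, 0.25, 0.5]`: the first two columns are perfectly correlated, so each is penalised for redundancy, while the third column, which conflicts with both, receives half of the total weight.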


