计算机应用 ›› 2010, Vol. 30 ›› Issue (3): 810-812.

• 数据库与数据挖掘 • 上一篇    下一篇

全最小一乘准则下的LGA新算法

曹慧荣1,方杰2   

  1. 1. 廊坊师范学院数信学院
    2. 廊坊师范学院 数学与信息科学学院
  • 收稿日期:2009-09-23 修回日期:2009-11-27 发布日期:2010-03-14 出版日期:2010-03-01
  • 通讯作者: 曹慧荣
  • 基金资助:
    河北省教育厅自然科学研究项目

New linear grouping algorithm based on total least absolute deviation criteria

  • Received:2009-09-23 Revised:2009-11-27 Online:2010-03-14 Published:2010-03-01
  • Contact: rong huicao

摘要: 为克服传统提取数据集中线性结构的LGA对噪声数据比较敏感的缺陷,提出了两种基于稳健的全最小一乘准则下的LGA新算法。首先证明了全最小一乘准则下数据集最优划分的存在性,并据此给出一种有限步终止算法。其次为提高计算速度,根据k-means算法、全最小一乘准则和重抽样方法给出另一种快速收敛算法。通过与传统的LGA和基于Trimmed k-means思想的稳健LGA的比较,仿真结果表明提出的算法具有较好的稳健性,可以在离群数据较多的情形下,同时找出数据集合中的所有强线性结构。

关键词: LGA, 全最小一乘, 稳健性, 聚类分析

Abstract: To overcome the drawback that the traditional Linear Grouping Algorithm (LGA), when extracting linear structures, is sensitive to outliers in datasets, a new finite step according to existence of optimal linear grouping in data set and a new algorithm based on k-means clustering, total least absolute deviation and resampling were proposed, which detected several different linear relations at once to minimize the total orthogonal distances from n given points to its nearest hyperplanes. Finally, by comparison with linear grouping algorithm and robust linear grouping algorithm based on impartial trimmed k-means, the proposed algorithms are more robust and can detect all strong linear structures in datasets including a lot of outliers.

Key words: Linear Grouping Algorithm (LGA), total least absolute deviations, robustness, cluster analysis