Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (7): 2073-2079. DOI: 10.11772/j.issn.1001-9081.2023070923

• Data science and technology •

Distance weighted discriminant analysis based on robust principal component analysis for matrix data

Junchi GE, Weihua ZHAO

  1. School of Sciences, Nantong University, Nantong, Jiangsu 226019, China
  • Received: 2023-07-11 Revised: 2023-09-19 Accepted: 2023-09-20 Online: 2023-10-26 Published: 2024-07-10
  • Contact: Weihua ZHAO
  • About author: GE Junchi, born in 1993, M.S. candidate. His research interests include statistical learning and high-dimensional data analysis.
    First author contact: ZHAO Weihua, born in 1978, Ph.D., professor. His research interests include machine learning and complex data modeling and analysis.
  • Supported by:
    National Social Science Foundation of China (22BTJ025)

Distance weighted discriminant analysis for matrix data based on robust principal component analysis

GE Junchi (葛焌迟), ZHAO Weihua (赵为华)

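  1. School of Sciences, Nantong University, Nantong, Jiangsu 226019, China
  • Corresponding author: ZHAO Weihua (赵为华)
  • About the authors: GE Junchi (葛焌迟), born in 1993, male, from Nantong, Jiangsu, M.S. candidate. Main research interests: statistical learning, high-dimensional data analysis;
    First contact: ZHAO Weihua (赵为华), born in 1978, male, from Haimen, Jiangsu, professor, doctoral supervisor, Ph.D. Main research interests: machine learning, complex data modeling and analysis.
  • Funding:
    National Social Science Foundation of China (22BTJ025)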

Abstract:

Distance Weighted Discrimination (DWD) is a widely used classification model for matrix data. However, its performance usually degrades significantly when the data are severely contaminated by noise. Robust Principal Component Analysis (RPCA) has become one of the effective ways to address this problem because it can separate the low-rank structure of matrix data from its sparse component. Therefore, a Robust DWD model for matrix data (RDWD-2D) was proposed. In particular, the model performs RPCA on the data in a supervised way, so that clean data are recovered and classified simultaneously. Experimental results on the MNIST and COIL20 datasets show that, when the matrix data are contaminated with noise or contain missing values, the RDWD-2D model has the best data recovery capability and the highest classification accuracy compared with DWD-2D, RPCA+DWD, and other models. The RDWD-2D model also demonstrates good robustness to the degree of data contamination.

Key words: robust classification model, Distance Weighted Discrimination (DWD), matrix data, Principal Component Analysis (PCA)
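
The low-rank plus sparse separation that the abstract attributes to RPCA can be illustrated with a minimal sketch. It assumes the generic, unsupervised principal component pursuit formulation (minimize the nuclear norm of the low-rank part plus a weighted L1 norm of the sparse part, subject to the two parts summing to the observed matrix), solved by a plain ADMM iteration. It is not the supervised RDWD-2D model proposed in the paper, whose formulation is not given on this page. The function names (`soft_threshold`, `svd_threshold`, `rpca_pcp`) and the default choices of `lam` and `mu` are illustrative assumptions based on commonly used heuristics.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of tau * L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Singular value shrinkage: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value.
    return (u * soft_threshold(s, tau)) @ vt


def rpca_pcp(m, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split m into low-rank + sparse parts (generic RPCA sketch, not RDWD-2D).

    lam and mu default to common heuristics; both are assumptions here.
    Assumes m is a non-zero 2-D array.
    """
    m = np.asarray(m, dtype=float)
    rows, cols = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(rows, cols))
    if mu is None:
        mu = rows * cols / (4.0 * np.abs(m).sum())

    low = np.zeros_like(m)     # low-rank estimate
    sparse = np.zeros_like(m)  # sparse (noise/outlier) estimate
    dual = np.zeros_like(m)    # Lagrange multiplier for m = low + sparse
    norm_m = np.linalg.norm(m)

    for _ in range(max_iter):
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        residual = m - low - sparse
        dual += mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low, sparse
```

Calling `rpca_pcp(image)` on a single noisy image matrix returns an estimate of the clean low-rank image and of the sparse corruption. A pipeline such as the RPCA+DWD baseline named in the abstract would presumably run a separation step of this kind first and then train the classifier on the recovered matrices, whereas the abstract describes RDWD-2D as performing recovery and classification jointly.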

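Abstract (Chinese version):

Distance Weighted Discrimination (DWD) is a widely used classification model for matrix data, but its performance degrades markedly when the data are severely contaminated by noise. Robust Principal Component Analysis (RPCA), which can separate the low-rank structure of a data matrix from its sparse part, has become one of the effective means of addressing this problem. A Robust Distance Weighted Discrimination model for matrix data (RDWD-2D) is therefore proposed. In particular, the model performs RPCA on the data matrices in a supervised manner and recovers and classifies the clean data at the same time. Experimental results on the MNIST and COIL20 datasets show that, for matrix data with noise contamination or missing entries, the RDWD-2D model achieves the best data recovery capability and the highest classification accuracy compared with models such as DWD-2D and RPCA+DWD; the RDWD-2D model is also fairly robust to the degree of data contamination.

Key words (Chinese version): robust classification model, Distance Weighted Discrimination (DWD), matrix data, Principal Component Analysis (PCA)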

CLC Number: