Journal of Computer Applications, 2011, Vol. 31, Issue (10): 2786-2789. DOI: 10.3724/SP.J.1087.2011.02786

• Artificial intelligence •

Denoising method of non-spherical distributed data set

ZHANG Yan, YAN De-qin, ZHENG Hong-liang   

  1. School of Computer and Information Technology, Liaoning Normal University, Dalian Liaoning 116081, China
  • Received: 2011-05-03; Revised: 2011-06-07; Online: 2011-10-11; Published: 2011-10-01

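  • Corresponding author: ZHANG Yan
  • About the authors: ZHANG Yan (张岩, born 1982), female (Mongolian ethnicity), from Xilinhot, Inner Mongolia, master's student; research interests: pattern recognition, data mining. YAN De-qin (闫德勤, born 1962), male, from Heze, Shandong, professor, Ph.D.; research interests: pattern recognition, data mining, cryptography, image retrieval. ZHENG Hong-liang (郑宏亮, born 1970), male, from Tieling, Liaoning, lecturer; research interests: artificial intelligence.
  • Supported by:

    Scientific Research Fund for Institutions of Higher Education of the Education Department of Liaoning Province (2008344); Open Project Fund of the Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences (20070101)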

Abstract: The traditional Support Vector Machine (SVM) is overly sensitive to noise, and the Fuzzy SVM (FSVM) depends too heavily on the geometric shape of the sample set. To address both problems, a Rough Support Vector Machine based on a Noise Filtering System (NFS-RSVM) is proposed. First, the samples most likely to be noise are filtered out by the Noise Filtering System (NFS). Then the equivalence-class information implicit in the data is incorporated into the SVM model as a dual penalty factor to further distinguish valid samples from noise samples. Simulation results on UCI data sets show that NFS-RSVM removes most of the noise points effectively and that its classification accuracy improves to some extent over the traditional SVM and FSVM. NFS-RSVM therefore shows better noise immunity, classification performance, and generalization ability on data sets that are non-spherically distributed and contain many noise samples.

Key words: Support Vector Machine (SVM), Rough SVM (RSVM), Noise Filtering System (NFS), equivalence class, denoising
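
The two-stage idea in the abstract (filter likely noise first, then train an SVM with two penalty levels) can be illustrated with a minimal sketch. The abstract does not specify how the NFS or the equivalence classes are built, so everything below is an assumption made for illustration: a k-nearest-neighbour label check stands in for the NFS, the neighbour agreement count stands in for equivalence-class membership, and the function names, `c_high`, `c_low`, and the toy data are hypothetical. It is not the authors' algorithm.

```python
import math

def filter_noise(X, y, k=3):
    """Stage 1 (assumed stand-in for the NFS): drop a sample when none of
    its k nearest neighbours shares its label. Returns the kept indices and,
    for each kept sample, how many of its k neighbours agree with it."""
    kept, votes = [], []
    for i, xi in enumerate(X):
        nearest = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        same = sum(1 for _, j in nearest if y[j] == y[i])
        if same > 0:
            kept.append(i)
            votes.append(same)
    return kept, votes

def assign_penalties(votes, k=3, c_high=10.0, c_low=1.0):
    """Stage 2a (assumption): two penalty levels. A sample whose whole
    neighbourhood agrees with it gets c_high; a mixed one gets c_low."""
    return [c_high if v == k else c_low for v in votes]

def train_weighted_svm(X, y, C, epochs=200, lr=0.01):
    """Stage 2b: linear soft-margin SVM with per-sample penalties C[i],
    trained by plain subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi, ci in zip(X, y, C):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr) for wj in w]  # step on the 0.5*||w||^2 term
            if margin < 1.0:                   # hinge term is active
                w = [wj + lr * ci * yi * xj for wj, xj in zip(w, xi)]
                b += lr * ci * yi
    return w, b

def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data (hypothetical): two clusters plus one mislabelled point (index 8).
X = [(2, 2), (2, 3), (3, 2), (3, 3),
     (-2, -2), (-2, -3), (-3, -2), (-3, -3),
     (2.5, 2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]

kept, votes = filter_noise(X, y, k=3)
C = assign_penalties(votes, k=3)
w, b = train_weighted_svm([X[i] for i in kept], [y[i] for i in kept], C)
```

In this toy run the mislabelled point is removed in stage 1, and the samples that sat next to it receive the lower penalty, so a margin violation on them costs less during training. A real implementation would replace both stand-ins with the constructions defined in the paper body.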

