New discriminative feature selection method

doi:10.11772/j.issn.1001-9081.2015.10.2752

Journal of Computer Applications, 2015, Vol. 35, Issue (10): 2752-2756.

Papers from the 15th China Conference on Machine Learning (CCML2015)

WU Jinhua^1,2, ZUO Kaizhong^1,2, JIE Biao^1,2,3, DING Xintao^1,2

1. School of Mathematics and Computer Science, Anhui Normal University, Wuhu, Anhui 241003, China;
2. Network and Information Security Engineering Research Center, Anhui Normal University, Wuhu, Anhui 241003, China;
3. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China

Received: 2015-06-16 Revised: 2015-06-27 Online: 2015-10-10 Published: 2015-10-14
Corresponding author: WU Jinhua (born 1991), male, from Anqing, Anhui; M.S. candidate. Research interests: machine learning, information security. ahnu_wjh@139.com
About the authors: ZUO Kaizhong (born 1974), male, from Suzhou, Anhui; professor, Ph.D., CCF member. Research interests: information security, machine learning. JIE Biao (born 1977), male, from Suzhou, Anhui; associate professor, Ph.D. Research interests: machine learning, medical image processing. DING Xintao (born 1979), male, from Wuhu, Anhui; lecturer, Ph.D., CCF member. Research interests: pattern recognition, image processing.
Supported by:
the National Natural Science Foundation of China (61472005); the Natural Science Foundation of Anhui Province (1508085MF125); the Open Project of the National Laboratory of Pattern Recognition (201407361).

Abstract

Feature selection is a common data preprocessing step that can both improve classifier performance and make classification results easier to interpret. Sparse-learning-based feature selection methods sometimes ignore useful discriminative information, which can degrade classification performance. To address this problem, a new discriminative feature selection method, Discriminative Least Absolute Shrinkage and Selection Operator (D-LASSO), is proposed to select more discriminative features. First, the D-LASSO model contains an L₁-norm regularization term that produces a sparse solution. Second, a new discriminative regularization term is added that preserves the geometric distribution information of samples within the same class and of samples from different classes, which induces more discriminative features. Experimental results on a series of benchmark datasets show that, compared with existing methods, D-LASSO not only further improves classification accuracy but is also fairly robust to its parameters.

Key words: feature selection, sparse solution, L₁-norm, discriminative regularization term, classification
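
To make the two regularizers described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state the objective function, so everything specific here is an assumption made for illustration: the least-squares loss, the form of the discriminative term (mean squared gap between projections of same-class samples minus that of different-class samples, balanced by a hypothetical weight `eta`), the ISTA solver, and all names (`d_lasso_sketch`, `pair_laplacian`, `lam1`, `lam2`).

```python
import numpy as np


def pair_laplacian(adj):
    """Graph Laplacian scaled so that f @ L @ f equals the mean squared
    gap (f_i - f_j)^2 over linked pairs. Assumes at least one linked pair."""
    n_pairs = adj.sum() / 2.0
    return (np.diag(adj.sum(axis=1)) - adj) / n_pairs


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (this is what produces sparsity)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def d_lasso_sketch(X, y, lam1=0.2, lam2=0.1, eta=0.5, n_iter=500):
    """Minimize (1/2n)||Xw - y||^2 + lam1*||w||_1 + lam2 * w^T M w with ISTA.

    ASSUMPTION: M = X^T (eta * L_within - (1 - eta) * L_between) X stands in
    for the discriminative regularizer; the paper's exact definition is not
    given in the abstract.
    """
    n, d = X.shape
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)            # link same-class pairs, no self-links
    diff = 1.0 - same
    np.fill_diagonal(diff, 0.0)            # link different-class pairs
    M = X.T @ (eta * pair_laplacian(same)
               - (1.0 - eta) * pair_laplacian(diff)) @ X
    hessian = X.T @ X / n + 2.0 * lam2 * M  # Hessian of the smooth part
    eigvals = np.linalg.eigvalsh(hessian)   # ascending order
    if eigvals[0] < 0:
        raise ValueError("lam2 too large: smooth part is not convex")
    step = 1.0 / eigvals[-1]
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n + 2.0 * lam2 * (M @ w)
        w = soft_threshold(w - step * grad, step * lam1)
    return w


# Synthetic two-class data: only feature 0 carries class information.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 50)
X = rng.normal(size=(100, 6))
X[:, 0] = y + 0.5 * X[:, 0]

w = d_lasso_sketch(X, y)
selected = np.flatnonzero(w)               # indices of selected features
print(selected, np.round(w, 3))
```

Features with a nonzero weight in `w` are the selected ones. Raising `lam1` shrinks the selection, and the convexity check rejects `lam2` values for which this assumed objective would be unbounded below.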

CLC number: TP181

WU Jinhua, ZUO Kaizhong, JIE Biao, DING Xintao. New discriminative feature selection method[J]. Journal of Computer Applications, 2015, 35(10): 2752-2756.

