Journal of Computer Applications, 2017, Vol. 37, Issue (3): 866-870. DOI: 10.11772/j.issn.1001-9081.2017.03.866

• Data Science and Technology •

Online feature selection based on feature clustering ensemble technology

DU Zhenglin1, LI Yun1,2

  1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China;
  2. Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin Guangxi 541004, China
  • Received: 2016-08-17 Revised: 2016-10-24 Online: 2017-03-10 Published: 2017-03-22
  • Corresponding author: DU Zhenglin
  • About the authors: DU Zhenglin (born 1991), male, from Xuzhou, Jiangsu; M.S. candidate; main research interest: online feature selection. LI Yun (born 1974), male, from Wangjiang, Anhui; professor, Ph.D., CCF member; main research interests: machine learning and pattern recognition.
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Jiangsu Province (BK20131378, BK20140885) and the Funds of Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems (15206).

Abstract: To address a new application scenario in which both historical data and streaming features are present, an online feature selection algorithm based on group feature selection and streaming features is proposed. In the group feature selection stage on the historical data, a clustering ensemble is introduced to compensate for the shortcomings of any single clustering algorithm: k-means is first run multiple times to produce a set of base clusterings, and hierarchical clustering is then used in the integration stage to combine them into the final result. In the online feature selection stage on the streaming features, the feature groups produced by group construction are updated by examining the correlation among features, and the final feature subset is obtained through group transformation. Experimental results show that the proposed algorithm handles online feature selection in this scenario effectively and achieves good classification performance.

Key words: group feature selection, clustering ensemble, streaming feature, online feature selection
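
The clustering ensemble stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the base clusterings are combined, so the co-association matrix (the fraction of k-means runs in which two features share a cluster) and the average-linkage merging used here are assumptions, and all function and parameter names (`kmeans_labels`, `co_association`, `average_linkage`, `ensemble_feature_groups`, `n_runs`, `n_groups`) are hypothetical. Each feature is represented by its vector of values over the historical samples.

```python
import random

def kmeans_labels(points, k, seed, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def co_association(labelings):
    """Fraction of base clusterings in which each pair shares a cluster."""
    n = len(labelings[0])
    return [[sum(l[i] == l[j] for l in labelings) / len(labelings)
             for j in range(n)] for i in range(n)]

def average_linkage(sim, n_groups):
    """Agglomerative merging on similarity `sim` until n_groups remain."""
    groups = [[i] for i in range(len(sim))]

    def link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(groups) > n_groups:
        # Merge the pair of groups with the highest average similarity.
        x, y = max(
            ((x, y) for x in range(len(groups))
             for y in range(x + 1, len(groups))),
            key=lambda xy: link(groups[xy[0]], groups[xy[1]]),
        )
        groups[x] += groups.pop(y)
    return [sorted(g) for g in groups]

def ensemble_feature_groups(features, k, n_runs, n_groups):
    """Assumed pipeline: repeated k-means, then hierarchical integration."""
    labelings = [kmeans_labels(features, k, seed) for seed in range(n_runs)]
    return average_linkage(co_association(labelings), n_groups)

# Toy data: four features (rows) observed on three samples (columns).
features = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1],
            [5.0, 5.1, 5.0], [5.1, 5.0, 5.1]]
print(ensemble_feature_groups(features, k=2, n_runs=5, n_groups=2))
```

On this toy input the two well-separated pairs of features end up in separate groups. Varying the seed across runs is what gives the ensemble its diversity; on real data one would more likely use library implementations of k-means and hierarchical clustering than these hand-written versions.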

CLC number: