Journal of Computer Applications, 2018, Vol. 38, Issue (11): 3150-3155. DOI: 10.11772/j.issn.1001-9081.2018041259

• The 7th China Conference on Data Mining (CCDM 2018) •

Stipend prediction based on enhanced-discriminative canonical correlations analysis and classification ensemble

ZHANG Fangjuan1,2, YANG Yan1,2, DU Shengdong1,2

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China;
  2. Sichuan Key Lab of Cloud Computing and Intelligent Technique (Southwest Jiaotong University), Chengdu, Sichuan 611756, China
  • Received: 2018-04-30 Revised: 2018-06-15 Online: 2018-11-10 Published: 2018-11-10
  • Corresponding author: YANG Yan
  • About the authors: ZHANG Fangjuan (born 1993), female, from Qin'an, Gansu, M.S. candidate; research interests: data mining, ensemble learning, multi-view learning. YANG Yan (born 1964), female, from Hefei, Anhui, professor, Ph.D., CCF distinguished member; research interests: artificial intelligence, big data analysis and mining, ensemble learning. DU Shengdong (born 1981), male, from Yunyang, Chongqing, lecturer, Ph.D. candidate, CCF member; research interests: data mining, machine learning, transportation big data, medical big data.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61572407).

Abstract: To address the low efficiency and heavy workload of stipend management in higher education institutions, an Enhanced-Discriminative Canonical Correlations Analysis (EN-DCCA) algorithm was proposed and combined with a classification ensemble to predict undergraduate stipends. The multi-dimensional data of undergraduates at school were divided into two different views. Existing multi-view discriminative canonical correlation analysis algorithms do not jointly consider the correlation between view categories and the discriminability of the views' combined features. The optimization objective of EN-DCCA maximizes intra-class correlation while minimizing inter-class correlation, and it also accounts for the discriminability of the combined features, which further strengthens the discriminative power of the attributes and benefits classification. The prediction process is as follows: first, the data are preprocessed into two views according to the undergraduates' living behavior and academic performance; then, EN-DCCA is used to learn features from the two views; finally, a classification ensemble completes the prediction. In experiments on a real data set, the prediction accuracy of the proposed method reached 90.01%, which is 2 percentage points higher than that of the ensemble method based on Combined-feature-discriminability Enhanced Canonical Correlation Analysis (CECCA). The experimental results show that the proposed method can effectively predict stipends in higher education institutions.

Key words: classification ensemble, multi-view, Canonical Correlation Analysis (CCA), enhanced discriminability of view features
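
The three-step process described in the abstract (two views, correlation-based feature learning, classification ensemble) can be illustrated with a minimal sketch. It is not the authors' implementation: the EN-DCCA objective is not given on this page, so plain unsupervised CCA stands in for it, the data are synthetic, and the ensemble is a majority vote over three nearest-centroid classifiers. All function names (`make_two_view_data`, `fit_cca`, `nearest_centroid_predict`, `run_pipeline`) are hypothetical.

```python
import numpy as np


def make_two_view_data(n=400, seed=0):
    """Synthetic stand-in for the two student views (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                    # binary label, e.g. stipend yes/no
    z = rng.normal(size=(n, 2)) + 2.0 * y[:, None]    # shared latent factor, shifted by class
    X = z @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(n, 5))  # view 1
    Y = z @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(n, 4))  # view 2
    return X, Y, y


def fit_cca(X, Y, k, reg=1e-6):
    """Plain CCA (not EN-DCCA) via Cholesky whitening followed by an SVD."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Cxx)                      # Cxx = Lx Lx^T
    Ly = np.linalg.cholesky(Cyy)                      # Cyy = Ly Ly^T
    T = np.linalg.solve(Lx, Cxy)                      # Lx^{-1} Cxy
    T = np.linalg.solve(Ly, T.T).T                    # Lx^{-1} Cxy Ly^{-T}
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])              # projection for view 1
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])           # projection for view 2
    return mx, my, Wx, Wy, s[:k]                      # s holds canonical correlations


def nearest_centroid_predict(F_train, y_train, F_test):
    """Assign each test row to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([F_train[y_train == c].mean(axis=0) for c in classes])
    d = ((F_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def run_pipeline(seed=0, split=300, k=2):
    X, Y, y = make_two_view_data(seed=seed)
    # Step 2: learn projections on the training part only.
    mx, my, Wx, Wy, corrs = fit_cca(X[:split], Y[:split], k=k)
    Zx, Zy = (X - mx) @ Wx, (Y - my) @ Wy
    # Step 3: one voter per feature set (view 1, view 2, combined features).
    feature_sets = [Zx, Zy, np.hstack([Zx, Zy])]
    votes = np.stack([
        nearest_centroid_predict(F[:split], y[:split], F[split:])
        for F in feature_sets
    ])
    pred = (votes.sum(axis=0) >= 2).astype(int)       # majority of three binary votes
    acc = float((pred == y[split:]).mean())
    return corrs, acc


if __name__ == "__main__":
    corrs, acc = run_pipeline()
    print("canonical correlations:", corrs)
    print("held-out accuracy:", acc)
```

A supervised variant in the spirit of the abstract would replace the cross-covariance used in `fit_cca` with a term that rewards within-class correlation and penalizes between-class correlation; the exact form used by EN-DCCA should be taken from the full paper rather than from this sketch.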

CLC Number: