Journal of Computer Applications (《计算机应用》) ›› 2022, Vol. 42 ›› Issue (2): 343-348. DOI: 10.11772/j.issn.1001-9081.2021071198

• Artificial Intelligence •

元学习的不确定性特征构建及初步分析

李艳1,2,3, 郭劼1,2, 范斌1,2

    1.河北大学 数学与信息科学学院, 河北 保定 071002
    2.河北省机器学习与计算智能重点实验室(河北大学), 河北 保定 071002
    3.北京师范大学珠海校区 应用数学与交叉科学研究中心, 广东 珠海 519087
  • 收稿日期:2021-07-12 修回日期:2021-08-06 接受日期:2021-08-12 发布日期:2022-02-21 出版日期:2022-02-10
  • 通讯作者: 李艳
  • 作者简介:李艳(1976—),女,河北衡水人,教授,博士,CCF会员,主要研究方向:机器学习、不确定性信息处理;
    郭劼(1995—),男,河北邯郸人,硕士研究生,主要研究方向:机器学习、不确定性信息处理;
    范斌(1995—),男,河北邢台人,硕士研究生,主要研究方向:机器学习、粒计算、知识发现。
  • 基金资助:
    国家自然科学基金资助项目(61976141);河北省教育厅科学技术重点项目(ZD2019021)

Feature construction and preliminary analysis of uncertainty for meta-learning

Yan LI1,2,3, Jie GUO1,2, Bin FAN1,2

    1.College of Mathematics and Information Science, Hebei University, Baoding Hebei 071002, China
    2.Hebei Key Laboratory of Machine Learning and Computational Intelligence (Hebei University), Baoding Hebei 071002, China
    3.Research Center for Applied Mathematics and Interdisciplinary Sciences, Beijing Normal University at Zhuhai, Zhuhai Guangdong 519087, China
  • Received:2021-07-12 Revised:2021-08-06 Accepted:2021-08-12 Online:2022-02-21 Published:2022-02-10
  • Contact: Yan LI
  • About author:LI Yan, born in 1976, Ph. D., professor. Her research interests include machine learning, uncertain information processing.
    GUO Jie, born in 1995, M. S. candidate. His research interests include machine learning, uncertain information processing.
    FAN Bin, born in 1995, M. S. candidate. His research interests include machine learning, granular computing, knowledge discovery.
  • Supported by:
    National Natural Science Foundation of China (61976141); Key Science and Technology Program of Hebei Educational Department (ZD2019021)

摘要:

元学习即应用机器学习的方法(元算法)寻求问题的特征(元特征)与算法相对性能测度间的映射,从而形成元知识的学习过程,如何构建和提取元特征是其重要的研究内容。针对目前相关研究所用到的元特征大部分是数据的统计特征的问题,提出不确定性建模并研究不确定性对于学习系统的影响。根据样本的不一致性、边界的复杂性、模型输出的不确定性、线性可分度、属性的重叠度以及特征空间的不确定性,建立了六种数据或模型的不确定性元特征;同时,从不同角度衡量学习问题本身的不确定性大小,并给出了具体的定义。在大量分类问题的人工数据和真实数据集上实验分析了这些元特征之间的相关性,并使用K最近邻(KNN)等多个分类算法对元特征与测试精度之间的相关度进行初步分析。结果表明相关度平均在0.8左右,可见这些元特征对学习性能具有显著影响。

关键词: 元学习, 元特征, 不确定性度量, 相关性分析, 数据集特征

Abstract:

Meta-learning applies machine learning methods (meta-algorithms) to learn the mapping between the features of a problem (meta-features) and the relative performance measures of algorithms, thereby forming meta-knowledge. How to construct and extract meta-features is an important research topic. Because most meta-features used in existing studies are statistical features of the data, uncertainty modeling was proposed and the impact of uncertainty on learning systems was studied. Six kinds of uncertainty meta-features of data or models were established, based on the inconsistency of samples, the complexity of the boundary, the uncertainty of model output, linear separability, the degree of attribute overlap, and the uncertainty of the feature space. In addition, the uncertainty of the learning problem itself was measured from different perspectives, and specific definitions were given. The correlations among these meta-features were analyzed experimentally on a large number of artificial and real datasets for classification problems, and several classification algorithms, including K-Nearest Neighbor (KNN), were used for a preliminary analysis of the correlation between the meta-features and test accuracy. The results show an average correlation of about 0.8, indicating that these meta-features have a significant impact on learning performance.

Key words: meta-learning, meta-feature, uncertainty measure, correlation analysis, characteristics of dataset
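
The kind of analysis summarized in the abstract (compute an uncertainty meta-feature per dataset, then correlate it with a classifier's test accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `inconsistency` measure is a hypothetical stand-in for a sample-inconsistency meta-feature, the datasets are synthetic two-class Gaussians, the classifier is a plain 1-nearest-neighbor rule, and the correlation is the Pearson coefficient.

```python
import math
import random


def make_dataset(n, overlap, rng):
    """Two 2-D Gaussian classes; a larger `overlap` (std dev) mixes the classes more."""
    data = []
    for i in range(n):
        label = i % 2
        center = -1.0 if label == 0 else 1.0
        point = (rng.gauss(center, overlap), rng.gauss(center, overlap))
        data.append((point, label))
    return data


def nearest_label(point, pool):
    """Label of the sample in `pool` closest to `point` (1-NN, Euclidean distance)."""
    return min(pool, key=lambda s: math.dist(point, s[0]))[1]


def inconsistency(data):
    """Hypothetical meta-feature (an assumption, not the paper's definition):
    share of samples whose nearest other sample carries a different label."""
    mismatches = 0
    for i, (point, label) in enumerate(data):
        others = data[:i] + data[i + 1:]
        if nearest_label(point, others) != label:
            mismatches += 1
    return mismatches / len(data)


def knn1_accuracy(train, test):
    """Test accuracy of a 1-nearest-neighbor classifier fitted on `train`."""
    hits = sum(nearest_label(point, train) == label for point, label in test)
    return hits / len(test)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def run_experiment(seed=0):
    """One meta-feature value and one test accuracy per synthetic dataset,
    plus the correlation between the two lists."""
    rng = random.Random(seed)
    features, accuracies = [], []
    for overlap in (0.3, 0.6, 0.9, 1.2, 1.5, 1.8):
        train = make_dataset(120, overlap, rng)
        test = make_dataset(60, overlap, rng)
        features.append(inconsistency(train))
        accuracies.append(knn1_accuracy(train, test))
    return features, accuracies, pearson(features, accuracies)


if __name__ == "__main__":
    features, accuracies, r = run_experiment()
    print("meta-feature:", features)
    print("accuracy:    ", accuracies)
    print("Pearson r:   ", r)
```

In this toy setup, higher inconsistency tends to accompany lower accuracy, so the coefficient is expected to be negative. The sketch does not reproduce the paper's six meta-feature definitions or its reported correlation of about 0.8.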
