Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3437-3444. DOI: 10.11772/j.issn.1001-9081.2020060921

• 2020 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2020)

Multi-category active learning algorithm based on multiple clustering algorithms and multivariate linear regression

WANG Min1, WU Yubo1, MIN Fan2   

  1. School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu Sichuan 610500, China;
    2. School of Computer Science, Southwest Petroleum University, Chengdu Sichuan 610500, China
  • Received:2020-06-12 Revised:2020-08-28 Online:2020-12-10 Published:2020-10-20
  • Supported by:
    This work is partially supported by the National Science and Technology Major Project (2-16ZX05020-006) and the Sichuan Youth Science and Technology Innovation Research Team Project (2019JDTD0017).

基于多种聚类算法和多元线性回归的多分类主动学习算法

汪敏1, 武禹伯1, 闵帆2   

  1. 西南石油大学 电气信息学院, 成都 610500;
    2. 西南石油大学 计算机科学学院, 成都 610500
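  • Corresponding author: MIN Fan (born 1973), male, from Chongqing, professor, Ph.D., CCF member. Research interests: granular computing, recommender systems, active learning. minfanphd@163.com
  • About the authors: WANG Min (born 1980), female, from Shaoyang, Hunan, professor, M.S., CCF member. Research interests: data mining, active learning; WU Yubo (born 1995), male, from Daqing, Heilongjiang, M.S. candidate. Research interests: active learning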
  • 基金资助:
    国家科技重大专项(2-16ZX05020-006);四川省青年科技创新研究团队项目(2019JDTD0017)。

Abstract: Traditional lithology identification methods have low recognition accuracy and are difficult to integrate with geological experience. To address this problem, a multi-category Active Learning algorithm based on multiple Clustering algorithms and multivariate Linear regression (ALCL) was proposed. Firstly, several heterogeneous clustering algorithms were run to obtain a category matrix for each algorithm, and the category matrices were labeled and pre-classified by querying common points. Secondly, the key examples used to train the weight coefficient model of the clustering algorithms were selected with the proposed priority-largest search strategy and most-confusing query strategy. Thirdly, an objective function was defined, and the weight coefficient of each clustering algorithm was obtained by training on the key examples. Finally, the classification calculation was performed with the weight coefficients, and the samples whose results had high confidence were classified. Experiments were carried out on six public lithology datasets of oil wells in the Daqing oilfield. The results show that the classification accuracy of ALCL, at its highest, is 2.07%-14.01% higher than those of traditional supervised learning algorithms and other active learning algorithms. Hypothesis testing and significance analysis verify that ALCL achieves a better classification effect in lithology identification.
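
The last two steps of the abstract (obtaining one weight coefficient per clustering algorithm, then classifying only the samples with high confidence) can be illustrated with a minimal Python sketch. The abstract does not state the objective function or the confidence rule, so everything below is an assumption made for illustration: the 0/1 vote encoding, the least-squares objective with a small ridge term, the `threshold` value, and the names `fit_weights` and `weighted_vote` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        residual = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = residual / m[i][i]
    return x


def fit_weights(votes, true_labels, classes, ridge=1e-6):
    """Least-squares weight per clustering algorithm (assumed objective).

    votes[i][k] is the class that algorithm k assigns to key example i.
    Each (example, class) pair gives one regression row: feature k is 1
    if algorithm k voted for that class, and the target is 1 if that
    class is the queried true label.
    """
    n_alg = len(votes[0])
    xtx = [[0.0] * n_alg for _ in range(n_alg)]
    xty = [0.0] * n_alg
    for row, label in zip(votes, true_labels):
        for c in classes:
            x = [1.0 if v == c else 0.0 for v in row]
            y = 1.0 if c == label else 0.0
            for j in range(n_alg):
                xty[j] += x[j] * y
                for k in range(n_alg):
                    xtx[j][k] += x[j] * x[k]
    for j in range(n_alg):
        xtx[j][j] += ridge  # small ridge term keeps the system solvable
    return solve_linear_system(xtx, xty)


def weighted_vote(row, weights, threshold=0.6):
    """Return the top-scoring class, or None if confidence is below threshold."""
    scores = defaultdict(float)
    for label, w in zip(row, weights):
        scores[label] += max(w, 0.0)  # negative weights are ignored when voting
    total = sum(scores.values())
    if total == 0.0:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] / total >= threshold else None


# Toy key examples (made-up labels): three algorithms, two lithology classes.
key_votes = [
    ["sand", "sand", "shale"],
    ["shale", "shale", "shale"],
    ["sand", "sand", "sand"],
    ["shale", "sand", "sand"],
    ["sand", "shale", "shale"],
    ["shale", "shale", "sand"],
]
key_labels = ["sand", "shale", "sand", "shale", "sand", "shale"]
weights = fit_weights(key_votes, key_labels, ["sand", "shale"])
print(weights)
print(weighted_vote(["sand", "shale", "shale"], weights))
```

In this sketch, a sample for which `weighted_vote` returns `None` is simply left unclassified; in an active learning loop such samples could be candidates for further label queries.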

Key words: lithology identification, active learning, multivariate linear regression, sample selection strategy, density clustering

摘要: 针对传统岩性识别方法识别精度低,难以和地质经验有机结合的问题,提出了一种基于多种聚类算法和多元线性回归的多分类主动学习算法(ALCL)。首先,通过多种异构聚类算法聚类得到对应每种算法的类别矩阵,并通过查询公共点对类别矩阵进行标记和预分类;其次,提出优先级最大搜寻策略和最混乱查询策略选取用于训练聚类算法权重系数模型的关键实例;然后,定义目标求解函数,通过训练关键实例求解得到每种聚类算法的权重系数;最后,结合权重系数进行分类计算,从而对结果置信度高的样本进行分类。应用大庆油田油井的6个公开岩性数据集进行实验,实验结果表明,ALCL的分类精度最高时,比传统监督学习算法和其他主动学习算法提高了2.07%~14.01%。假设检验和显著性分析的结果验证了ALCL在岩性识别问题上具有更好的分类效果。

关键词: 岩性识别, 主动学习, 多元线性回归, 样本选择策略, 密度聚类
