Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (6): 1673-1678. DOI: 10.11772/j.issn.1001-9081.2020091384

Special topic: Artificial Intelligence


Wind turbine fault sampling algorithm based on improved BSMOTE and sequential characteristics

YANG Xian1, ZHAO Jisheng2, QIANG Baohua1, MI Luzhong2, PENG Bo3, TANG Chenghua1, LI Baolian3

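  1. Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China;
    2. Beijing Huadian Tianren Electric Power Control Technology Company Limited, Beijing 100039, China;
    3. The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang 050081, China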
  • Received: 2020-09-07 Revised: 2020-12-16 Published: 2021-06-10 Online: 2021-01-26
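  • Corresponding author: QIANG Baohua
  • About the authors: YANG Xian (1994-), male, born in Chengdu, Sichuan, M.S. candidate; research interests: wind power big data, machine learning. ZHAO Jisheng (1980-), male, born in Xinzhou, Shanxi, senior engineer, M.S.; research interests: electrical automation, smart new-energy power generation, remote monitoring of wind power. QIANG Baohua (1972-), male, born in Nanyang, Henan, professor, Ph.D., CCF member; research interests: big data analysis, image processing. MI Luzhong (1980-), male, born in Jinxi, Jiangxi, senior engineer; research interests: wind power big data, artificial intelligence. PENG Bo (1990-), male, born in Shahe, Hebei, engineer; research interests: open source intelligence, big data applications. TANG Chenghua (1974-), male, born in Huanggang, Hubei, professor, Ph.D.; research interests: network and information security, big data processing and mining. LI Baolian (1972-), female, born in Shijiazhuang, Hebei, senior engineer, M.S.; research interests: software reliability design, big data analysis.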
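  • Funding:
    National Natural Science Foundation of China (61762025, 62062028); Guangxi Key Research and Development Program (AB18126053, AB18126063, AD18281002); Program of CHN Energy Technology and Environment Group Corporation Limited (IKY.2019.0002); Natural Science Foundation of Guangxi (2017GXNSFAA198226, 2019GXNSFDA185007, 2019GXNSFDA185006, 2018GXNSFAA294058); Guangxi Science and Technology Major Project (AA18118031, AA18242028); Development Foundation of CETC54 (SXX18138X017); Innovation Project of GUET Graduate Education (2019YCXS051, 2020YCXS052); Fund of Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP201603, GIIP1806).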

Wind turbine fault sampling algorithm based on improved BSMOTE and sequential characteristics

YANG Xian1, ZHAO Jisheng2, QIANG Baohua1, MI Luzhong2, PENG Bo3, TANG Chenghua1, LI Baolian3   

  1. Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China;
    2. Beijing Huadian Tianren Electric Power Control Technology Company Limited, Beijing 100039, China;
    3. The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, Hebei 050081, China
  • Received: 2020-09-07 Revised: 2020-12-16 Published: 2021-06-10 Online: 2021-01-26
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61762025, 62062028), the Guangxi Key Research and Development Program (AB18126053, AB18126063, AD18281002), the Program of CHN Energy Technology and Environment Group Corporation Limited (IKY.2019.0002), the Natural Science Foundation of Guangxi (2017GXNSFAA198226, 2019GXNSFDA185007, 2019GXNSFDA185006, 2018GXNSFAA294058), the Guangxi Key Science and Technology Program (AA18118031, AA18242028), the Development Foundation of CETC54 (SXX18138X017), the Innovation Project of GUET Graduate Education (2019YCXS051, 2020YCXS052), and the Fund of Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP201603, GIIP1806).

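Abstract: To address the class imbalance of wind turbine datasets, a BSMOTE-Sequence sampling algorithm is proposed. It takes both spatial and temporal characteristics into account when synthesizing new samples and then cleans the new samples, which effectively reduces the generation of noise points. First, each minority class sample is labeled as a safe, borderline, or noise sample according to the class proportions among its nearest neighbors. Then, for each borderline sample, the set of minority class samples closest to it in spatial distance and time span is selected, new samples are synthesized by linear interpolation, and noise samples and between-class overlapping samples are filtered out. Finally, with Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) networks as wind turbine gearbox fault detection models, and with F1-Score, Area Under Curve (AUC), and G-mean as evaluation metrics, the proposed algorithm is compared with several commonly used sampling algorithms on real wind turbine data. The experimental results show that, compared with existing algorithms, the samples generated by BSMOTE-Sequence yield better classification, raising the F1-Score, AUC, and G-mean of the detection models by 3% on average, and that the algorithm is well suited to wind turbine fault detection, where the data are imbalanced and follow temporal patterns.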

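Key words: wind turbine fault detection, imbalanced data, sequential characteristic, sampling algorithm, overlapping sample between classes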

Abstract: To address the class imbalance of wind turbine datasets, a Borderline Synthetic Minority Oversampling Technique-Sequence (BSMOTE-Sequence) sampling algorithm was proposed. When synthesizing new samples, the algorithm considers both spatial and temporal characteristics and then cleans the new samples, which effectively reduces the generation of noise points. Firstly, the minority class samples were divided into safe samples, borderline samples and noise samples according to the class proportions among the nearest neighbors of each minority class sample. Secondly, for each borderline sample, the set of minority class samples closest to it in spatial distance and time span was selected, new samples were synthesized by linear interpolation, and the noise samples and the between-class overlapping samples were filtered out. Finally, Support Vector Machine (SVM), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks were used as fault detection models for the wind turbine gearbox, F1-Score, Area Under Curve (AUC) and G-mean were used as performance evaluation metrics, and the proposed algorithm was compared with several commonly used sampling algorithms on real wind turbine datasets. Experimental results show that, compared with the existing algorithms, the samples generated by the BSMOTE-Sequence algorithm lead to better classification, with an average increase of 3% in the F1-Score, AUC and G-mean of the detection models. The proposed algorithm is well suited to wind turbine fault detection, where the data are imbalanced and follow temporal patterns.

Key words: wind turbine fault detection, imbalanced data, sequential characteristic, sampling algorithm, overlapping sample between classes
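
The sampling steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The safe/borderline/noise thresholds follow the usual Borderline-SMOTE convention, while the partner score (spatial distance plus `alpha` times the time gap), the overlap filter, and all function and parameter names are assumptions made here for illustration, since the abstract does not give these details.

```python
import math
import random


def nearest(query, points, k, skip=None):
    """Indices of the k points closest to `query` (Euclidean distance)."""
    ranked = sorted(
        (math.dist(query, p), j) for j, p in enumerate(points) if j != skip
    )
    return [j for _, j in ranked[:k]]


def categorize_minority(points, labels, k=5, minority=1):
    """Split minority samples into safe / borderline / noise groups.

    m = number of majority samples among the k nearest neighbours:
    m == k -> noise, k/2 <= m < k -> borderline, otherwise safe.
    """
    groups = {"safe": [], "borderline": [], "noise": []}
    for i, label in enumerate(labels):
        if label != minority:
            continue
        neighbours = nearest(points[i], points, k, skip=i)
        m = sum(labels[j] != minority for j in neighbours)
        if m == k:
            groups["noise"].append(i)
        elif m >= k / 2:
            groups["borderline"].append(i)
        else:
            groups["safe"].append(i)
    return groups


def oversample(points, labels, times, k=5, n_partners=1, alpha=0.01,
               minority=1, seed=0):
    """Return synthetic samples as (seed_index, partner_index, point) tuples."""
    rng = random.Random(seed)
    groups = categorize_minority(points, labels, k, minority)
    candidates = groups["safe"] + groups["borderline"]  # noise samples excluded
    synthetic = []
    for i in groups["borderline"]:
        # ASSUMPTION: rank partners by spatial distance + alpha * time gap.
        ranked = sorted(
            (math.dist(points[i], points[j]) + alpha * abs(times[i] - times[j]), j)
            for j in candidates if j != i
        )
        for _, j in ranked[:n_partners]:
            gap = rng.random()  # position along the segment from i to j
            new = tuple(a + gap * (b - a) for a, b in zip(points[i], points[j]))
            # ASSUMPTION: a synthetic point counts as between-class overlap,
            # and is dropped, when its nearest original sample is majority.
            if labels[nearest(new, points, 1)[0]] == minority:
                synthetic.append((i, j, new))
    return synthetic
```

Calling `oversample(points, labels, times, k=3)` with feature tuples, 0/1 labels, and numeric timestamps returns the retained synthetic points together with the indices of the two minority samples each one was interpolated from.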
