Oversampling method for imbalanced data based on sample potential and noise evolution
Qiangkui LENG, Xuezi SUN, Xiangfu MENG
Journal of Computer Applications, 2024, 44(8): 2466-2475. DOI: 10.11772/j.issn.1001-9081.2023081145

Oversampling is an effective strategy for imbalanced data classification. Most existing methods use the K-Nearest Neighbor (KNN) technique to select seed samples for oversampling, but their results are often highly unstable under changes to the KNN parameter. The Radial-Basis Oversampling (RBO) method addresses this issue, but it tends to introduce a substantial amount of noise after oversampling. An oversampling method for imbalanced data based on sample potential and noise evolution was proposed to iteratively refine the oversampled dataset. Firstly, the RBO method was used to synthesize minority class samples by calculating sample potential, reducing the imbalance of the original data. Secondly, Natural Neighbor (NaN) was employed as an error detection technique to identify suspected noise samples in the oversampled dataset. Finally, an improved Differential Evolution (DE) method was applied to iteratively refine the detected suspected noise samples. Compared with traditional oversampling methods, the proposed method better explores important boundary information in the dataset and thus gives classifiers more help in improving classification performance. Extensive comparative experiments were conducted on 22 benchmark datasets against seven classical sampling methods, each combined with three different classifiers. The experimental results show that the proposed method achieves higher F1 and G-mean values and handles noise better than sampling methods with post-filters, so it deals with imbalanced data classification more effectively. In addition, statistical analysis indicates that the proposed method achieves a higher Friedman ranking.
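
The abstract gives no formulas, so the following is only a minimal sketch of two ideas it names: a sample potential and the G-mean metric. It assumes a Gaussian radial basis function with Euclidean distance, under which majority samples raise the potential at a point and minority samples lower it. The function names `rbf_potential` and `g_mean` and the spread parameter `gamma` are illustrative assumptions, not taken from the paper.

```python
import math

def rbf_potential(point, majority, minority, gamma=1.0):
    """Class potential at `point` (assumed form, not the paper's definition).

    Each majority sample adds a Gaussian bump and each minority sample
    subtracts one, so a positive value suggests a majority-dominated region.
    """
    def kernel(a, b):
        # Gaussian RBF on Euclidean distance; gamma controls the spread.
        return math.exp(-(math.dist(a, b) / gamma) ** 2)

    return (sum(kernel(point, m) for m in majority)
            - sum(kernel(point, n) for n in minority))

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy data: the query point sits on a minority sample, far from the majority.
majority = [(5.0, 5.0), (6.0, 5.0)]
minority = [(0.0, 0.0)]
print(rbf_potential((0.0, 0.0), majority, minority))  # negative: minority region
print(g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

G-mean rewards a classifier only when it performs well on both classes, which is why it is commonly reported alongside F1 for imbalanced data: predicting the majority class everywhere yields a sensitivity of zero and therefore a G-mean of zero.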
