Labeling certainty enhancement-oriented positive and unlabeled learning algorithm
Yulin HE, Peng HE, Zhexue HUANG, Weicheng XIE, Fournier-Viger PHILIPPE
Journal of Computer Applications    2025, 45 (7): 2101-2112.   DOI: 10.11772/j.issn.1001-9081.2024070953
Abstract

Positive and Unlabeled Learning (PUL) trains a classifier of practically acceptable performance from a few known positive samples and many unlabeled samples when negative samples are unavailable. Existing PUL algorithms share a common flaw: high uncertainty in labeling the unlabeled samples, which leads to inaccurate classification boundaries and limits the classifier's generalization on new data. To address this issue, an unlabeled-sample Labeling Certainty Enhancement-oriented PUL (LCE-PUL) algorithm was proposed. Firstly, reliable positive samples were selected according to the similarity between the posterior probability mean on the validation set and the center of the positive sample set, and the labeling process was refined gradually through iterations, increasing the accuracy of the preliminary category judgments of unlabeled samples and thereby the certainty of their labels. Secondly, these reliable positive samples were merged with the original positive sample set to form a new positive sample set, which was then removed from the unlabeled sample set. Thirdly, the new unlabeled sample set was traversed, and reliable positive samples were selected again according to the similarity between each sample and its multiple neighboring points, further improving the inference of potential labels, reducing mislabeling, and enhancing labeling certainty. Finally, the positive sample set was updated, and the unselected unlabeled samples were treated as negative samples. The feasibility, rationality, and effectiveness of the LCE-PUL algorithm were validated on representative datasets, and its training converges as the number of iterations increases.

When the proportion of positive samples is 40%, 35%, and 30%, the test accuracy of the classifier constructed by the LCE-PUL algorithm improves by up to 5.8, 8.8, and 7.6 percentage points respectively over five representative baseline algorithms, including the Biased Support Vector Machine based on a specific cost function (Biased-SVM) algorithm, the Dijkstra-based Label Propagation for PUL (LP-PUL) algorithm, and the PUL by Label Propagation (PU-LP) algorithm. Experimental results show that LCE-PUL is an effective machine learning algorithm for handling PUL problems.
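The two-stage labeling procedure described in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the Euclidean distance to the positive-set centroid and the threshold `tau` stand in for the paper's posterior-probability criterion, and a simple k-nearest-neighbour majority vote stands in for its neighbor-similarity criterion; all function and parameter names here are hypothetical.

```python
import numpy as np

def lce_pul_label(X_pos, X_unl, tau=0.5, k=3):
    """Illustrative two-stage reliable-positive selection (not the
    authors' exact procedure).

    Stage 1: unlabeled samples close to the positive-set centroid
             (distance < tau) are taken as reliable positives.
    Stage 2: they are merged into the positive set; each remaining
             unlabeled sample is re-examined, and one whose k nearest
             neighbours are mostly positives is also promoted.
    Everything left unselected is treated as negative.
    Returns labels for X_unl: 1 = positive, 0 = negative.
    """
    labels = np.zeros(len(X_unl), dtype=int)
    centroid = X_pos.mean(axis=0)

    # Stage 1: centroid-similarity screening (stand-in for the
    # posterior-probability-mean criterion in the paper)
    dist = np.linalg.norm(X_unl - centroid, axis=1)
    reliable = dist < tau
    labels[reliable] = 1

    # Merge reliable positives into the positive set
    pos = np.vstack([X_pos, X_unl[reliable]])

    # Stage 2: k-NN screening of the remaining unlabeled samples
    rest = np.where(~reliable)[0]
    for idx in rest:
        x = X_unl[idx]
        others = np.vstack([pos, X_unl[np.setdiff1d(rest, idx)]])
        is_pos = np.zeros(len(others), dtype=bool)
        is_pos[:len(pos)] = True
        nn = np.argsort(np.linalg.norm(others - x, axis=1))[:k]
        if is_pos[nn].mean() > 0.5:
            labels[idx] = 1

    # Unselected samples keep label 0 (negative)
    return labels
```

With a tight positive cluster near the origin and a distant unlabeled cluster, the unlabeled points near the positives are promoted in stage 1 and the distant ones end up labeled negative, mirroring the final step of the described procedure.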
