Journal of Computer Applications, 2019, Vol. 39, Issue (8): 2204-2209. DOI: 10.11772/j.issn.1001-9081.2019010129

• Artificial intelligence

Semi-supervised ensemble learning for video semantic detection based on pseudo-label confidence selection

YIN Yu, ZHAN Yongzhao, JIANG Zhen   

  1. School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang Jiangsu 212013, China
  • Received: 2019-01-21; Revised: 2019-03-10; Online: 2019-04-15; Published: 2019-08-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672268), the Key Research & Development Program of Jiangsu Province (1721190141).

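  • Corresponding author: ZHAN Yongzhao
  • About the authors: YIN Yu (1988-), male, born in Baiquan, Heilongjiang, M. S. candidate; his research interests include big data and machine learning. ZHAN Yongzhao (1962-), male, born in Youxi, Fujian, Ph. D., professor, CCF senior member; his research interests include human-computer interaction, pattern recognition and multimedia technology. JIANG Zhen (1976-), male, born in Yantai, Shandong, Ph. D., associate professor; his research interests include machine learning and pattern recognition.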

Abstract: In video semantic detection, an insufficient number of labeled samples seriously degrades detection performance, and noise in pseudo-labeled samples limits how much the base classifiers of an ensemble can improve. To address these problems, a semi-supervised ensemble learning algorithm based on pseudo-label confidence selection was proposed. Firstly, three base classifiers were trained in three different feature spaces to obtain their label vectors. Secondly, two quantities were introduced as the label confidence of each base classifier: the difference between the largest and the second-largest weighted-fusion probabilities of a sample over the classes, and the difference between the largest class probability of a sample and its average probability over the other classes. The pseudo-label and integrated confidence of each sample were then obtained by fusing the label vectors with the label confidences. Thirdly, samples with high integrated confidence were added to the labeled sample set, and the base classifiers were retrained iteratively. Finally, the trained base classifiers were combined to detect video semantic concepts collaboratively. The average accuracy of the algorithm on the UCF11 data set reaches 83.48%, which is 3.48 percentage points higher than that of the Co-KNN-SVM algorithm. The pseudo-labels selected by the algorithm reflect both the overall difference between a sample's class and the other classes and the uniqueness of that class, which reduces the risk of using pseudo-labeled samples and effectively improves the accuracy of video semantic concept detection.
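The abstract does not state the exact confidence formula, the fusion weights or the selection threshold, so the following is only a minimal sketch of confidence-based pseudo-label selection under stated assumptions. It assumes each base classifier outputs a class-probability vector per unlabeled sample, blends the two margins described above with a hypothetical weight `alpha`, fuses one-hot label vectors weighted by those confidences, and normalizes the winning score by the number of classifiers. The names `label_confidence`, `fuse` and `select_confident` are hypothetical, and the retraining loop is omitted.

```python
from typing import List, Sequence, Tuple

# Assumption: each base classifier returns a class-probability vector per sample.
ProbVector = Sequence[float]


def label_confidence(probs: ProbVector, alpha: float = 0.5) -> float:
    """Hypothetical label confidence of one classifier for one sample.

    d1 = largest probability - second-largest probability
    d2 = largest probability - mean probability of the other classes
    The blend weight alpha is an assumption, not taken from the paper.
    """
    ranked = sorted(probs, reverse=True)
    p_max, p_second = ranked[0], ranked[1]
    mean_others = (sum(probs) - p_max) / (len(probs) - 1)
    d1 = p_max - p_second
    d2 = p_max - mean_others
    return alpha * d1 + (1.0 - alpha) * d2


def fuse(per_view_probs: List[ProbVector], alpha: float = 0.5) -> Tuple[int, float]:
    """Fuse the one-hot label vectors of K classifiers, weighted by label confidence.

    Returns (pseudo_label, integrated_confidence); the integrated confidence is
    the winning class's accumulated confidence divided by K (assumed normalization).
    """
    n_classes = len(per_view_probs[0])
    scores = [0.0] * n_classes
    for probs in per_view_probs:
        # argmax of the probability vector = position of the 1 in the one-hot label vector
        predicted = max(range(n_classes), key=lambda c: probs[c])
        scores[predicted] += label_confidence(probs, alpha)
    pseudo_label = max(range(n_classes), key=lambda c: scores[c])
    return pseudo_label, scores[pseudo_label] / len(per_view_probs)


def select_confident(
    unlabeled: List[List[ProbVector]], top_n: int, alpha: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Rank unlabeled samples by integrated confidence and keep the top_n.

    Each entry is (sample_index, pseudo_label, integrated_confidence); these are
    the samples that would be moved into the labeled set before retraining.
    """
    fused = [(i, *fuse(views, alpha)) for i, views in enumerate(unlabeled)]
    fused.sort(key=lambda item: item[2], reverse=True)
    return fused[:top_n]


if __name__ == "__main__":
    # Two unlabeled samples, each scored by three base classifiers over three classes.
    unlabeled = [
        [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],      # all views agree on class 0
        [[0.4, 0.35, 0.25], [0.3, 0.4, 0.3], [0.5, 0.25, 0.25]],  # views disagree
    ]
    # Sample 0 is selected first because all three classifiers agree with large margins.
    for idx, label, conf in select_confident(unlabeled, top_n=1):
        print(f"sample {idx} -> pseudo-label {label}, integrated confidence {conf:.3f}")
```

Because disagreeing classifiers split their confidence across different classes, a sample on which the views conflict receives a low integrated confidence and is less likely to be promoted into the labeled set, which is the noise-reduction effect the abstract describes.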

Key words: video semantic concept detection, semi-supervised, ensemble learning, pseudo-label, confidence
