Journal of Computer Applications, 2015, Vol. 35, Issue (7): 1939-1944. DOI: 10.11772/j.issn.1001-9081.2015.07.1939

• Artificial Intelligence •

Kernel improvement of multi-label feature extraction method

LI Hua1,2, LI Deyu1, WANG Suge3, ZHANG Jing3

  1. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China;
    2. Department of Mathematics and Physics, Shijiazhuang Tiedao University, Shijiazhuang Hebei 050043, China;
    3. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
  • Received: 2015-01-26  Revised: 2015-03-30  Online: 2015-07-10  Published: 2015-07-17
  • Corresponding author: LI Deyu (born in 1965), male, native of Taiyuan, Shanxi; professor, Ph. D., CCF member; research interests: intelligent computing and data mining. E-mail: lidy@sxu.edu.cn
  • About the authors: LI Hua (born in 1978), female, native of Shijiazhuang, Hebei; lecturer, Ph. D. candidate; research interests: granular computing and data mining. WANG Suge (born in 1964), female, native of Taiyuan, Shanxi; professor, Ph. D., CCF member; research interests: Chinese information processing, text sentiment analysis and machine learning. ZHANG Jing (born in 1990), female, native of Yuncheng, Shanxi; M. S. candidate; research interests: Chinese information processing and data mining.
  • Supported by:

    National Natural Science Foundation of China (61272095); Research Project for Returned Overseas Scholars of Shanxi Province (2013-014); Project of the Education Department of Hebei Province (Z2014106).

Abstract:

Focusing on the problem that the output (label) kernel functions in existing multi-label feature extraction methods do not accurately characterize the correlation between labels, two methods for constructing new output kernel functions were proposed on the basis of fully measuring label correlation. In the first method, the multi-label data were transformed into single-label data so that the correlation between labels could be characterized by label sets, and a new output kernel function was then defined from the perspective of the loss function on the single-label data. In the second method, mutual information was used to measure the pairwise correlation between labels, and a new output kernel function was constructed on that basis. Experiments with two multi-label classifiers on three real-life multi-label data sets demonstrated that the feature extraction method using the loss-function-based output kernel performed best, improving the five evaluation measures by about 10% on average; in particular, on the Yeast data set the Coverage measure dropped by about 30%. The feature extraction method using the mutual-information-based output kernel came second, improving the measures by about 5% on average. The experimental results show that feature extraction methods based on the new output kernel functions can extract features more effectively, further simplify the learning process of the multi-label classifiers, and improve their generalization performance.
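
To make the two constructions more concrete, the following Python sketch illustrates, under stated assumptions, what output (label) kernels of the two kinds described above might look like: a loss-function-style kernel that scores agreement between complete label sets, and a kernel assembled from pairwise mutual information between labels. This is a minimal illustration, not the authors' implementation; the function names, the binary label matrix Y, and the specific formulas (label-set agreement as 1 minus normalized Hamming loss, and K = Y M Y^T with M the label-wise mutual-information matrix) are assumptions made for exposition.

import numpy as np
from sklearn.metrics import mutual_info_score

def label_set_kernel(Y):
    # Loss-function-style output kernel (illustrative): k(i, j) is the fraction of
    # labels on which samples i and j agree, i.e. 1 - normalized Hamming loss
    # between their complete label sets.
    Y = np.asarray(Y)
    n = Y.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.mean(Y[i] == Y[j])
    return K

def mutual_info_label_kernel(Y):
    # Mutual-information-based output kernel (illustrative): build the label-by-label
    # matrix M with M[a, b] = I(y_a; y_b) estimated from the data, then lift it to a
    # sample-level kernel K = Y M Y^T. Note that M (and hence K) is not guaranteed to
    # be positive semi-definite; the paper's actual construction may differ.
    Y = np.asarray(Y)
    q = Y.shape[1]
    M = np.empty((q, q))
    for a in range(q):
        for b in range(q):
            M[a, b] = mutual_info_score(Y[:, a], Y[:, b])
    return Y @ M @ Y.T

# Toy usage with a random binary label matrix (6 samples, 4 labels).
rng = np.random.default_rng(0)
Y = (rng.random((6, 4)) > 0.5).astype(int)
K_loss = label_set_kernel(Y)         # (6, 6) label-set agreement kernel
K_mi = mutual_info_label_kernel(Y)   # (6, 6) mutual-information-based kernel

In a method of this kind, such an output kernel would be plugged into the kernel-based feature extraction objective in place of the original label kernel; the exact objective used in the paper is not specified in the abstract.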

Key words: multi-label learning, feature extraction, kernel function, loss function, mutual information

CLC number: