Journal of Computer Applications 2015, Vol. 35 Issue (6): 1643-1648

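QIU Yunfei (邱云飞)1, LIU Shixing (刘世兴)1, LIN Mingming (林明明)1, SHAO Liangshan (邵良杉)2

  1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105;
  2. System Engineering Institute, Liaoning Technical University, Huludao, Liaoning 125105
  Received: 2014-12-22 Revised: 2015-03-17
  • Corresponding author: LIU Shixing (1990-), male, born in Dandong, Liaoning, M.S. candidate; main research interests: data mining, feature selection; 494784913@qq.com
  • About the authors: QIU Yunfei (1976-), male, born in Fuxin, Liaoning, professor, Ph.D., CCF member; main research interests: data mining, sentiment analysis. LIN Mingming (1989-), female, born in Dalian, Liaoning, M.S. candidate; main research interests: data mining, sentiment analysis. SHAO Liangshan (1961-), male, born in Lingyuan, Liaoning, professor, Ph.D.; main research interests: data mining, sentiment analysis.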
Feature transfer weighting algorithm based on distribution and term frequency-inverse class frequency

QIU Yunfei1, LIU Shixing1, LIN Mingming1, SHAO Liangshan2   

  1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China;
  2. System Engineering Institute, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Received: 2014-12-22 Revised: 2015-03-17 Online: 2015-06-12





Abstract: Traditional machine learning assumes that training data and test data follow the same distribution; when they do not, a classifier trained on the training data cannot classify the test data accurately. To address this problem, following the principle of transfer learning, the features shared by the source domain and the target domain (intersection features) were weighted according to an improved measure of their distribution similarity across the two domains. Semantic similarity and Term Frequency-Inverse Class Frequency (TF-ICF) were used to weight the non-intersection features in the source domain. A large amount of labeled source-domain data and a small amount of labeled target-domain data were used to obtain the features required for quickly building a text classifier. Experimental results on the text dataset 20Newsgroups and the non-text dataset UCI show that the feature transfer weighting algorithm based on distribution and TF-ICF can transfer and weight features rapidly while maintaining precision.

Key words: transfer learning, feature distribution, Term Frequency-Inverse Class Frequency (TF-ICF), semantic similarity, feature weighting
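
To make the TF-ICF term concrete, the following is a minimal Python sketch of one common formulation, weight = tf(term, class) × log(|C| / cf(term)), where cf(term) is the number of classes in which the term occurs. The abstract does not give the paper's exact formula, so this form is an assumption, and the names `tf_icf`, `docs`, and `labels` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_icf(docs, labels):
    """Return {class: {term: weight}} using tf * log(|C| / cf).

    docs   -- list of token lists, one per document (hypothetical input format)
    labels -- list of class labels, aligned with docs
    Assumption: this is a textbook TF-ICF variant, not necessarily the paper's.
    """
    # Term frequency aggregated per class.
    class_tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        class_tf[label].update(tokens)

    n_classes = len(class_tf)

    # Class frequency: in how many classes does each term occur?
    cf = Counter()
    for counts in class_tf.values():
        cf.update(counts.keys())

    return {
        label: {
            term: tf * math.log(n_classes / cf[term])
            for term, tf in counts.items()
        }
        for label, counts in class_tf.items()
    }


# Toy example: "game" occurs in both classes, so its weight is zero.
docs = [["ball", "game", "team"], ["game", "score"], ["cpu", "game", "chip"]]
labels = ["sport", "sport", "tech"]
weights = tf_icf(docs, labels)
```

Under this formulation a term that appears in every class receives zero weight, while a term confined to a single class is weighted by its frequency in that class. This matches the intuition of favoring class-discriminative features.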
