Feature transfer weighting algorithm based on distribution and term frequency-inverse class frequency
QIU Yunfei¹, LIU Shixing¹, LIN Mingming¹, SHAO Liangshan²
1. School of Software, Liaoning Technical University, Huludao Liaoning 125105, China;
2. System Engineering Institute, Liaoning Technical University, Huludao Liaoning 125105, China
Traditional machine learning faces a problem: when the training data and the test data no longer follow the same distribution, a classifier trained on the training data cannot classify the test data accurately. To address this problem, following the principle of transfer learning, the features shared by the source domain and the target domain (the intersection features) were weighted according to an improved measure of their distribution similarity across the two domains. Semantic similarity and Term Frequency-Inverse Class Frequency (TF-ICF) were used to weight the non-intersection features in the source domain. A large amount of labeled source domain data and a small amount of labeled target domain data were used to quickly obtain the features required for building a text classifier. Experimental results on the text dataset 20Newsgroups and the non-text dataset UCI show that the feature transfer weighting algorithm based on distribution and TF-ICF can transfer and weight features rapidly while maintaining precision.
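The abstract does not give the TF-ICF formula, so the following is only a minimal sketch of one commonly used form, assumed here for illustration: the within-class term frequency multiplied by log(|C| / cf(t)), where |C| is the number of classes and cf(t) is the number of classes in which term t occurs. The function name `tf_icf` and the toy documents are hypothetical and are not taken from the paper; the paper's own weighting may differ.

```python
import math
from collections import Counter, defaultdict


def tf_icf(docs, labels):
    """Return {class: {term: weight}} with weight = tf * log(|C| / cf).

    Assumed form, not necessarily the paper's:
      tf = count of the term in the class / total tokens in the class
      cf = number of classes in which the term appears
    """
    # Pool the token counts of all documents that share a class label.
    class_tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        class_tf[label].update(tokens)

    n_classes = len(class_tf)

    # For each term, count the classes that contain it.
    class_freq = Counter()
    for counts in class_tf.values():
        class_freq.update(counts.keys())

    weights = {}
    for label, counts in class_tf.items():
        total = sum(counts.values())
        weights[label] = {
            term: (count / total) * math.log(n_classes / class_freq[term])
            for term, count in counts.items()
        }
    return weights


# Toy, made-up documents: "the" occurs in both classes, so its weight is zero.
docs = [["gpu", "driver", "the"], ["goal", "match", "the"]]
labels = ["comp", "sport"]
weights = tf_icf(docs, labels)
```

Under this assumed form, a term that appears in every class receives weight zero, while a term confined to one class is weighted by its relative frequency there, which is the sense in which the measure favors class-discriminative features.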
QIU Yunfei, LIU Shixing, LIN Mingming, SHAO Liangshan. Feature transfer weighting algorithm based on distribution and term frequency-inverse class frequency. Journal of Computer Applications, 2015, 35(6): 1643-1648.