Instance transfer learning model based on sparse hierarchical probabilistic self-organizing graphs

doi:10.11772/j.issn.1001-9081.2016.03.692

Journal of Computer Applications, 2016, Vol. 36, Issue (3): 692-696.

WU Lei, TIAN Ruya, ZHANG Xuefu

Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China

Received: 2015-08-11 Revised: 2015-11-03 Online: 2016-03-10 Published: 2016-03-17
Corresponding author: ZHANG Xuefu
About the authors: WU Lei (1985-), female, born in Shenyang, Liaoning, assistant research fellow, Ph. D., research interests: probabilistic graphical models, machine learning, knowledge organization; TIAN Ruya (1983-), female, born in Liaoyuan, Jilin, assistant research fellow, Ph. D., CCF member, research interests: knowledge analysis, topic discovery; ZHANG Xuefu (1966-), male, born in Harbin, Heilongjiang, research fellow, Ph. D., research interests: knowledge organization and retrieval, information visualization.
Supported by: This work is partially supported by the National Natural Science Foundation of China (61305018), the National Social Science Foundation of China (15CTQ030), the 57th General Program of the China Postdoctoral Science Foundation (2015M571183) and the CAAS Agricultural Science and Technology Innovation Program.

Abstract

Instance-based transfer learning suffers from mismatched data granularities when associating data from multi-source heterogeneous domains. To address this problem, a Transfer Sparse unsupervised Hierarchical Probabilistic Self-Organizing Graph (TSHiPSOG) method with transfer learning capability was proposed, building on the single-domain Hierarchical Probabilistic Self-Organizing Graph (HiPSOG) clustering method. Firstly, hierarchical self-organizing models based on a probabilistic mixture of multivariate Gaussian components were generated separately in the source and target domains to extract representation vectors of different granularities from each domain, and a sparse graph method was used to control model growth through a probabilistic criterion. Secondly, the Maximal Information Coefficient (MIC) was used to find, in the information-rich source domain, the representation vectors most similar to those of the target domain, and the class labels of these source-domain representation vectors were used to refine the classification of target-domain data. Finally, experiments on two widely used classification benchmarks, the 20 Newsgroups dataset and a spam detection dataset, show that the proposed method can exploit useful information from the source domain to assist classification in the target domain, improving classification accuracy by up to about 15.26% and 9.05% on the two datasets, respectively. Compared with other classical transfer learning methods, mining representation vectors of different granularities through the sparse hierarchy improves classification accuracy by up to about 4.48% and 4.13%, respectively.

Key words: machine learning, transfer learning, unsupervised learning, hierarchical method, sparse graphical method
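The second step of the method (matching target-domain representation vectors to source-domain ones by MIC and borrowing their class labels) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the prototypes are taken as given (the hierarchical Gaussian-mixture model with sparse growth control is not reproduced); the MIC between two representation vectors is computed by treating their d components as d paired observations; `approx_mic` only approximates MIC, because it scores fixed equal-frequency grids with kx·ky ≤ n^0.6 instead of optimizing the grid boundaries as the original MIC algorithm does; and target samples are assigned to their nearest target prototype by Euclidean distance. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def equal_freq_bins(values, k):
    """Map each value to one of k equal-frequency bins by rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def mutual_information(a, b):
    """Plug-in mutual information (in nats) of two discrete label sequences."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum(
        (c / n) * math.log((c / n) / ((pa[i] / n) * (pb[j] / n)))
        for (i, j), c in joint.items()
    )


def approx_mic(x, y, alpha=0.6):
    """Simplified MIC: max normalized MI over kx-by-ky equal-frequency grids, kx*ky <= n**alpha.

    The original MIC also optimizes where the grid lines fall; this sketch does not.
    """
    budget = max(4, int(len(x) ** alpha))
    best = 0.0
    for kx in range(2, budget // 2 + 1):
        for ky in range(2, budget // kx + 1):
            mi = mutual_information(equal_freq_bins(x, kx), equal_freq_bins(y, ky))
            best = max(best, mi / math.log(min(kx, ky)))  # normalize into [0, 1]
    return best


def transfer_labels(target_protos, source_protos, source_labels):
    """Give each target prototype the label of the source prototype with the highest approx_mic."""
    labels = []
    for t in target_protos:
        scores = [approx_mic(t, s) for s in source_protos]
        best = max(range(len(source_protos)), key=scores.__getitem__)
        labels.append(source_labels[best])
    return labels


def classify_target(samples, target_protos, proto_labels):
    """Label each target sample with the label of its nearest (Euclidean) target prototype."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        proto_labels[min(range(len(target_protos)), key=lambda k: sq_dist(x, target_protos[k]))]
        for x in samples
    ]


if __name__ == "__main__":
    s_a = list(range(20))                    # hypothetical source prototype, class "A"
    s_b = [(7 * i) % 20 for i in range(20)]  # hypothetical source prototype, class "B"
    t = [2 * i + 1 for i in range(20)]       # hypothetical target prototype, monotone in s_a
    print(transfer_labels([t], [s_b, s_a], ["B", "A"]))
```

Running the script prints `['A']`: the target prototype is a monotone transform of `s_a`, so its approximate MIC with `s_a` is 1, while the permuted `s_b` scores far lower. MIC is used here rather than a linear correlation because it also responds to non-linear dependence between vectors.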

CLC number: TP181

WU Lei, TIAN Ruya, ZHANG Xuefu. Instance transfer learning model based on sparse hierarchical probabilistic self-organizing graphs[J]. Journal of Computer Applications, 2016, 36(3): 692-696.

