Journal of Computer Applications, 2019, Vol. 39, Issue (8): 2252-2260. DOI: 10.11772/j.issn.1001-9081.2018112394
Artificial intelligence
LIU Guoqing¹˒²˒³, WANG Jieting¹˒²˒³, HU Zhiguo¹˒²˒³, QIAN Yuhua¹˒²˒³
Received: 2018-12-04
Revised: 2019-01-31
Online: 2019-08-14
Published: 2019-08-10
Corresponding author: QIAN Yuhua (钱宇华)
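About the authors:
LIU Guoqing (刘郭庆, born 1994), female, native of Linfen, Shanxi; M.S. candidate; research interest: reinforcement learning.
WANG Jieting (王婕婷, born 1991), female, native of Linfen, Shanxi; Ph.D. candidate; research interest: statistical machine learning.
HU Zhiguo (胡治国, born 1977), male, native of Lingshi, Shanxi; lecturer, Ph.D., CCF member; research interests: computer networks, distributed systems.
QIAN Yuhua (钱宇华, born 1976), male, native of Jincheng, Shanxi; professor, Ph.D., CCF member; research interests: data intelligence, machine learning, big data, complex networks.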
LIU Guoqing, WANG Jieting, HU Zhiguo, QIAN Yuhua. Best action identification of tree structure based on ternary multi-arm bandit[J]. Journal of Computer Applications, 2019, 39(8): 2252-2260.
刘郭庆, 王婕婷, 胡治国, 钱宇华. 基于三元多臂赌博机的树结构最优动作识别[J]. 计算机应用, 2019, 39(8): 2252-2260.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2018112394