Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (8): 2252-2260.DOI: 10.11772/j.issn.1001-9081.2018112394

• Artificial intelligence • Previous Articles     Next Articles

Best action identification of tree structure based on ternary multi-arm bandit

LIU Guoqing1,2,3, WANG Jieting1,2,3, HU Zhiguo1,2,3, QIAN Yuhua1,2,3   

  1. 1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan Shanxi 030006, China;
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education(Shanxi University), Taiyuan Shanxi 030006, China;
    3. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
  • Received:2018-12-04 Revised:2019-01-31 Online:2019-08-10 Published:2019-08-14
  • Supported by:
    This work is partially supported by the National Natural Science Fundation of China (61672332, 61432011, U1435212), the Natural Science Foundation of Shanxi Province (201701D121052).

基于三元多臂赌博机的树结构最优动作识别

刘郭庆1,2,3, 王婕婷1,2,3, 胡治国1,2,3, 钱宇华1,2,3   

  1. 1. 山西大学 大数据科学与产业研究院, 太原 030006;
    2. 计算机智能与中文信息处理教育部重点实验室(山西大学), 太原 030006;
    3. 山西大学 计算机与信息技术学院, 太原 030006
  • 通讯作者: 钱宇华
  • 作者简介:刘郭庆(1994-),女,山西临汾人,硕士研究生,主要研究方向:强化学习;王婕婷(1991-),女,山西临汾人,博士研究生,主要研究方向:统计机器学习;胡治国(1977-),男,山西灵石人,讲师,博士,CCF会员,主要研究方向:计算机网络、分布式系统;钱宇华(1976-),男,山西晋城人,教授,博士,CCF会员,主要研究方向:数据智能、机器学习、大数据、复杂网络。
  • 基金资助:
    国家自然科学基金资助项目(61672332,61432011,U1435212);山西省自然科学基金资助项目(201701D121052)。

Abstract: Monte Carlo Tree Search (MCTS) shows excellent performance in chess game problem. Most existing studies only consider the success and failure feedbacks and assum that the results follow the Bernoulli distribution. However, this setting ignores the usual result of draw, causing inaccurate assessment of the disk status and missing of optimal action. In order to solve this problem, Ternary Multi-Arm Bandit (TMAB) model was constructed and Best Arm identification of TMAB (TBBA) algorithm was proposed. Then, TBBA algorithm was applied to Ternary Minimax Sampling Tree (TMST). Finally, TBBA_tree algorithm based on the simple iteration of TBBA and Best Action identification of TMST (TTBA) algorithm based on transforming the tree structure into TMAB were proposed. In the experiments, two arm spaces with different precision were established, and several comparative TMABs and TMSTs were constructed based on the two arm spaces. Experimental results show that compared to the accuracy of uniform sampling algorithm, the accuracy of TBBA algorithm keeps rising steadily and can reach 100% partially, and the accuracy of TBBA algorithm is basically more than 80% with good generalization and stability and without outliers or fluctuation ranges.

Key words: Monte Carlo Tree Search (MCTS), Ternary Multi-Arm Bandit (TMAB), Best Arm Identification (BAI), sequential decision-making, pure exploration

摘要: 蒙特卡罗树搜索(MCTS)在棋类博弈问题中展现出卓越的性能,但目前多数研究仅考虑胜负两种反馈从而假设博弈结果服从伯努利分布,然而这种设定忽略了常出现的平局结果,导致不能准确地评估盘面状态甚至错失最优动作。针对这个问题,首先构建了基于三元分布的多臂赌博机(TMAB)模型并提出了最优臂确认算法TBBA;然后,将TBBA算法应用到三元极大极小采样树(TMST)中,提出了简单迭代TBBA算法的TBBA_tree算法和通过将树结构转化成TMAB的TMST最优动作识别(TTBA)算法。在实验部分,建立了两个精度不同的摇臂空间并在其基础上构造了多个具有对比性的TMAB和TMST。实验结果表明,相比均匀采样算法,TBBA算法准确率保持稳步上升且部分能达到100%,TBBA算法准确率基本保持在80%以上且具有良好的泛化性和稳定性,不会出现异常值和波动区间。

关键词: 蒙特卡罗树搜索, 三元多臂赌博机, 最优臂确认, 序列决策, 纯探索

CLC Number: