1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. School of Network and Communication Engineering, Chengdu Technological University, Chengdu, Sichuan 611730, China
4. National Key Laboratory of Science and Technology on Communications (University of Electronic Science and Technology of China), Chengdu, Sichuan 611731, China
5. School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan 610039, China
About the authors: ZENG Bosen, born in 1982, Ph.D. candidate, senior engineer. His research interests include machine learning and wireless communications. ZHONG Yong, born in 1966, Ph.D., research fellow. His research interests include big data and its intelligent processing, cloud computing, and software engineering. NIU Xianhua, born in 1983, Ph.D., professor. Her research interests include intelligent information processing and information security.
Supported by: China Postdoctoral Science Foundation (2019M663475)
[1] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. Cambridge: MIT Press, 2018: 2-9.
[2] HANS A, SCHNEEGASS D, SCHÄFER A M, et al. Safe exploration for reinforcement learning[C/OL]// Proceedings of the 16th European Symposium on Artificial Neural Networks. [2020-12-13].
[3] SMART W D, KAELBLING L P. Practical reinforcement learning in continuous spaces[C]// Proceedings of the 17th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc., 2000: 903-910.
[4] MAIRE F, BULITKO V. Apprenticeship learning for initial value functions in reinforcement learning[C/OL]// Proceedings of the IJCAI 2005 Workshop on Planning and Learning in A Priori Unknown or Dynamic Domains. [2020-12-13]. http://eprints.qut.edu.au/23912/
[5] SONG Y, LI Y B, LI C H, et al. An efficient initialization approach of Q-learning for mobile robots[J]. International Journal of Control, Automation and Systems, 2012, 10(1): 166-172. 10.1007/s12555-012-0119-9
[6] TURCHETTA M, BERKENKAMP F, KRAUSE A. Safe exploration for interactive machine learning[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2020-12-13].
[7] DUAN J M, CHEN Q L. Prior knowledge based Q-learning path planning algorithm[J]. Electronics Optics and Control, 2019, 26(9): 29-33. 10.3969/j.issn.1671-637X.2019.09.007
[8] PECKA M, SVOBODA T. Safe exploration techniques for reinforcement learning — an overview[C]// Proceedings of the 2014 International Workshop on Modelling and Simulation for Autonomous Systems, LNCS 8906. Cham: Springer, 2014: 357-375.
[9] GEIBEL P. Reinforcement learning with bounded risk[C]// Proceedings of the 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc., 2001: 162-169.
[10] HEGER M. Consideration of risk in reinforcement learning[C]// Proceedings of the 11th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc., 1994: 105-111. 10.1016/b978-1-55860-335-6.50021-0
[11] WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292. 10.1023/a:1022676722315
[12] RENDLE S. Factorization machines[C]// Proceedings of the 2010 IEEE International Conference on Data Mining. Piscataway: IEEE, 2010: 995-1000. 10.1109/icdm.2010.127
[13] ZHAO K K, ZHANG L F, ZHANG J, et al. Survey on factorization machines model[J]. Journal of Software, 2019, 30(3): 799-821. 10.13328/j.cnki.jos.005698
[14] GARCÍA J, FERNÁNDEZ F. Safe exploration of state and action spaces in reinforcement learning[J]. Journal of Artificial Intelligence Research, 2012, 45: 515-564. 10.1613/jair.3761
[16] CESA-BIANCHI N, GENTILE C, LUGOSI G, et al. Boltzmann exploration done right[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6287-6296.
[17] AUER P, CESA-BIANCHI N, FISCHER P. Finite-time analysis of the multiarmed bandit problem[J]. Machine Learning, 2002, 47(2/3): 235-256. 10.1023/a:1013689704352