Research on proof of work mining dilemma based on policy gradient algorithm

doi:10.11772/j.issn.1001-9081.2018102197

Journal of Computer Applications, 2019, Vol. 39, Issue (5): 1336-1342. DOI: 10.11772/j.issn.1001-9081.2018102197

WANG Tiantian, YU Shuangyuan, XU Baomin

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Received: 2018-11-01 Revised: 2018-12-30 Online: 2019-05-14 Published: 2019-05-10
Corresponding author: WANG Tiantian
About the authors: WANG Tiantian (born 1994), female, from Jiaozuo, Henan, M.S. candidate; research interests: text mining, deep learning, reinforcement learning. YU Shuangyuan (born 1965), female, from Jiamusi, Heilongjiang, associate professor, M.S.; research interests: big data analysis and mining, deep learning, parallel and distributed computing, software testing. XU Baomin (born 1966), male, from Zhengzhou, Henan, professor, Ph.D.; research interests: big data processing, cloud computing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61572005) and the Higher Education Science and Technology Research Key Project of Hebei Province (ZD2017304).

Abstract

Abstract: To address the mining dilemma caused by block withholding attacks under the Proof of Work (PoW) consensus mechanism in blockchain, the game between mining pools was modeled as an Iterated Prisoner's Dilemma (IPD), and the policy gradient algorithm from deep reinforcement learning was used to study strategy selection in the IPD. Each mining pool was treated as an independent agent, and the infiltration rate of its miners was quantified as the action distribution in reinforcement learning. The policy network of the policy gradient algorithm was used to predict and optimize each agent's behavior so as to maximize the average revenue per miner, and the effectiveness of the algorithm was validated through simulation experiments. The experiments show that in the early stage the mining pools attack each other and the average revenue per miner is less than 1, that is, the pools are trapped in the Nash equilibrium of the dilemma. After self-adjustment by the policy gradient algorithm, the pools shift from mutual attack to mutual cooperation, with the infiltration rate of each pool tending to 0 and the average revenue per miner tending to 1. The results indicate that the policy gradient algorithm can resolve the Nash equilibrium problem of the mining dilemma and maximize the average revenue per miner.

Key words: blockchain, Proof of Work (PoW), game theory, deep reinforcement learning, policy gradient algorithm
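
The dilemma and the learning update summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the revenue formula is the two-pool revenue-density model commonly used in miner's dilemma analyses (total mining power normalized to 1), the policy is a plain softmax over a few discretized infiltration rates rather than a policy network, and the names `revenue_density`, `softmax`, and `reinforce_step` are hypothetical.

```python
import math


def revenue_density(m1, m2, x12, x21):
    """Per-miner revenue density (r1, r2) for two pools (assumed model).

    m1, m2: pool sizes as fractions of total mining power (total = 1).
    x12:    power pool 1 diverts to infiltrate pool 2.
    x21:    power pool 2 diverts to infiltrate pool 1.
    Infiltrating miners earn a share of the victim's revenue but
    withhold any full blocks they find.
    """
    honest_power = 1.0 - x12 - x21        # power that actually finds blocks
    R1 = (m1 - x12) / honest_power        # pool 1 direct mining revenue
    R2 = (m2 - x21) / honest_power        # pool 2 direct mining revenue
    # Closed-form solution of the coupled equations
    #   r1 * (m1 + x21) = R1 + x12 * r2
    #   r2 * (m2 + x12) = R2 + x21 * r1
    denom = m1 * m2 + m1 * x12 + m2 * x21
    r1 = (m2 * R1 + x12 * (R1 + R2)) / denom
    r2 = (m1 * R2 + x21 * (R1 + R2)) / denom
    return r1, r2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, baseline, lr):
    """One REINFORCE update for a softmax policy.

    Applies logits += lr * (reward - baseline) * grad log pi(action),
    where d log pi(action) / d logit_i = 1[i == action] - pi_i.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        v + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (v, p) in enumerate(zip(logits, probs))
    ]
```

With two pools of size 0.3 each, `revenue_density(0.3, 0.3, 0.0, 0.0)` gives a density of 1 for both pools; a one-sided attack with `x12 = 0.05` lifts the attacker above 1 and drops the victim below 1; mutual attack at 0.05 leaves both below 1. That is the prisoner's-dilemma structure the abstract describes. How the reward passed to `reinforce_step` is defined influences whether learners settle on attack or cooperation; the abstract does not specify it, so it is left as a parameter here.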

CLC number: TP183

WANG Tiantian, YU Shuangyuan, XU Baomin. Research on proof of work mining dilemma based on policy gradient algorithm[J]. Journal of Computer Applications, 2019, 39(5): 1336-1342.

