Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (9): 2631-2636. DOI: 10.11772/j.issn.1001-9081.2018020502

• Computer Software Technology •

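Reliability evaluation model for cloud storage systems with proactive fault tolerance

LI Jing1, LIU Dongshi2

  1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300;
  2. College of Computer and Control Engineering, Nankai University, Tianjin 300500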
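  • Received: 2018-03-13 Revised: 2018-04-24 Online: 2018-09-10 Published: 2018-09-06
  • Corresponding author: LI Jing
  • About the authors: LI Jing (born 1982), female, from Dezhou, Shandong; lecturer, Ph.D.; main research interests: large-scale data storage and machine learning. LIU Dongshi (born 1994), male, from Suizhong, Liaoning; master's student; main research interests: machine learning and data mining.
  • Supported by:
    National Youth Natural Science Foundation (61702521); Scientific Research Startup Fund of Civil Aviation University of China (2017QD03S).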

Reliability evaluation model for cloud storage systems with proactive fault tolerance

LI Jing1, LIU Dongshi2   

  1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China;
    2. College of Computer and Control Engineering, Nankai University, Tianjin 300500, China
  • Received: 2018-03-13 Revised: 2018-04-24 Online: 2018-09-10 Published: 2018-09-06
  • Contact: LI Jing
  • Supported by:
    This work is partially supported by the National Youth Natural Science Foundation (61702521) and the Scientific Research Startup Fund of Civil Aviation University of China (2017QD03S).

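Abstract: In addition to traditional redundancy mechanisms, proactive fault tolerance is also used to improve the reliability of storage systems. However, little work has studied the reliability of cloud storage systems with proactive fault tolerance, and the existing studies are all limited to the assumption that drive failures follow an exponential distribution. Two reliability state-transition models are proposed for cloud storage systems built on proactive fault-tolerant RAID-5 and RAID-6 (redundant arrays of independent disks), and Monte Carlo simulation algorithms based on these models are designed to estimate the expected number of data-loss events within a given operating period. The algorithms use the Weibull distribution to model drive failure rates that change over time (decreasing, constant, or increasing), and accurately evaluate the impact of proactive fault tolerance, whole-drive failures, failure repair, latent block failures, and disk scrubbing on system reliability. The proposed method helps system designers assess how different fault-tolerance mechanisms and system parameters affect the reliability of cloud storage systems, and thereby helps in building highly reliable storage systems.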
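
The three failure-rate regimes named in the abstract (decreasing, constant, increasing) correspond to the shape parameter of the Weibull distribution. As a reminder, the standard two-parameter form is written out below; the symbols $\beta$ (shape) and $\eta$ (scale) are chosen here for illustration and are not taken from the paper.

```latex
% Two-parameter Weibull distribution for drive lifetime t >= 0
% (beta = shape, eta = scale; symbol names are assumptions)
\begin{align}
  f(t) &= \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
          \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]
          && \text{(density)} \\
  R(t) &= \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]
          && \text{(survival function)} \\
  h(t) &= \frac{f(t)}{R(t)}
        = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
          && \text{(failure rate)}
\end{align}
% beta < 1: h(t) decreases with t
% beta = 1: h(t) = 1/eta is constant (the exponential case)
% beta > 1: h(t) increases with t
```

Setting $\beta = 1$ recovers the exponential distribution, which is the assumption the abstract says earlier studies are limited to.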

Key words: proactive fault tolerance, cloud storage system, Monte Carlo simulation, reliability evaluation, Weibull distribution

Abstract: In addition to traditional reactive fault-tolerant technologies, proactive fault tolerance can be used to improve storage system reliability significantly. However, there has been little research on the reliability of proactive cloud storage systems, and the existing work assumes that drive failures follow an exponential distribution. Two reliability state-transition models were developed, one for proactive RAID-5 and one for proactive RAID-6 (redundant arrays of independent disks) systems. Based on the models, Monte Carlo simulations were designed to estimate the expected number of data-loss events in proactive RAID-5 and RAID-6 systems within a given time period. The Weibull distribution was used to model time-varying (decreasing, constant, or increasing) disk failure rates, and the simulations capture the impact of proactive fault tolerance, operational failures, failure restoration, latent block defects, and drive scrubbing on system reliability. The proposed method can help system designers evaluate the impact of different fault-tolerance mechanisms and system parameters on the reliability of cloud storage systems, and can help in building highly reliable storage systems.
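
To make the simulation idea concrete, the following is a minimal Python sketch of a Monte Carlo estimate of data-loss events for a single RAID-5 group. It is not the authors' algorithm: the function names, parameters, and simplifications are all assumptions made for illustration. In particular, it models only whole-drive failures with a fixed rebuild time, treats a predicted failure as a safe replacement with no degraded window, assumes data is restored after each loss, and leaves out latent block defects and scrubbing entirely.

```python
import random


def count_data_loss_events(n_drives, mission_hours, shape, scale,
                           repair_hours, detection_rate, rng):
    """Simulate one RAID-5 group for one mission; return its data-loss count.

    All names and modelling choices here are illustrative assumptions.
    """
    def draw_lifetime():
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        return rng.weibullvariate(scale, shape)

    next_fail = [draw_lifetime() for _ in range(n_drives)]
    degraded_until = 0.0  # time at which the rebuild in progress (if any) ends
    losses = 0
    while True:
        # Process drive failures in chronological order.
        i = min(range(n_drives), key=next_fail.__getitem__)
        t = next_fail[i]
        if t > mission_hours:
            return losses
        if rng.random() < detection_rate:
            # Failure predicted in advance: the drive is swapped out before it
            # fails, so the group never becomes degraded (simplification).
            next_fail[i] = t + draw_lifetime()
            continue
        if t < degraded_until:
            # Unpredicted failure while another drive is still rebuilding:
            # RAID-5 tolerates only one failed drive, so data is lost.
            losses += 1
        # Start a rebuild; the replacement enters service when it completes.
        degraded_until = t + repair_hours
        next_fail[i] = degraded_until + draw_lifetime()


def estimate_expected_losses(runs, seed=0, **params):
    """Average the data-loss count over independent simulated missions."""
    rng = random.Random(seed)
    total = sum(count_data_loss_events(rng=rng, **params) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the sketch.
    common = dict(n_drives=8, mission_hours=87600.0, shape=1.2,
                  scale=300000.0, repair_hours=24.0)
    for rate in (0.0, 0.5, 0.9):
        estimate = estimate_expected_losses(2000, detection_rate=rate, **common)
        print(f"detection_rate={rate}: {estimate:.4f} expected data-loss events")
```

A RAID-6 variant would count a loss only when a third drive fails while two are already rebuilding, which requires tracking each rebuild window separately rather than the single `degraded_until` value used here.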

Key words: proactive fault tolerance, cloud storage system, Monte Carlo simulation, reliability evaluation, Weibull distribution

CLC number: