Review of checkpoint technology for multiple computing scenarios
Xiaolin CHEN, Yaqiang ZHANG, Hongzhi SHI
Journal of Computer Applications, 2025, 45(6): 1922-1933. DOI: 10.11772/j.issn.1001-9081.2024050697

Checkpoint technology saves the state of the running computing tasks and of the system so that the system can be rolled back to a previously saved state when needed. It is widely used in scenarios such as system failure recovery, job migration, and job preemption. As technology develops, computing scenarios multiply, computing scales grow, the structural hierarchy of computing systems becomes more complex, and computing environments become more variable, all of which raise the probability of failure. Meanwhile, the Mean Time Between Failures (MTBF) has fallen from the range [6.50 h, 40.00 h] to 1.25 h. Checkpoint technology, a commonly used fault-tolerance method, is therefore becoming increasingly critical. First, the development of checkpoint technology is outlined, and existing checkpoint technologies are classified by their technical characteristics. Then, recent research progress is reviewed in four directions: incremental checkpointing, multi-level asynchronous checkpointing, optimal checkpoint interval, and fault-perception-based checkpointing. Next, the current trends in checkpoint technology (dynamic, intelligent, and proactive) and the challenges it faces are summarized. Finally, the main ideas and latest methods for optimizing checkpoint strategies are collated to help researchers quickly grasp the current status and future trends of checkpoint technology.
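
One of the four directions named in the abstract is the optimal checkpoint interval. As a rough illustration, and not a method taken from the reviewed paper, the sketch below applies Young's classical first-order approximation, interval = sqrt(2 × checkpoint cost × MTBF), to the MTBF values quoted above. The approximation assumes the checkpoint cost is much smaller than the MTBF. The function name `young_interval` and the checkpoint cost of 0.05 h are hypothetical, chosen only for illustration.

```python
import math


def young_interval(checkpoint_cost_h: float, mtbf_h: float) -> float:
    """Young's first-order approximation of the checkpoint interval, in hours.

    Computes sqrt(2 * C * MTBF), where C is the time needed to write one
    checkpoint. Valid only when C is much smaller than the MTBF.
    """
    if checkpoint_cost_h <= 0 or mtbf_h <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_h * mtbf_h)


# Hypothetical checkpoint cost (3 minutes); not a figure from the paper.
CHECKPOINT_COST_H = 0.05

# MTBF values quoted in the abstract: the old range endpoints and the new value.
for mtbf_h in (40.00, 6.50, 1.25):
    interval_h = young_interval(CHECKPOINT_COST_H, mtbf_h)
    print(f"MTBF {mtbf_h:5.2f} h -> checkpoint roughly every {interval_h:.2f} h")
```

Under these assumptions, a shorter MTBF yields a shorter recommended interval, which is one way to see why checkpoint overhead matters more as failures become more frequent.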
