Advanced
Computer Architecture & Organization
Q97 / 100
What is "checkpoint and restart" (or "checkpointing") in the context of fault-tolerant computer architecture?
Correct Answer
A) Periodically saving the complete state of a running computation so that, if a fault occurs, the system can roll back to the most recent checkpoint and resume rather than restarting from the beginning
Explanation
In long-running or failure-prone systems such as HPC clusters, checkpointing periodically saves enough state to resume after a failure without losing all prior progress. The checkpoint interval is a trade-off: frequent checkpoints add overhead during normal execution, while infrequent ones increase the amount of work that must be recomputed after a failure.
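The idea can be illustrated with a minimal application-level sketch in Python. It is not how any particular HPC checkpointing system works; the function names (`save_checkpoint`, `load_checkpoint`, `run`), the `fail_at` parameter used to simulate a fault, and the toy workload (summing integers) are all assumptions made for illustration.

```python
import os
import pickle


def save_checkpoint(path, state):
    # Write to a temporary file, then rename it over the old checkpoint.
    # os.replace is atomic on the same filesystem, so a crash during the
    # write never leaves a half-written checkpoint in place.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    # Resume from the last saved state, or start fresh if none exists.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"i": 0, "total": 0}


def run(path, n_steps, interval, fail_at=None):
    # Toy computation: sum the integers 0 .. n_steps - 1,
    # checkpointing every `interval` steps.
    state = load_checkpoint(path)
    while state["i"] < n_steps:
        if fail_at is not None and state["i"] == fail_at:
            # Hypothetical fault injection: in-memory progress is lost.
            raise RuntimeError("simulated fault")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

With `interval=10` and a simulated fault at step 57, a second call to `run` with the same path resumes from step 50 rather than step 0, so only 7 steps are recomputed. A smaller `interval` would reduce that recomputation but pay for more checkpoint writes during normal execution.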