ROLLBACK RECOVERY MECHANISM

Log-Based Rollback Recovery Mechanism

Checkpoint-Based Rollback Recovery Mechanism

Checkpoint-based rollback recovery is a well-established technique for handling process failures and increasing the reliability and fault tolerance of distributed systems [23]. In this approach, the state of each process is periodically saved to stable storage; each saved state is called a checkpoint of that process. To recover from a failure, the system restarts its execution from a previous error-free, consistent global state [3]. Because the processes in a distributed system do not share memory and exchange information with each other through messages, a global state of the system is defined as a set of local states, one from each process. A global state is said to be “consistent” if it contains no orphan message, i.e., a message whose receive event is recorded but whose send event is lost [3]. Checkpointing has several applications, including rollback recovery, playback debugging, process migration, job swapping, and load balancing.
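The consistency condition above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each checkpoint records the identifiers of the messages whose send and receive events occurred before it was taken. The names Checkpoint, orphan_messages, and is_consistent are hypothetical and do not come from the cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Local state of one process as saved on stable storage (hypothetical layout)."""
    process: str
    sent: frozenset = frozenset()      # ids of messages whose send event is recorded
    received: frozenset = frozenset()  # ids of messages whose receive event is recorded


def orphan_messages(global_state):
    """Return ids of messages recorded as received but not recorded as sent."""
    all_sent = set()
    all_received = set()
    for checkpoint in global_state:
        all_sent |= checkpoint.sent
        all_received |= checkpoint.received
    return all_received - all_sent


def is_consistent(global_state):
    """A global state (one checkpoint per process) is consistent if it has no orphan."""
    return not orphan_messages(global_state)


# P1 sends m1 to P2. Two candidate checkpoints exist for P1:
# one taken before the send and one taken after it.
p1_before_send = Checkpoint("P1")
p1_after_send = Checkpoint("P1", sent=frozenset({"m1"}))
p2_after_receive = Checkpoint("P2", received=frozenset({"m1"}))

consistent_state = [p1_after_send, p2_after_receive]
inconsistent_state = [p1_before_send, p2_after_receive]
```

In this sketch, consistent_state records both the send and the receive of m1, so it contains no orphan message. In inconsistent_state, P2 has recorded the receipt of m1 while P1's checkpoint predates the send, so m1 is an orphan and the system could not safely restart from that state.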

 

Checkpointing Related Notations

Checkpoint Algorithms Assumptions