The typical situation in computational sciences is to have a program that runs for several days/weeks/months straight. As hardware/OS failures are inevitable, one typically utilize checkpointing, i.e. saves the state of the program from time to time. In case of failure, one restarts from the latest checkpoint.
What is the pythonic way to implement checkpointing?
For example, one can dump function's variables directly.
Alternatively, I am thinking of transforming such function into a class (see below). Arguments of the function would become arguments of a constructor. Intermediate data that constitute state of the algorithm would become class attributes. And pickle
module would help with the (de-)serialization.
import pickle
# The file with checkpointing data
chkpt_fname = 'pickle.checkpoint'
class Factorial:
def __init__(self, n):
# Arguments of the algorithm
self.n = n
# Intermediate data (state of the algorithm)
self.prod = 1
self.begin = 0
def get(self, need_restart):
# Last time the function crashed. Need to restore the state.
if need_restart:
with open(chkpt_fname, 'rb') as f:
self = pickle.load(f)
for i in range(self.begin, self.n):
# Some computations
self.prod *= (i + 1)
self.begin = i + 1
# Some part of the computations is completed. Save the state.
with open(chkpt_fname, 'wb') as f:
pickle.dump(self, f)
# Artificial failure of the hardware/OS/Ctrl-C/etc.
if (not need_restart) and (i == 3):
return
return self.prod
if __name__ == '__main__':
f = Factorial(6)
print(f.get(need_restart=False))
print(f.get(need_restart=True))