Proper finalization in Python

I have a bunch of instances, each with its own temporary file (used to save data from memory to disk and retrieve it later).

I want to be sure that, at the end of the day, all these files are removed. However, I want to leave room for fine-grained control over their deletion. That is, some files may be removed earlier if needed (e.g. they are too big and no longer important).

What is the best / recommended way to achieve this?

My thoughts on that:

  • try/finally blocks or with statements are not an option, as we have many files whose lifetimes may overlap. They also hardly allow finer control.

  • From what I have read, __del__ is also not a feasible option, as it is not even guaranteed that it will eventually run (although it is not entirely clear to me which cases are the "risky" ones). Also (if that is still the case), library modules may no longer be available when __del__ runs.

  • The tempfile library seems promising. However, the file is gone as soon as it is closed, which is a definite drawback: I want the files to be closed whenever no operation is in progress, to limit the number of open files.

    • The library promises that the file "will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)."

      How do they achieve the implicit close? E.g. in C# I would use a (reliable) finalizer, which __del__ is not.

  • The atexit library seems to be the best candidate: it can act as a reliable finalizer in place of __del__ to implement a safe disposable pattern. The only problem, compared to object finalizers, is that it runs only at exit, which is rather inconvenient (what if the object is eligible for garbage collection earlier?).

    • Here, the same question stands: how does the library ensure that the registered functions always run (except in truly unexpected cases, about which little can be done)?
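
On the tempfile point, a minimal sketch of one possibility, assuming the file only needs to exist on disk between operations: tempfile.mkstemp creates a file that is not tied to an open handle, so it can be closed right away and removed explicitly later. The suffix and payload are made up for illustration.

```python
import os
import tempfile

# mkstemp creates the file and returns an open OS-level descriptor plus the path.
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)                     # no descriptor stays open between operations

with open(path, "wb") as f:      # reopen only while data is being written
    f.write(b"payload")

with open(path, "rb") as f:      # the file survived being closed
    data = f.read()

os.remove(path)                  # deletion is a separate, explicit step
still_there = os.path.exists(path)
```

With this approach nothing deletes the file automatically, so the question of who triggers the removal remains open.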

Ideally, it seems that a combination of __del__ and the atexit library would work best. That is, the clean-up would be triggered both by __del__ and by the function registered with atexit, with repeated clean-up prevented. If __del__ is called, the registered function is unregistered.

The only (yet crucial) problem is that __del__ won't run while a method is registered with atexit, because the registration keeps a reference to the object alive until exit.
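
The reference problem can be observed directly. The following is a small sketch rather than a recommended pattern; Resource and probe are hypothetical names, and the immediate reclamation after unregistering relies on CPython's reference counting.

```python
import atexit
import gc
import weakref

class Resource:                      # hypothetical class, for illustration only
    def cleanup(self):
        pass

obj = Resource()
atexit.register(obj.cleanup)         # the bound method holds a strong reference to obj
probe = weakref.ref(obj)             # weak reference used only to check liveness

del obj
gc.collect()
alive_while_registered = probe() is not None    # atexit still keeps the object alive

# atexit.unregister compares with ==, and bound methods of the same object
# compare equal, so this removes the registration made above.
atexit.unregister(probe().cleanup)
gc.collect()
alive_after_unregister = probe() is not None
```

As long as the bound method is registered, the object cannot be collected, so a __del__ method on it would not run before exit.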

Thus, any suggestions, advice, useful links and so on are welcome.

Lucrative answered 7/6, 2021 at 8:27 Comment(2)
How about you put all the paths that you eventually want deleted into a list, and at some point you iterate through the list and delete all the extant files? - Vegetarianism
@Vegetarianism Thank you for your suggestion. However, that feels a bit inconvenient to me. It is usually better when an object frees its own resources, in keeping with the information hiding principle. For example, if the tempfile is replaced by cloud storage, code in a quite unrelated place (with respect to the change) needs to be rewritten (let alone the case of two types of objects, one using a tempfile and the other cloud storage, or even more such types). A common dispose call on those objects is definitely on the menu, although I would still far prefer automatic finalization (i.e. by the GC). - Telium

I suggest considering the built-in weakref module for this task, more specifically weakref.finalize. A simple example:

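import weakref

class MyClass:
    pass

def clean_up(*args):
    # Receives the extra arguments that were passed to weakref.finalize
    print('clean_up', args)

my_obj = MyClass()
# Run clean_up when my_obj is garbage-collected or at interpreter exit
weakref.finalize(my_obj, clean_up, 'arg1', 'arg2', 'arg3')
del my_obj  # optional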

When run, it will output:

clean_up ('arg1', 'arg2', 'arg3')

Note that clean_up is executed even without del-ing my_obj (you can delete the last line of code and the behavior will not change). clean_up is called once, either when my_obj is garbage-collected (after all strong references to it are gone) or at interpreter exit, whichever comes first (the latter is similar to using the atexit module).
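
Applied to the temporary-file case from the question, this could look roughly like the sketch below. SpillFile, _remove, and dispose are hypothetical names, and the use of tempfile.mkstemp is an assumption about how the files are created. Calling the finalizer object directly gives the early, explicit removal; it runs at most once, so a later garbage collection or interpreter exit does nothing further.

```python
import os
import tempfile
import weakref

def _remove(path):
    # Module-level function: it receives only the path, never the owning object.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass

class SpillFile:                      # hypothetical class, for illustration only
    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)                  # keep no descriptor open between operations
        self._finalizer = weakref.finalize(self, _remove, self.path)

    def save(self, data):
        with open(self.path, "wb") as f:
            f.write(data)

    def load(self):
        with open(self.path, "rb") as f:
            return f.read()

    def dispose(self):
        # Early, explicit removal; calling a dead finalizer again is a no-op.
        self._finalizer()

a = SpillFile()
a.save(b"big")
path_a = a.path
a.dispose()                           # removed on request
gone_after_dispose = not os.path.exists(path_a)

b = SpillFile()
b.save(b"small")
path_b = b.path
del b                                 # CPython collects immediately via refcounting
gone_after_del = not os.path.exists(path_b)
```

On implementations without reference counting, the removal after del may be delayed until the next collection, but it still happens at exit at the latest.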

Fabrin answered 7/6, 2021 at 9:6 Comment(2)
Yes, it seems to be exactly the thing I was looking for. Thank you! - Telium
Make sure that the cleanup function and its arguments do not hold references to the object, otherwise the object will never be garbage-collected. The cleanup function must not be a method of MyClass (your example is correct, but it is easy to mess this up when adapting the example to your own code). For a production version of this concept, see how the multiprocessing module finalizes Manager objects that the user did not close: github.com/python/cpython/blob/main/Lib/multiprocessing/util.py and github.com/python/cpython/blob/main/Lib/multiprocessing/… - Plum
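
A short sketch of the pitfall described in this comment, using the finalizer's alive property to check whether it has fired yet. Holder and the two cleanup callables are hypothetical names.

```python
import gc
import weakref

class Holder:                         # hypothetical class, for illustration only
    def cleanup(self):
        pass

def cleanup(name):
    # Plain function that takes only plain data, not the object itself.
    pass

bad = Holder()
bad_fin = weakref.finalize(bad, bad.cleanup)   # bound method references the object
del bad
gc.collect()
bad_still_alive = bad_fin.alive       # True: the object cannot be collected

good = Holder()
good_fin = weakref.finalize(good, cleanup, "good")
del good
gc.collect()
good_still_alive = good_fin.alive     # False: the finalizer has already run
```

In the first case the callback only runs at interpreter exit, which is exactly the atexit behavior the question wanted to avoid.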
