Correct cyclic garbage collection in extension modules

Two sections of Python 2.7's documentation describe adding cyclic garbage collection (CGC) support for container objects defined in extension modules.

The Python/C API Reference Manual gives two rules:

  1. The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
  2. Once all the fields which may contain references to other containers are initialized, it must call PyObject_GC_Track().

In Extending and Embedding the Python Interpreter, however, the Noddy example suggests that adding the Py_TPFLAGS_HAVE_GC flag and filling in the tp_traverse and tp_clear slots is sufficient to enable CGC support. The two rules above are not followed there at all.

When I modified the Noddy example to actually follow the rules, using PyObject_GC_New()/PyObject_GC_Del() and PyObject_GC_Track()/PyObject_GC_UnTrack(), it surprisingly failed with an assertion error:

Modules/gcmodule.c:348: visit_decref: Assertion "gc->gc.gc_refs != 0" failed. refcount was too small
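
For context, the assertion comes from the collector pass that subtracts references held inside the tracked set from a scratch copy of each refcount. The following is only a toy model of that idea, not CPython's code: ToyObj, update_refs, subtract_refs and the demo functions are hypothetical names, and it assumes every object belongs to the set being collected. It shows how an object whose refcount is lower than the number of references actually pointing at it trips the "refcount was too small" condition.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 4

/* Toy stand-in for a GC-tracked container; not CPython's real layout. */
typedef struct ToyObj {
    int refcnt;                    /* total references, like ob_refcnt */
    int gc_refs;                   /* scratch copy used during one collection */
    struct ToyObj *refs[MAX_REFS]; /* outgoing references, NULL if unused */
} ToyObj;

/* Pass 1: copy every refcount into the scratch field. */
static void update_refs(ToyObj **objs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        objs[i]->gc_refs = objs[i]->refcnt;
}

/* Pass 2: for each reference held by a tracked object, decrement the
 * target's scratch count. Returns -1 where the real collector would
 * hit its assertion: the count is already zero, so more references
 * point at the target than its refcount accounts for. */
static int subtract_refs(ToyObj **objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < MAX_REFS; j++) {
            ToyObj *target = objs[i]->refs[j];
            if (target == NULL)
                continue;
            if (target->gc_refs == 0)
                return -1;         /* "refcount was too small" */
            target->gc_refs--;
        }
    }
    return 0;
}

/* Objects with a positive scratch count are referenced from outside. */
static size_t count_external(ToyObj **objs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (objs[i]->gc_refs > 0)
            count++;
    return count;
}

/* Consistent case: a <-> b form a cycle, and one outside reference
 * also points at a, so a.refcnt is 2 and b.refcnt is 1. */
static size_t demo_consistent(void)
{
    ToyObj a = {2, 0, {NULL}};
    ToyObj b = {1, 0, {NULL}};
    ToyObj *objs[] = {&a, &b};
    a.refs[0] = &b;
    b.refs[0] = &a;
    update_refs(objs, 2);
    if (subtract_refs(objs, 2) != 0)
        return (size_t)-1;
    return count_external(objs, 2);
}

/* Inconsistent case: b and c both point at a, but a.refcnt is only 1,
 * as if one of them stored the pointer without taking a reference. */
static int demo_broken(void)
{
    ToyObj a = {1, 0, {NULL}};
    ToyObj b = {1, 0, {NULL}};
    ToyObj c = {1, 0, {NULL}};
    ToyObj *objs[] = {&a, &b, &c};
    b.refs[0] = &a;
    c.refs[0] = &a;
    update_refs(objs, 3);
    return subtract_refs(objs, 3);
}
```

In this model demo_consistent() finds exactly one externally referenced object, while demo_broken() reports the underflow. In a real extension the same symptom can come from a missing Py_INCREF or from tp_traverse visiting a pointer the object does not own.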

What is the correct and safe way to implement CGC? What would be a neat example of a container object with CGC support?

Kilian answered 4/9, 2012 at 0:53 Comment(0)

Under most normal circumstances you shouldn't need to do the tracking/untracking yourself. This is described in the documentation, although it isn't made especially clear. In the case of the Noddy example you definitely don't.

The short version is that a type object (PyTypeObject) contains two function pointers: tp_alloc and tp_free. By default, tp_alloc calls all the right functions when an instance is created (if Py_TPFLAGS_HAVE_GC is set), and tp_free untracks the instance on destruction.

The Noddy documentation says (at the end of the section):

That’s pretty much it. If we had written custom tp_alloc or tp_free slots, we’d need to modify them for cyclic-garbage collection. Most extensions will use the versions automatically provided.

Unfortunately, the one place that doesn't make this clear is the Supporting Cyclic Garbage Collection documentation, which never says that you don't need to do this yourself.


Detail:

Noddy objects are allocated by a function called Noddy_new, which is placed in the tp_new slot of the type object. According to the documentation, the main thing the "new" function should do is call the tp_alloc slot. You typically don't write tp_alloc yourself, and it defaults to PyType_GenericAlloc().

Looking at PyType_GenericAlloc() in the Python source shows two places where its behaviour depends on PyType_IS_GC(type). First, it calls _PyObject_GC_Malloc instead of PyObject_Malloc; second, it calls _PyObject_GC_TRACK(obj). [Note that all PyObject_New really does is call PyObject_Malloc and then PyObject_Init, which sets the object's type and reference count.]

Similarly, on deallocation you call the tp_free slot, which is automatically set to PyObject_GC_Del for types with Py_TPFLAGS_HAVE_GC. PyObject_GC_Del includes code that does the same as PyObject_GC_UnTrack, so a call to untrack is unnecessary.

Kaolack answered 11/11, 2016 at 8:19 Comment(2)
So are you saying that the Noddy example does not need to call PyObject_GC_UnTrack in Noddy_dealloc? If that is the case, is the Noddy example wrong? - Juncture
@VictorLiu It's been a while since I wrote this answer so it's a little vague in my mind. I'm fairly sure the bit about allocation is right and that GC_Malloc and GC_Track are automatically called appropriately. At the point I wrote the answer the default tp_free automatically untracked it. That behaviour has changed and it'll issue a warning in debug mode. So you should call PyObject_GC_UnTrack in Noddy_dealloc (and thus the Noddy example is right). I suspect the last paragraph of this answer wasn't good advice in hindsight - you get away with it, but really you should call it. - Kaolack

I am not experienced enough in the C API myself to give you any advice, but there are plenty of examples in the Python container implementations themselves.

Personally, I'd start with the tuple implementation, since it's immutable: Objects/tupleobject.c. Then move on to the dict, list and set implementations (Objects/dictobject.c, Objects/listobject.c and Objects/setobject.c) for further notes on mutable containers.

I can't help but notice that there are calls to PyObject_GC_New(), PyObject_GC_NewVar() and PyObject_GC_Track() throughout, as well as having Py_TPFLAGS_HAVE_GC set.

Endurable answered 4/9, 2012 at 6:41 Comment(1)
Thanks for the reply. I am investigating the possibility that some versions of the PyObject_GC_New() API cannot correctly handle subtypes, which is the cause of the AssertionError. - Kilian
