Can you safely change a Python object's type in a C extension?

Question

Suppose that I have implemented two Python types using the C extension API and that the types are identical (same data layouts/C struct) with the exception of their names and a few methods. Assuming that all methods respect the data layout, can you safely change the type of an object from one of these types into the other in a C function?

Notably, as of Python 3.9, there appears to be a function Py_SET_TYPE, but the documentation is not clear as to whether/when this is safe to do. I'm interested in knowing both how to use this function safely and whether types can be safely changed prior to version 3.9.

Motivation

I'm writing a Python C extension to implement a Persistent Hash Array Mapped Trie (PHAMT); in case it's useful, the source code is here (as of writing, it is at this commit). A feature I would like to add is the ability to create a Transient Hash Array Mapped Trie (THAMT) from a PHAMT. THAMTs can be created from PHAMTs in O(1) time and can be mutated in place efficiently. Critically, THAMTs have the exact same underlying C data structure as PHAMTs: the only real difference between a PHAMT and a THAMT is a few methods encapsulated by their Python types. This common structure allows one to very efficiently turn a THAMT back into a PHAMT once one has finished performing a set of edits. (This pattern typically reduces the number of memory allocations when performing a large number of updates to a PHAMT.)

A very convenient way to implement the conversion from THAMT to PHAMT would be to simply change the type pointers of the THAMT objects from the THAMT type to the PHAMT type. I am confident that I can write code that safely navigates this change, but I can imagine that doing so might, for example, break the Python garbage collector.

(To be clear: the motivation is just context as to how the question arose. I'm not looking for help implementing the structures described in the Motivation, I'm looking for an answer to the Question, above.)

Extravagate answered 18/2, 2022 at 18:44 Comment(6)
You're proposing an interesting optimization, but IMO you're not likely to get any definitive answers. I'd try to get an accurate measure of any performance difference under your typical use cases and see if whatever performance gain you see is worth the risk of failure the next time someone runs apt upgrade or the phase of the moon is just wrong. - Bowfin
@AndrewHenle That is my intuition about this kind of hack in general, but I can't find a single piece of documentation that suggests this is a bad idea, nor any explanation of the Py_SET_TYPE function. Does this question actually just live in the gray zone of the C-Python language spec? - Extravagate
Do you really need to use the Python type system to this end? Why don't you just store an "implementation" flag (or a boolean) in your struct to hold the information? You can still write extension functions to trigger the change between the implementations (or access the current state). I would think that changing the type of an object is bad practice even if it is technically allowed. - Ignace
@Ignace No, this is definitely not necessary; the motivation I wrote is just to provide context for how the question arose. The question is just about clarifying the Python spec, which seems very ambiguous on this point, especially given the existence of Py_SET_TYPE. - Extravagate
The two answers look pretty detailed and comprehensive. The only thing I have to add to them is "NumPy does it" (for dtype objects), so there's at least one reasonably large and well-used library that does use Py_SET_TYPE in this way. - Meghannmegiddo
@Meghannmegiddo Ahh, thanks, that's useful as a reference! (And I agree: both answers are quite good, and I'm not yet sure how I'm going to pick one for the bounty.) - Extravagate

The supported way

It is officially possible to change an object's type in Python, as long as the memory layouts are compatible... but this is mostly limited to types not implemented in C. With some restrictions, it is possible to do

# Python attribute assignment, not C struct member assignment
obj.__class__ = some_new_class

to change an object's class. One of the restrictions is that both the old and the new class must be "heap types": every class implemented in Python is one, but most classes implemented in C are not. (types.ModuleType and its subclasses are also specifically permitted, even though types.ModuleType is not a heap type. See the source for the exact restrictions.)
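A minimal sketch of the supported route in pure Python. `Persistent`, `Transient` and `Slotted` are hypothetical classes (loosely modeled on the PHAMT/THAMT pair from the question): the first two are heap types with the same instance layout and differ only in their methods, so `__class__` assignment between them is accepted, while `Slotted` uses `__slots__`, which changes the instance layout, so CPython rejects the assignment with `TypeError`. Error messages differ between CPython versions, so only the exception type is checked.

```python
class Persistent:
    """Hypothetical stand-in for the read-only type."""

    def __init__(self, data):
        self.data = data

    def kind(self):
        return "persistent"


class Transient:
    """Hypothetical type with the same layout as Persistent, different methods."""

    def kind(self):
        return "transient"

    def set(self, key, value):
        self.data[key] = value


class Slotted:
    """Hypothetical type whose __slots__ remove the instance __dict__ (different layout)."""

    __slots__ = ("data",)


def demo():
    obj = Persistent({})
    obj.__class__ = Transient      # accepted: both heap types, same layout
    obj.set("a", 1)                # Transient's methods now apply to obj
    kind_after = obj.kind()
    obj.__class__ = Persistent     # and back again
    try:
        obj.__class__ = Slotted    # rejected: the object layouts differ
        slotted_ok = True
    except TypeError:
        slotted_ok = False
    return kind_after, obj.kind(), obj.data, slotted_ok
```

Calling `demo()` reports the behaviour after each reassignment, the data that survived both changes of class, and whether the layout-changing assignment was allowed.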

If you want to create a heap type from C, you can (with PyType_FromSpec, or PyType_FromModuleAndSpec from 3.9 on), but the interface is pretty different from the normal way of defining Python types from C. Plus, for __class__ assignment to work, you must not set the Py_TPFLAGS_IMMUTABLETYPE flag, and that means that people will be able to monkey-patch your classes in ways you might not like (or maybe you see that as an upside).
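The heap-type and immutability conditions can be inspected from Python through a type's `__flags__` attribute, which is a CPython implementation detail. A minimal sketch; the bit values are copied from CPython's `Include/object.h` (`Py_TPFLAGS_HEAPTYPE` is `1 << 9`; `Py_TPFLAGS_IMMUTABLETYPE` is `1 << 8` and only exists from 3.10), and the helper names and the `Plain` class are hypothetical. It also shows the monkey-patching trade-off: a mutable heap type accepts new class attributes, an immutable type does not.

```python
import sys

# Bit values from CPython's Include/object.h (implementation detail).
PY_TPFLAGS_IMMUTABLETYPE = 1 << 8   # only defined on CPython >= 3.10
PY_TPFLAGS_HEAPTYPE = 1 << 9


def is_heap_type(cls):
    return bool(cls.__flags__ & PY_TPFLAGS_HEAPTYPE)


def is_immutable_type(cls):
    if sys.version_info < (3, 10):
        # Before 3.10 the flag did not exist; static (non-heap) types were immutable.
        return not is_heap_type(cls)
    return bool(cls.__flags__ & PY_TPFLAGS_IMMUTABLETYPE)


class Plain:
    """Hypothetical class defined in Python: a mutable heap type."""


def can_monkey_patch(cls):
    """Try to add a class attribute; undo it if that worked."""
    try:
        cls.patched_attribute = 42
    except TypeError:
        return False
    del cls.patched_attribute
    return True
```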

If you want to go that route, I suggest looking at the CPython 3.10 _functools module source code for an example. (They set the Py_TPFLAGS_IMMUTABLETYPE flag, which you'll have to make sure not to do.)


The unsupported way

There was an attempt at one point to allow __class__ assignment for non-heap types, as long as the memory layouts worked. It got abandoned because it caused problems with some built-in immutable types, where the interpreter likes to reuse instances. For example, allowing (1).__class__ = SomethingElse would have caused a lot of problems: CPython caches small integers and shares a single 1 object across the whole interpreter, so every 1 everywhere would have changed class. You can read more in the big comment in the source code for the __class__ setter. (The comment is slightly out of date, particularly regarding the Py_TPFLAGS_IMMUTABLETYPE flag, which was added after the comment was written.)
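A short illustration of why the restriction matters for built-ins such as `int`. The small-integer cache is a CPython implementation detail; the sketch compares values produced at runtime (`int("1")` and `len("x")`) so that the identity check is not just two copies of the same compiled constant. `MyInt` is a hypothetical subclass used only as an assignment target.

```python
class MyInt(int):
    """Hypothetical int subclass used as a __class__ assignment target."""


def small_ints_are_shared():
    # Both values are computed at runtime, yet CPython hands back its cached 1.
    return int("1") is len("x")


def retype_small_int():
    """Try to change the class of the shared 1 object; CPython refuses."""
    try:
        (1).__class__ = MyInt
    except TypeError:
        return False
    return True
```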

As far as I know, this was the only problem, and I don't think any more problems have been added since then. The interpreter isn't going to aggressively reuse instances of your classes, so as long as you're not doing anything like that, and the memory layouts are compatible, I think changing the type of your objects should work for now, even for non-heap-types. However, it is not officially supported, so even if I'm right about this working for now, there's no guarantee it'll keep working.

Py_SET_TYPE only sets an object's type pointer. It doesn't do any of the refcount fixing that might be needed; it's a very low-level operation. If neither the old class nor the new class is a heap type, no extra refcount fixing is needed, but if the old class is a heap type, you will have to decref the old class, and if the new class is a heap type, you will have to incref the new class.

If you need to decref the old class, make sure to do it after changing the object's class and possibly incref'ing the new class.
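This bookkeeping can be observed from Python, because CPython's own `__class__` setter does exactly this for heap types: it increfs the new class, sets the type pointer, then decrefs the old class. A minimal sketch with two hypothetical classes, assuming a standard (GIL) CPython build where `sys.getrefcount` reports ordinary reference counts; the absolute counts are not meaningful, only how they change.

```python
import sys


class Old:
    """Hypothetical heap type the object starts as."""


class New:
    """Hypothetical heap type with the same instance layout."""


def refcount_deltas():
    obj = Old()                      # each instance holds a reference to its heap type
    old_before = sys.getrefcount(Old)
    new_before = sys.getrefcount(New)
    # The __class__ setter increfs New, swaps the type pointer, then decrefs Old.
    obj.__class__ = New
    old_after = sys.getrefcount(Old)
    new_after = sys.getrefcount(New)
    return old_after - old_before, new_after - new_before
```

A C function that calls Py_SET_TYPE on instances of heap types has to reproduce these two changes itself: skipping the decref leaks the old class, and skipping the incref lets the new class be freed while instances still point to it.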

Mcgehee answered 2/3, 2022 at 1:55 Comment(0)

According to the language reference, chapter 3 "Data model" (see here):

An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself). Like its identity, an object’s type is also unchangeable.[1]

which, to my mind, states that the type must never change, and that changing it would be illegal, as it would break the language specification. The footnote, however, states that

[1] It is possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly.

I don't know of any method to change the type of an object from within Python itself, so the "possible" may indeed refer to the CPython function.

As far as I can see, a PyObject is defined internally as

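struct _object {
    _PyObject_HEAD_EXTRA     /* expands to nothing unless built with Py_TRACE_REFS */
    Py_ssize_t ob_refcnt;    /* reference count */
    PyTypeObject *ob_type;   /* the object's type: what Py_SET_TYPE overwrites */
};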

So the reference counting should still work. On the other hand, you will segfault the interpreter if you set the type to something that is not a PyTypeObject, or if the type object is freed while the object still points to it, so the usual caveats apply.
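Because the header is just these fields, changing an object's type amounts to overwriting the ob_type pointer. A minimal, read-only sketch that peeks at ob_type with `ctypes`; it assumes CPython, where `id()` returns the object's address, and a build with the plain header layout (it refuses debug builds with Py_TRACE_REFS and free-threaded builds, whose headers differ). The helper names and classes `A` and `B` are hypothetical, and nothing here writes to memory.

```python
import ctypes
import sys
import sysconfig


def standard_object_header():
    """True if PyObject should start with ob_refcnt followed directly by ob_type."""
    if sys.implementation.name != "cpython":
        return False
    if hasattr(sys, "getobjects"):                    # only present with Py_TRACE_REFS
        return False
    if sysconfig.get_config_var("Py_GIL_DISABLED"):   # free-threaded build
        return False
    return True


def read_ob_type(obj):
    """Return the raw ob_type pointer stored in obj's header (read-only)."""
    offset = ctypes.sizeof(ctypes.c_ssize_t)          # ob_type follows ob_refcnt
    return ctypes.c_void_p.from_address(id(obj) + offset).value


class A:
    """Hypothetical class."""


class B:
    """Hypothetical class with the same layout as A."""
```

After `obj.__class__ = B`, `read_ob_type(obj)` should equal `id(B)`, which shows that the supported `__class__` assignment ends up rewriting the same field that Py_SET_TYPE sets.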

Apart from that, I agree that the specification is a little ambiguous, but the question of "legality" may not have a good answer. The long and short of it seems to me to be "do not change types unless you know what you are doing, and if you are not hacking on CPython itself you do not know what you are doing".

Edit: The Py_SET_TYPE function was added in Python 3.9 based on this commit. Apparently, people used to just set the type using

Py_TYPE(obj) = typeobj;

So the inclusion (without being formally announced, as far as I can see) is more akin to adding a convenience function.

Ignace answered 2/3, 2022 at 0:41 Comment(1)
Py_SET_TYPE was added because Py_TYPE is being changed to a function in 3.11, so Py_TYPE(obj) = typeobj; won't work any more after the change. - Mcgehee
