In Python, determine at run time whether an object is a class instance (old- and new-style classes)

I am trying to write a deeply nested set of classes, attributes, bound methods, etc. to an HDF5 file using the h5py module for long-term storage. I am really close. The only wrinkle I can't seem to resolve is how to determine programmatically, at run time, whether something is a class instance rather than a list, int, etc. I need to recurse into class instances, but obviously shouldn't recurse into an int, float, etc. This needs to work for both old- and new-style classes. Here is what I researched that doesn't work (or that I can't get to work):

Using inspect module

>>> class R(object): pass
...
>>> _R = R()
>>> import inspect
>>> inspect.isclass(_R)
False
>>> inspect.isclass(R)
True

This isn't helpful; I need a function like inspect.isclassinstance(_R) that returns True.

Using the types module

If you use old-style classes, there is a type called InstanceType that matches instances of old-style classes, as in the code below:

>>> import types
>>> class R(): pass #old-style class
...
>>> _R = R()
>>> type(_R) is types.InstanceType
True
>>> class R(object): pass #new-style class
...
>>> _R = R()
>>> type(_R) is types.InstanceType
False

But if you use new-style classes, there is no corresponding type in types.

Kr answered 2/9, 2012 at 2:59 Comment(4)
What defines a "class instance"? (Consider: isinstance(5, int) and type(int) is type(R).) And what precisely do you need to detect? I have a feeling your approach to serialization is flawed. - Carlow
@delnan As in my example above, I need new- and old-style class instances to be matched. I'm not sure I follow your "Consider..." comment. Care to spell it out? - Kr
There is very little which is special about the objects which you seem to exclude from being class instances. int and other builtin types are, for (almost!) all intents and purposes, new-style classes, and values like 5 are, for (almost?) all intents and purposes, instances of these classes. While these types, like all others, need special handling for serialization, "is an object" is a useless distinction to make. And your argument for distinguishing them at all seems fishy to me. - Carlow
Ah, got it. I suppose you are correct that everything is now a "class". Even 5 is a class instance. Hmm... Back to the drawing board. - Kr
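The point made in these comments is easy to confirm in Python 3, where old-style classes no longer exist: every value, 5 included, is an instance of some class, and built-in and user-defined classes share the same metaclass. A short sketch, assuming Python 3 (`R` is the class from the question):

```python
import types

class R:  # in Python 3 every class is automatically new-style
    pass

r = R()

# Every object is an instance of object, built-ins included.
print(isinstance(5, object), isinstance(r, object))  # True True

# int and R are both instances of the same metaclass, type.
print(type(int) is type(R) is type)  # True

# Old-style classes, and with them types.InstanceType, are gone in Python 3.
print(hasattr(types, "InstanceType"))  # False
```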

While the poster most likely needs to rethink his design, in some cases there is a legitimate need to distinguish between instances of built-in/extension types, created in C, and instances of classes created in Python with the class statement. Although both are types, the latter are a category of types that CPython internally calls "heap types" because their type structures are allocated at run time. That Python continues to distinguish them can be seen in the __repr__ output (shown here for Python 2):

>>> int       # "type"
<type 'int'>
>>> class X(object): pass
... 
>>> X         # "class"
<class '__main__.X'>

The __repr__ distinction is implemented exactly by checking whether the type is a heap type.
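In Python 3 the repr no longer shows the difference (both int and X print as `<class ...>`), but the flag itself can still be read through the `__flags__` attribute of a type. A CPython-only sketch, assuming the `Py_TPFLAGS_HEAPTYPE` value of `1 << 9` used in this answer:

```python
# CPython-specific: type.__flags__ mirrors the C-level tp_flags field.
Py_TPFLAGS_HEAPTYPE = 1 << 9  # value from Include/object.h (assumed stable)

class X:
    pass

for cls in (int, list, X):
    heap = bool(cls.__flags__ & Py_TPFLAGS_HEAPTYPE)
    print(f"{cls!r:<22} heap type: {heap}")
```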

Depending on the exact needs of the application, an is_class_instance function can be implemented in one of the following ways:

# Built-in types such as int or object do not have __dict__ by
# default. __dict__ is normally obtained by inheriting from a
# dictless type using the class statement.  Checking for the
# existence of __dict__ is an indication of a class instance.
#
# Caveat: a built-in or extension type can still request instance
# dicts using tp_dictoffset, and a class can suppress it with
# __slots__.
def is_class_instance(o):
    return hasattr(o, '__dict__')

# A reliable approach, but one that is also more dependent
# on the CPython implementation.
Py_TPFLAGS_HEAPTYPE = (1<<9)       # Include/object.h
def is_class_instance(o):
    return bool(type(o).__flags__ & Py_TPFLAGS_HEAPTYPE)
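The caveats in the comment of the first version are easy to trigger. The sketch below gives the two alternatives separate, hypothetical names (`has_instance_dict` and `is_heap_type_instance`) so they can coexist, and compares them on a slotted class, on a function (whose built-in type does provide an instance `__dict__`) and on an int. CPython is assumed for the second check.

```python
Py_TPFLAGS_HEAPTYPE = 1 << 9  # CPython, Include/object.h

def has_instance_dict(o):
    """First version: does the object carry an instance __dict__?"""
    return hasattr(o, '__dict__')

def is_heap_type_instance(o):
    """Second version: is type(o) a heap type? (CPython only)"""
    return bool(type(o).__flags__ & Py_TPFLAGS_HEAPTYPE)

class Slotted:
    __slots__ = ('x',)  # suppresses the per-instance __dict__

def func():
    pass

for obj in (Slotted(), func, 5):
    print(type(obj).__name__, has_instance_dict(obj), is_heap_type_instance(obj))
# Slotted False True
# function True False
# int False False
```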

EDIT

Here is an explanation of the second version of the function. It tests whether the type is a "heap type" using the same test that CPython uses internally for its own purposes. That ensures that it will always return True for instances of heap types ("classes") and False for instances of non-heap types ("types", but also old-style classes, which is easy to fix). It does that by checking whether the tp_flags member of the C-level PyTypeObject structure, which Python code can read as type.__flags__, has the Py_TPFLAGS_HEAPTYPE bit set. The weak part of the implementation is that it hardcodes the value of the Py_TPFLAGS_HEAPTYPE constant to the currently observed value. (This is necessary because the constant is not exposed to Python by a symbolic name.) While in theory this constant could change, that is highly unlikely in practice, because such a change would gratuitously break the ABI of existing extension modules. Looking at the definitions of the Py_TPFLAGS constants in Include/object.h, it is apparent that new ones are being carefully added without disturbing the old ones. Another weakness is that this code has zero chance of running on a non-CPython implementation, such as Jython or IronPython.
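One way to soften the hardcoded-constant weakness is to verify the assumption once, at import time: a class created at run time must have the bit set, and a built-in like int must not. This is a sketch that is not part of the original answer; `heap_flag_looks_valid` is a hypothetical helper name.

```python
Py_TPFLAGS_HEAPTYPE = 1 << 9  # assumed value, copied from Include/object.h

def heap_flag_looks_valid():
    """Return True if the hardcoded constant behaves as expected here."""
    try:
        # Three-argument type() builds a class at run time, i.e. a heap type.
        probe = type('Probe', (object,), {})
        probe_is_heap = bool(probe.__flags__ & Py_TPFLAGS_HEAPTYPE)
        int_is_heap = bool(int.__flags__ & Py_TPFLAGS_HEAPTYPE)
    except AttributeError:
        # No __flags__ attribute at all: almost certainly not CPython.
        return False
    return probe_is_heap and not int_is_heap

if not heap_flag_looks_valid():
    raise RuntimeError('Py_TPFLAGS_HEAPTYPE does not match this interpreter')
```

A passing check does not prove the constant is correct in every case; it only catches the most obvious mismatch early instead of returning silently wrong answers.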

Edelsten answered 2/9, 2012 at 15:31 Comment(4)
Brilliant! I will put in another answer with a small mod so that it will return True for old-style and new-style classes. This works a treat. Would you be able to explain what you did in the second solution? I'm not following. It works though, which is good. - Kr
I've now added an explanation; hope it helps. - Edelsten
Phenomenal. Sadly, this doesn't quite work as advertised under Python 3.x: L is dead, so the magic number assignment should read Py_TPFLAGS_HEAPTYPE = (1<<9). Likewise, as duly noted, this fails on pure-Python classes optimized with __slots__. But the very well-authored explanation of CPython-specific "heap types" is a lucrative gold mine. I've never actually seen a reliable means of distinguishing pure-Python from C-based classes, even if it is CPython-specific. Every upvote is deserved. - Dimer
@CecilCurry Thanks. Note that, to the best of my knowledge, only the first version of the function fails for Python classes that define __slots__ without a dict. I've now removed the L (which wasn't really necessary in Python 2 either), and the code should work just fine in both Python 2 and 3. - Edelsten

Thanks to @user4815162342, I have been able to get this to work. Here is a slightly modified version that will return True for instances of old- and new-style classes:

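# Added the check for old-style classes; runs on Python 2 and 3
import types

Py_TPFLAGS_HEAPTYPE = (1<<9)       # Include/object.h
# types.InstanceType exists only in Python 2 (old-style classes)
_InstanceType = getattr(types, 'InstanceType', None)

def is_class_instance(o):
    return (bool(type(o).__flags__ & Py_TPFLAGS_HEAPTYPE)
            or type(o) is _InstanceType)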
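For the original use case (recursing into user-defined objects while treating built-ins as leaves), the check can drive a small walker. This is a Python 3 sketch under stated assumptions: `to_nested_dict` is a hypothetical helper, it only follows instance `__dict__` entries (slotted attributes are skipped), it does not guard against reference cycles, and it builds a plain dict instead of writing to an h5py file.

```python
Py_TPFLAGS_HEAPTYPE = 1 << 9  # CPython only, Include/object.h

def is_class_instance(o):
    return bool(type(o).__flags__ & Py_TPFLAGS_HEAPTYPE)

def to_nested_dict(obj):
    """Recurse into instances of Python-defined classes; keep built-ins as leaves."""
    if is_class_instance(obj) and hasattr(obj, '__dict__'):
        return {name: to_nested_dict(value) for name, value in vars(obj).items()}
    if isinstance(obj, (list, tuple)):
        # Tuples are converted to lists here for simplicity.
        return [to_nested_dict(item) for item in obj]
    return obj  # int, float, str, ... are stored as-is

class Inner:
    def __init__(self):
        self.values = [1, 2.5]

class Outer:
    def __init__(self):
        self.name = 'outer'
        self.inner = Inner()

print(to_nested_dict(Outer()))
# {'name': 'outer', 'inner': {'values': [1, 2.5]}}
```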
Kr answered 3/9, 2012 at 20:58 Comment(0)

tl;dr Just call the is_object_pure_python() function defined far, far below.

Like ibell, I was dutifully impressed by user4815162342's authoritative Python 2.x-specific solution. All is not well in Pythonic paradise, however.

Problems. Problems Everywhere.

That solution (though insightful) has suffered a bit of bit rot not trivially resolvable by simple edits, including:

  • The L type suffix is unsupported under Python 3.x. Admittedly, trivially resolvable.
  • The cross-interpreter is_class_instance() implementation fails to account for pure-Python classes optimized with __slots__.
  • The CPython-specific is_class_instance() implementation fails under non-CPython interpreters (e.g., PyPy).
  • There are no comparable implementations for detecting whether classes (rather than class instances) are pure-Python or C-based.

Solutions! Solutions Everywhere!

To solve these issues, the following Python 3.x-specific solution drops L, detects __slots__, and has been generalized to detect both classes and class instances. It has also been refactored to prefer the more reliable CPython-specific is_class_instance() implementation under CPython and to fall back to the less reliable cross-interpreter implementation under all other interpreters.

For sanity, let's detect class instances first:

import platform

# If the active Python interpreter is the official CPython implementation,
# prefer a more reliable CPython-specific solution guaranteed to succeed.
if platform.python_implementation() == 'CPython':
    # Magic number defined by the Python codebase at "Include/object.h".
    Py_TPFLAGS_HEAPTYPE = (1<<9)

    def is_instance_pure_python(obj: object) -> bool:
        '''
        `True` if the passed object is an instance of a pure-Python class _or_
        `False` if this object is an instance of a C-based class (either builtin
        or defined by a C extension).
        '''

        return bool(type(obj).__flags__ & Py_TPFLAGS_HEAPTYPE)

# Else, fallback to a CPython-agnostic solution typically but *NOT*
# necessarily succeeding. For all real-world objects of interest, this is
# effectively successful. Edge cases exist but are suitably rare.
else:
    def is_instance_pure_python(obj: object) -> bool:
        '''
        `True` if the passed object is an instance of a pure-Python class _or_
        `False` if this object is an instance of a C-based class (either builtin
        or defined by a C extension).
        '''

        return hasattr(obj, '__dict__') or hasattr(obj, '__slots__')

The Proof is in Guido's Pudding

Unit tests demonstrate the uncomfortable truth:

>>> class PurePythonWithDict(object): pass
>>> class PurePythonWithSlots(object): __slots__ = ()
>>> unslotted = PurePythonWithDict()
>>> slotted = PurePythonWithSlots()
>>> is_instance_pure_python(unslotted)
True
>>> is_instance_pure_python(slotted)
True
>>> is_instance_pure_python(3)
False
>>> is_instance_pure_python([3, 1, 4, 1, 5])
False
>>> import numpy
>>> is_instance_pure_python(numpy.array((3, 1, 4, 1, 5)))
False

Does This Generalize to Classes without Instances?

Yes, but doing so is non-trivial. Detecting whether a class (rather than class instance) is pure-Python or C-based is oddly difficult. Why? Because even C-based classes provide the __dict__ attribute. Hence, hasattr(int, '__dict__') == True.

Nonetheless, where there is a hacky way, there is a hacky will. The dir() builtin includes the __dict__ name for pure-Python classes but omits it for C-based classes. The reason is fairly banal: for a class, dir() reports the names found in the namespaces of that class and its bases, and only classes whose instances carry a per-instance dict store a '__dict__' entry there. Hence, detecting whether a class is pure-Python or C-based in a cross-interpreter manner reduces to searching the list returned by dir() for __dict__. For the win:

import platform

# If the active Python interpreter is the official CPython interpreter,
# prefer a more reliable CPython-specific solution guaranteed to succeed.
if platform.python_implementation() == 'CPython':
    # Magic number defined by the Python codebase at "Include/object.h".
    Py_TPFLAGS_HEAPTYPE = (1<<9)

    def is_class_pure_python(cls: type) -> bool:
        '''
        `True` if the passed class is pure-Python _or_ `False` if this class
        is C-based (either builtin or defined by a C extension).
        '''

        return bool(cls.__flags__ & Py_TPFLAGS_HEAPTYPE)

# Else, fallback to a CPython-agnostic solution typically but *NOT*
# necessarily succeeding. For all real-world objects of interest, this is
# effectively successful. Edge cases exist but are suitably rare.
else:
    def is_class_pure_python(cls: type) -> bool:
        '''
        `True` if the passed class is pure-Python _or_ `False` if this class
        is C-based (either builtin or defined by a C extension).
        '''

        return '__dict__' in dir(cls) or hasattr(cls, '__slots__')

More Proof. More Pudding.

More test-driven truthiness:

>>> class PurePythonWithDict(object): pass
>>> class PurePythonWithSlots(object): __slots__ = ()
>>> is_class_pure_python(PurePythonWithDict)
True
>>> is_class_pure_python(PurePythonWithSlots)
True
>>> is_class_pure_python(int)
False
>>> is_class_pure_python(list)
False
>>> import numpy
>>> is_class_pure_python(numpy.ndarray)
False
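The dir()-based fallback depends on which class namespaces actually contain a '__dict__' entry. The Python 3 sketch below checks this directly and shows why the slotted case needs the separate hasattr(cls, '__slots__') test; the class names are illustrative only.

```python
class Base:
    pass

class Child(Base):
    pass

class Slotted:
    __slots__ = ()

# Every class object has a __dict__ (its own namespace)...
print(hasattr(int, '__dict__'), hasattr(Base, '__dict__'))       # True True

# ...but only classes whose instances get a dict store a '__dict__' entry.
print('__dict__' in vars(int), '__dict__' in vars(Base))          # False True

# dir() merges the namespaces along the bases, so subclasses still list it.
print('__dict__' in vars(Child), '__dict__' in dir(Child))

# __slots__ suppresses the entry, hence the extra hasattr(cls, '__slots__').
print('__dict__' in dir(Slotted), hasattr(Slotted, '__slots__'))  # False True
```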

That's All She Wrote

For generality, let's unify the low-level functions defined above into two high-level functions supporting all possible types under all possible Python interpreters:

def is_object_pure_python(obj: object) -> bool:
    '''
    `True` if the passed object is either a pure-Python class or instance of
    such a class _or_ `False` if this object is either a C-based class
    (builtin or defined by a C extension) or instance of such a class.
    '''

    if isinstance(obj, type):
        return is_class_pure_python(obj)
    else:
        return is_instance_pure_python(obj)


def is_object_c_based(obj: object) -> bool:
    '''
    `True` if the passed object is either a C-based class (builtin or
    defined by a C extension) or instance of such a class _or_ `False` if this
    object is either a pure-Python class or instance of such a class.
    '''

    return not is_object_pure_python(obj)
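Under CPython specifically, the two branches reduce to the same test, because an instance counts as pure-Python exactly when its class is a heap type. A compact CPython-only sketch (`is_pure_python_cpython` is a hypothetical name; other interpreters still need the fallbacks above). One caveat worth knowing: C extensions can also create heap types through the C API function PyType_FromSpec(), so on recent CPython versions the flag strictly means "type allocated at run time" rather than "defined in Python".

```python
Py_TPFLAGS_HEAPTYPE = 1 << 9  # CPython, Include/object.h

def is_pure_python_cpython(obj: object) -> bool:
    """CPython only: True for heap-type classes and their instances."""
    cls = obj if isinstance(obj, type) else type(obj)
    return bool(cls.__flags__ & Py_TPFLAGS_HEAPTYPE)

class Plain:
    pass

class Slotted:
    __slots__ = ()

for thing in (Plain, Plain(), Slotted, Slotted(), int, 3, [1, 2]):
    print(repr(thing), is_pure_python_cpython(thing))
```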

Behold! Pure Python.

Dimer answered 7/12, 2016 at 8:42 Comment(0)
