Building Self-Referencing Tuples

A forum conversation from many years ago, which was never resolved, made me wonder how one would correctly create a tuple that references itself. Technically, this is a very bad idea, since tuples are supposed to be immutable: how could an immutable object possibly contain itself? However, this question is not about best practices; it is about what is possible in Python.

import ctypes

def self_reference(array, index):
    if not isinstance(array, tuple):
        raise TypeError('array must be a tuple')
    if not isinstance(index, int):
        raise TypeError('index must be an int')
    if not 0 <= index < len(array):
        raise ValueError('index is out of range')
    address = id(array)
    obj_refcnt = ctypes.cast(address, ctypes.POINTER(ctypes.c_ssize_t))
    obj_refcnt.contents.value += 1
    if ctypes.cdll.python32.PyTuple_SetItem(ctypes.py_object(array),
                                            ctypes.c_ssize_t(index),
                                            ctypes.py_object(array)):
        raise RuntimeError('PyTuple_SetItem signaled an error')

The function above was designed to access Python's C API while keeping its internal structures and data types in mind. However, running it usually produces the error shown below. Through some process I do not understand, similar techniques have occasionally succeeded in creating a self-referencing tuple before.

Question: How should the function self_reference be modified so that it works consistently?

>>> import string
>>> a = tuple(string.ascii_lowercase)
>>> self_reference(a, 2)
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    self_reference(a, 2)
  File "C:/Users/schappell/Downloads/srt.py", line 15, in self_reference
    ctypes.py_object(array)):
WindowsError: exception: access violation reading 0x0000003C
>>> 

Edit: Here are two different conversations with the interpreter that I find confusing. If I understand the documentation correctly, the code above should be correct. However, the conversations below appear to conflict both with each other and with the self_reference function above.

Conversation 1:

Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> from ctypes import *
>>> array = tuple(range(10))
>>> cast(id(array), POINTER(c_ssize_t)).contents.value
1
>>> cast(id(array), POINTER(c_ssize_t)).contents.value += 1
>>> cast(id(array), POINTER(c_ssize_t)).contents.value
2
>>> array
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>> cdll.python32.PyTuple_SetItem(c_void_p(id(array)), 0,
                                  c_void_p(id(array)))
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    cdll.python32.PyTuple_SetItem(c_void_p(id(array)), 0,
                                  c_void_p(id(array)))
WindowsError: exception: access violation reading 0x0000003C
>>> cdll.python32.PyTuple_SetItem(c_void_p(id(array)), 0,
                                  c_void_p(id(array)))
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    cdll.python32.PyTuple_SetItem(c_void_p(id(array)), 0,
                                  c_void_p(id(array)))
WindowsError: exception: access violation reading 0x0000003C
>>> array
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>> cdll.python32.PyTuple_SetItem(c_void_p(id(array)), 0,
                                  c_void_p(id(array)))
0
>>> array
((<NULL>,
<code object __init__ at 0x02E68C50, file "C:\Python32\lib\tkinter\simpledialog.py", line 121>,
<code object destroy at 0x02E68CF0, file "C:\Python32\lib\tkinter\simpledialog.py", line 171>,
<code object body at 0x02E68D90, file "C:\Python32\lib\tkinter\simpledialog.py", line 179>,
<code object buttonbox at 0x02E68E30, file "C:\Python32\lib\tkinter\simpledialog.py", line 188>,
<code object ok at 0x02E68ED0, file "C:\Python32\lib\tkinter\simpledialog.py", line 209>,
<code object cancel at 0x02E68F70, file "C:\Python32\lib\tkinter\simpledialog.py", line 223>,
<code object validate at 0x02E6F070, file "C:\Python32\lib\tkinter\simpledialog.py", line 233>,
<code object apply at 0x02E6F110, file "C:\Python32\lib\tkinter\simpledialog.py", line 242>,
None), 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>>

Conversation 2:

Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> from ctypes import *
>>> array = tuple(range(10))
>>> cdll.python32.PyTuple_SetItem(c_void_p(id(array)), c_ssize_t(1),
                                  c_void_p(id(array)))
0
>>> array
(0, (...), 2, 3, 4, 5, 6, 7, 8, 9)
>>> array[1] is array
True
>>>
Pishogue answered 8/8, 2012 at 21:27 Comment(5)
In which Python version did it work at least once? – Irksome
The edit shows the Python version, running in IDLE. Also, does it matter that it is actually a 64-bit computer? – Pishogue
I guess it turns out tuples are not immutable at the C level. – Attend
Technically, nothing is immutable at the C level (except read-only memory regions...). For example, passing a Python string (an immutable construct in Python) to a C function that modifies its input will modify the string. This is generally a bad idea since it could cause an interned string to change value, but it is still possible. – Emelun
Funny, the docs explicitly claim this is impossible: "it’s possible to prove that no reference cycle can be composed entirely of tuples." – Octad

Thanks to nneonneo's help, I settled on the following implementation of the self_reference method.

import ctypes

ob_refcnt_p = ctypes.POINTER(ctypes.c_ssize_t)

class GIL:
    acquire = staticmethod(ctypes.pythonapi.PyGILState_Ensure)
    release = staticmethod(ctypes.pythonapi.PyGILState_Release)

class Ref:
    dec = staticmethod(ctypes.pythonapi.Py_DecRef)
    inc = staticmethod(ctypes.pythonapi.Py_IncRef)

class Tuple:
    setitem = staticmethod(ctypes.pythonapi.PyTuple_SetItem)
    @classmethod
    def self_reference(cls, array, index):
        if not isinstance(array, tuple):
            raise TypeError('array must be a tuple')
        if not isinstance(index, int):
            raise TypeError('index must be an int')
        if not 0 <= index < len(array):
            raise ValueError('index is out of range')
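        # PyGILState_Release must receive the state PyGILState_Ensure returned.
        gil_state = GIL.acquire()
        try:
            obj = ctypes.py_object(array)
            # Py_DecRef/Py_IncRef return void, so read ob_refcnt directly.
            ob_refcnt = ctypes.cast(id(array), ob_refcnt_p).contents.value
            # PyTuple_SetItem only works while the refcount is exactly 1.
            for _ in range(ob_refcnt - 1):
                Ref.dec(obj)
            if cls.setitem(obj, ctypes.c_ssize_t(index), obj):
                raise SystemError('PyTuple_SetItem was not successful')
            # Give back the ob_refcnt - 1 references removed above, plus one
            # for the new self-reference that PyTuple_SetItem stole.
            for _ in range(ob_refcnt):
                Ref.inc(obj)
        finally:
            GIL.release(gil_state)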

To use the method, follow the example below to create your own self-referencing tuples.

>>> array = tuple(range(5))
>>> Tuple.self_reference(array, 1)
>>> array
(0, (...), 2, 3, 4)
>>> Tuple.self_reference(array, 3)
>>> array
(0, (...), 2, (...), 4)
>>> 
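
PyGILState_Release expects the PyGILState_STATE value returned by the matching PyGILState_Ensure call. Below is a minimal sketch of keeping the two paired, using a hypothetical `gil_state` context manager (not part of ctypes). Functions reached through `ctypes.pythonapi` are called without releasing the GIL, so from ordinary Python code this pairing is a safety net rather than a necessity.

```python
import contextlib
import ctypes


@contextlib.contextmanager
def gil_state():
    """Hypothetical helper: pair PyGILState_Ensure with PyGILState_Release."""
    # PyGILState_STATE is a C enum, so ctypes' default int restype fits it.
    state = ctypes.pythonapi.PyGILState_Ensure()
    try:
        yield state
    finally:
        # Hand back exactly the value that Ensure returned.
        ctypes.pythonapi.PyGILState_Release(state)


with gil_state() as state:
    # The calling thread already holds the GIL, so Ensure should report
    # PyGILState_LOCKED, the first member (0) of CPython's enum.
    print(state)
```

The body of a `with gil_state():` block is where the refcount manipulation from the method above would go.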
Pishogue answered 23/8, 2012 at 19:48 Comment(0)

AFAICT, the problems you are seeing occur because PyTuple_SetItem fails if the tuple's refcount isn't exactly one. This prevents the function from being used on a tuple that is already referenced elsewhere. I'm not sure why that failure shows up as an access violation, but it may be because the exception raised by PyTuple_SetItem isn't handled properly. Furthermore, the array appears to turn into some other object because, on each failure, PyTuple_SetItem DECREFs the item it was given, which here is the tuple itself; after two failures the refcount reaches zero, so the object is freed (and some other object apparently ends up at the same memory location).

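Using the pythonapi object in ctypes is the preferred way to access the Python DLL: unlike cdll, which releases the GIL before each call, it keeps the GIL held, checks the Python error flag after the call so that exceptions are raised properly, and is guaranteed to use the correct calling convention.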
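
A minimal sketch of the refcount check described above, assuming a standard CPython build where `ctypes.pythonapi` is available. It calls PyTuple_SetItem on a tuple that is still referenced from Python code, so the call fails, and because pythonapi checks the error flag the failure surfaces as a `SystemError` instead of a crash. It also shows that the item's reference is consumed even on failure, which is why the sketch first gives the item an extra reference with `Py_IncRef`. The helper name `try_setitem_on_shared_tuple` is hypothetical.

```python
import ctypes
import sys


def try_setitem_on_shared_tuple():
    """Hypothetical demo: PyTuple_SetItem on a tuple whose refcount is not 1."""
    t = tuple(range(3))   # fresh tuple, but referenced by the local name
    item = []             # fresh object we try to store in t[0]
    before = sys.getrefcount(item)

    # PyTuple_SetItem steals a reference to the item even when it fails,
    # so hand it one extra reference to consume.
    ctypes.pythonapi.Py_IncRef(ctypes.py_object(item))
    try:
        ctypes.pythonapi.PyTuple_SetItem(
            ctypes.py_object(t), ctypes.c_ssize_t(0), ctypes.py_object(item)
        )
    except SystemError as exc:
        error = exc       # raised because the tuple's refcount is not 1
    else:
        error = None

    after = sys.getrefcount(item)
    return t, error, before, after


t, error, before, after = try_setitem_on_shared_tuple()
print(t, type(error).__name__, before == after)
```

The tuple is left untouched and the item's refcount ends where it started, which is consistent with the failure DECREFing the item it was given.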

I don't have a Windows machine handy to test this out, but the following works fine on Mac OS X (both Python 2.7.3 and 3.2.2):

import ctypes

def self_reference(array, index):
    # Sanity check. We can't let PyTuple_SetItem fail, or it will Py_DECREF
    # the object and destroy it.
    if not isinstance(array, tuple):
        raise TypeError("array must be a tuple")

    if not 0 <= index < len(array):
        raise IndexError("tuple assignment index out of range")

    arrayobj = ctypes.py_object(array)

    # Need to drop the refcount to 1 in order to use PyTuple_SetItem.
    # Needless to say, this is incredibly dangerous.
    # Py_DecRef returns void, so read ob_refcnt from the object header instead.
    refcnt = ctypes.c_ssize_t.from_address(id(array)).value
    for i in range(refcnt - 1):
        ctypes.pythonapi.Py_DecRef(arrayobj)

    try:
        ret = ctypes.pythonapi.PyTuple_SetItem(arrayobj, ctypes.c_ssize_t(index), arrayobj)
        if ret != 0:
            raise RuntimeError("PyTuple_SetItem failed")
    except Exception:
        raise SystemError("FATAL: PyTuple_SetItem failed: tuple probably unusable")

    # Restore the refcount and add one more for the new self-reference
    for i in range(refcnt):
        ctypes.pythonapi.Py_IncRef(arrayobj)

Result:

>>> x = (1,2,3,4,5)
>>> self_reference(x, 1)
>>> import pprint
>>> pprint.pprint(x)
(1, <Recursion on tuple with id=4299516720>, 3, 4, 5)
Emelun answered 22/8, 2012 at 16:1 Comment(3)
Thank you so much for your help! I combined our work into one answer. ctypes.pythonapi.Py_DecRef(arrayobj) was returning the address of the object instead of the reference count, so I modified the code to read the number manually. Your insight really helped in answering the question. – Pishogue
Yes, my bad. Py_DecRef and Py_IncRef return void, so you do have to pull the refcount out of the object struct. – Emelun
Does the answer I provided work on your platform? I only tested it on Windows. – Pishogue

A simpler solution:

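import ctypes
tup = (0,)
# Assumes a 64-bit CPython build where ob_item[0] starts 24 bytes into the
# tuple object (after ob_refcnt, ob_type and ob_size). This overwrites that
# pointer with the tuple's own address without adjusting any refcounts.
ctypes.c_longlong.from_address(id(tup)+24).value = id(tup)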

Result:

>>> tup
((...),)
>>> type(tup)
tuple
>>> tup[0] is tup
True
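
A slightly more defensive sketch of the same raw write, under stated assumptions: a CPython build whose tuple items are stored as an array of pointers immediately after a fixed header of `tuple.__basicsize__` bytes, each `tuple.__itemsize__` bytes wide. Rather than hard-coding offset 24 and `c_longlong`, it derives the slot address from those attributes, and it adjusts reference counts: the tuple gains one reference to itself and gives up the one it held on the displaced item. `self_reference_raw` is a hypothetical name, and the technique remains just as unsafe as the original.

```python
import ctypes
import sys


def self_reference_raw(tup, index):
    """Hypothetical helper: make tup[index] refer to tup itself.

    Assumes CPython's tuple layout: a header of tuple.__basicsize__ bytes
    followed by an array of item pointers, each tuple.__itemsize__ bytes.
    """
    if not isinstance(tup, tuple):
        raise TypeError("tup must be a tuple")
    if not 0 <= index < len(tup):
        raise IndexError("tuple index out of range")

    slot = id(tup) + tuple.__basicsize__ + index * tuple.__itemsize__
    old_item = tup[index]

    # The tuple will own one new reference to itself ...
    ctypes.pythonapi.Py_IncRef(ctypes.py_object(tup))
    # ... stored by overwriting the item pointer in place.
    ctypes.c_void_p.from_address(slot).value = id(tup)
    # The tuple no longer owns its reference to the displaced item.
    ctypes.pythonapi.Py_DecRef(ctypes.py_object(old_item))


t = tuple(range(3))          # built at runtime, not a shared constant
before = sys.getrefcount(t)
self_reference_raw(t, 1)
after = sys.getrefcount(t)
print(t, t[1] is t, after - before)
```

Per the documentation quoted in the comments on the question, tuples do not implement tp_clear, so the cyclic garbage collector cannot break a tuple that refers to itself; expect such a tuple to be leaked rather than freed.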
Schoolmaster answered 17/4, 2020 at 20:35 Comment(0)

Immutability should not prevent an object from referencing itself. This is easy to do in Haskell because it has lazy evaluation. Here is an imitation that does that by using a thunk:

>>> def self_ref_tuple():
    a = (1, 2, lambda: a)
    return a

>>> ft = self_ref_tuple()
>>> ft
(1, 2, <function <lambda> at 0x02A7C330>)
>>> ft[2]()
(1, 2, <function <lambda> at 0x02A7C330>)
>>> ft[2]() is ft
True

This is not a complete answer, just a preliminary one. I am still working to see whether there's another way to make this possible.

Attend answered 22/8, 2012 at 16:54 Comment(1)
The goal was to have a direct reference, not to use a thunk. – Pishogue

Technically, you could wrap the reference to the tuple inside a mutable object.

>>> c = ([],)
>>> c[0].append(c)
>>> c
([(...)],)
>>> c[0]
[([...],)]
>>> 
Reading answered 22/8, 2012 at 16:48 Comment(1)
The goal was to have a direct reference, not to use another container. – Pishogue
