You can sort of do this, but all options are messy and full of caveats to the point of near-uselessness, so first, let's consider whether you really want to.
Interning a string doesn't prolong its lifetime: once nothing else refers to an interned string, it's deallocated and removed from the interned dict. You don't have to worry about that dict growing forever, full of strings you don't need. Thus, string interning is unlikely to be an actual memory problem, and learning how many strings have been interned might be pretty useless.
If you still want to do this, let's go through your options.
The Right Way would probably be to use your own interning implementation... except that Python's lackluster weak reference support doesn't let you create weak references to strings. That means that if you try this approach, you're stuck either passing around your own weak-referenceable string wrappers or keeping interned strings alive forever. Both options are terrible.
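To make the trade-off concrete, here is a minimal sketch of the wrapper approach, assuming you control every place strings get interned. `InternedStr`, `my_intern`, and `interned_count` are hypothetical names for illustration. The table is a `weakref.WeakValueDictionary`, so an entry disappears once nothing else holds its wrapper. The price is that callers have to pass `InternedStr` objects around instead of plain strings.

```python
import weakref


class InternedStr(object):
    """Hypothetical weak-referenceable wrapper around a plain str."""

    __slots__ = ("value", "__weakref__")

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __repr__(self):
        return "InternedStr(%r)" % (self.value,)


# Maps each plain string to its canonical wrapper. An entry is dropped as
# soon as the last outside reference to its wrapper goes away.
_table = weakref.WeakValueDictionary()


def my_intern(s):
    """Return the canonical InternedStr for s, creating it if needed."""
    wrapper = _table.get(s)
    if wrapper is None:
        wrapper = InternedStr(s)
        _table[s] = wrapper
    return wrapper


def interned_count():
    """Number of interned strings whose wrappers are still alive."""
    return len(_table)


# Plain strings can't be the target of a weak reference, hence the wrapper.
try:
    weakref.ref("spam")
except TypeError as exc:
    print("plain str: %s" % exc)
```

With CPython's reference counting, deleting the last reference to a wrapper removes its entry right away; other implementations may only do so after a garbage collection.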
There is actually a function that prints the information you're asking about... but it also de-interns everything. Its existence is an implementation detail, and it's only accessible through the C API, so we'll need to use `ctypes.pythonapi` to get at it.
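```python
import ctypes

# Private CPython function: prints the statistics shown below, then de-interns every string.
_Py_ReleaseInternedStrings = ctypes.pythonapi._Py_ReleaseInternedStrings
_Py_ReleaseInternedStrings.argtypes = ()
_Py_ReleaseInternedStrings.restype = None
_Py_ReleaseInternedStrings()
```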
Output:

```
releasing 3461 interned strings
total size of all interned strings: 33685/0 mortal/immortal
```
The total sizes listed are sums of string lengths, so they don't include object headers or null terminators.
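Because `_Py_`-prefixed functions like this one are private and can disappear between CPython versions or builds, you may want to probe for the symbol before configuring and calling it. This is a minimal sketch with a hypothetical helper, `find_pythonapi_function`. It relies on the fact that looking up a name the interpreter doesn't export on `ctypes.pythonapi` raises `AttributeError`, which the helper turns into `None`.

```python
import ctypes


def find_pythonapi_function(name, argtypes=(), restype=None):
    """Return a configured ctypes function from the running interpreter, or None.

    Hypothetical helper: private _Py* symbols vary between CPython versions
    and builds, so probe for them instead of assuming they exist.
    """
    try:
        func = getattr(ctypes.pythonapi, name)
    except AttributeError:
        # The interpreter binary doesn't export a symbol with this name.
        return None
    func.argtypes = argtypes
    func.restype = restype
    return func


# Only probe here: actually calling it would de-intern every string.
release = find_pythonapi_function("_Py_ReleaseInternedStrings")
if release is None:
    print("_Py_ReleaseInternedStrings is not exported by this interpreter")
```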
You're probably not happy about having to release all interned strings every time you want to check how many there were. Unfortunately, Python doesn't expose the interned dict, even through the C API or through GC hooks. What else could you try? Well, moving on to even crazier options, there's the debugger.
ecatmur posted a crazy hack that launches a GDB process in unattended mode and uses a conditional breakpoint to get at `errnomap`, a dict very similar to the `interned` dict you'd like to access. This could be adapted to access the `interned` dict instead, but it would be highly non-portable and extremely difficult to maintain.
Launching a debugger is also a terrible option. What else could you try? Well, you could always build a custom Python. Download the source from python.org, add
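```c
/* `interned` is the file's private static dict of interned strings. */
PyObject *
AwfulHackToGetTheInternedDict(void)
{
    if (interned == NULL) {
        /* No interned dict yet. */
        Py_RETURN_NONE;
    }
    /* Return a new reference. Treat the dict as read-only: its references
       to the strings aren't counted in their refcounts. */
    Py_INCREF(interned);
    return interned;
}
```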
to `Objects/stringobject.c`, build, and install. You'll probably want to install it under its own prefix (configure's `--prefix` option) to keep it separate from your normal Python interpreter. With this awful hack in place, you can do
```python
import ctypes

AwfulHackToGetTheInternedDict = ctypes.pythonapi.AwfulHackToGetTheInternedDict
AwfulHackToGetTheInternedDict.argtypes = ()
AwfulHackToGetTheInternedDict.restype = ctypes.py_object
interned = AwfulHackToGetTheInternedDict()
```
to get the dict of all interned strings.
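Once you have the dict, you can reproduce the summary that `_Py_ReleaseInternedStrings` prints without de-interning anything. This is a sketch: `summarize_interned` is a hypothetical helper that only assumes the dict's keys are the interned strings (or that it got `None` because nothing has been interned yet). It can't report the mortal/immortal split, since that state lives in a C field that isn't visible from Python.

```python
def summarize_interned(interned):
    """Count interned strings and sum their lengths (hypothetical helper)."""
    if interned is None:
        # AwfulHackToGetTheInternedDict() returns None before any interning.
        return {"count": 0, "total_length": 0, "longest": []}
    # Take a snapshot first; the live dict can change while Python code runs.
    strings = list(interned)
    return {
        "count": len(strings),
        # Like the C function, sum string lengths only: no object headers
        # or null terminators.
        "total_length": sum(len(s) for s in strings),
        "longest": sorted(strings, key=len, reverse=True)[:5],
    }


if __name__ == "__main__":
    # Stand-in data; on the custom build you'd pass the dict from the hack.
    print(summarize_interned({"a": "a", "spam": "spam"}))
```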
So, those are your options, or at least the options I've thought of. I also tried forcing the GC to track a string and then interning it, to make the interned dict visible through the GC, but calling `PyObject_GC_Track` on a string caused a fatal error, so that doesn't work.
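For context, strings are never tracked by CPython's cyclic garbage collector in the first place, and `gc.get_objects()` only returns tracked objects, so strings never show up there. `gc.is_tracked()` (Python 2.7 and 3.1+) lets you confirm this:

```python
import gc

# Strings can't hold references to other objects, so the cyclic GC skips them.
print(gc.is_tracked("spam"))    # False
# Lists are containers, so they are tracked.
print(gc.is_tracked(["spam"]))  # True
```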
… `interned` dict, since the (non-literal) strings in my applications have always been of more consequence, so ensuring I only have one copy of each of those strings has been where I've spent my time. As a result, I'm still curious as to what your goal is: if you have the information you're asking for, how would you use it? – Charmion
… `interned` dict itself; the number (and size) of interned strings that are referred to from nowhere else; the number (and size) of interned strings that are referred to from only one other place. Together, these help answer the question: are we wasting significant amounts of memory by interning strings unnecessarily? – Elastin