Are strings pooled in Python?
Does Python have a pool of all strings and are they (strings) singletons there?

More precisely, in the following code, are one or two strings created in memory?

a = str(num)
b = str(num)
Boice answered 25/3, 2010 at 21:33 Comment(3)
Just for reference, strings can't be singletons. A singleton is a class for which there can only be one instance, and that instance must be accessible globally. There can (hopefully) be many instances of the str class; therefore it's not a singleton.Huldahuldah
The concept you're looking for is string interning: en.wikipedia.org/wiki/String_interningThurmond
@Huldahuldah Thank you for comment. I meant something like value-singleton (pool or string interning is right word for it - en.wikipedia.org/wiki/String_interning).Boice
C
26

Strings are immutable in Python, so the implementation is free to decide whether or not to intern them (interning, a term often associated with C#, means that some strings are stored in a pool and shared).

In your example, you're creating strings dynamically. CPython does not always look into the pool to check whether such a string is already there. Doing so would rarely pay off, because memory must first be reserved to build the new string, and only then can it be compared against the pool contents, which is inefficient for long strings.

But for strings of length 1, CPython does look into the pool (cf. "stringobject.c"):

static PyStringObject *characters[UCHAR_MAX + 1];

...

PyObject *
PyString_FromStringAndSize(const char *str, Py_ssize_t size)
{

...

    if (size == 1 && str != NULL &&
    (op = characters[*str & UCHAR_MAX]) != NULL)
    {
        #ifdef COUNT_ALLOCS
            one_strings++;
        #endif

        Py_INCREF(op);
        return (PyObject *)op;
    }

...

So:

a = str(num)
b = str(num)
print a is b # <-- this will print False in most cases (but try str(1) is str(1))

But when using constant strings directly in your code, CPython uses the same string instance:

a = "text"
b = "text"
print a is b # <-- this will print True
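
If identity is actually needed, interning can be requested explicitly rather than relying on what the implementation happens to do. A minimal sketch, assuming Python 3, where the function is available as sys.intern (Python 2 exposed it as the builtin intern()); the value 12345 and the variable names are only illustrative:

```python
import sys

num = 12345  # illustrative value, not from the question

# Two dynamically created strings with equal contents.
a = str(num)
b = str(num)

# sys.intern returns the canonical pooled copy for a given string value,
# so both calls hand back the same object.
ia = sys.intern(a)
ib = sys.intern(b)

print(a == b)    # equal by value
print(ia is ib)  # identical after explicit interning
```

Whether a is b holds before interning is left open here, since that depends on the interpreter.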
Confirmand answered 25/3, 2010 at 21:40 Comment(5)
@Andidog: If CPython does not look into the pool to check if the string is already there, then why does print a is b print true when num is equal to 5?Selenite
@Brian: Sorry, that was a bit inaccurate. Edited my answer to explain the way CPython implements that.Confirmand
Good answer. The only detail I'd add is to note that Python does have intern()Piscary
@keturn: Thanks, didn't even know about intern() yet.Confirmand
str(1) is str(1) returns true in python 2.7 but false in python 3Pelmas
T
8

In general, strings are not interned in Python, but they do sometimes seem to be:

>>> str(5) is str(5)
True
>>> str(50) is str(50)
False

This isn't uncommon in Python, where common objects might be optimized in ways that unusual ones are not:

>>> int(5+0) is int(5+0)
True
>>> int(50+0) is int(50+0)
True
>>> int(500+0) is int(500+0)
False

And keep in mind, all of these sorts of details will differ between implementations of Python, and even between versions of the same implementation.
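
Given that variability, it seems safest to compare strings with == and to treat the result of is as an implementation detail. A small sketch in Python 3 syntax; the helper name compare is hypothetical, and no particular identity result is assumed:

```python
def compare(num):
    """Return (equal_by_value, same_object) for two freshly built strings."""
    a = str(num)
    b = str(num)
    return a == b, a is b


for n in (5, 50, 1056):
    equal, identical = compare(n)
    # 'equal' is always True; 'identical' depends on the interpreter and version.
    print(n, equal, identical)
```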

Thurmond answered 25/3, 2010 at 21:40 Comment(0)
C
2

Strings are not interned in general. In your example two strings will be created (with the exception of values between 0 and 9). To test this we can use the is operator to see if the two strings are the same object:

>>> str(1056) is str(1056)
False
Coplanar answered 25/3, 2010 at 21:35 Comment(3)
What about this: In [1]: x = str(5) In [2]: y = str(5) In [3]: id(x) Out[3]: 3077925280L In [4]: id(y) Out[4]: 3077925280L ?Hysteric
gruszczy: That's a good question. This is a special case that only applies to the numbers 0 to 9. In general though, the statement is not true. I've clarified my answer.Coplanar
0 to 9 is a specific case on a specific compiler (though admittedly, it's the compiler most people use). Other compilers may choose a different number of pre-defined strings.Selenite
S
0

CPython distinguishes between small integers and all other integers. Small integers, in the range [-5, 257), are preallocated and cached in an array, so fetching them is convenient and fast; other integers are created on demand.

# ifndef NSMALLPOSINTS
    # define NSMALLPOSINTS 257
# endif

# ifndef NSMALLNEGINTS
    # define NSMALLNEGINTS 5
# endif

# if NSMALLPOSINTS + NSMALLNEGINTS > 0
    static PyIntObject * small_ints[NSMALLPOSINTS + NSMALLNEGINTS];
# endif

BTW: an integer just outside the cached range, such as 257, can behave unexpectedly. If two objects with the same value are in the same scope, their addresses may or may not be the same, depending on the context; if they are in different scopes, their addresses will differ.

For the string type, CPython likewise provides a pool of strings of length one; longer strings with equal values may not be the same object.

a = str(11)
b = str(11)
print a == b      # True 
print a is b      # False

c = str("A")
d = str("A")   
print c == d    # True
print c is d    # True

aa = 12
bb = 12
print aa == bb    # True
print aa is bb    # True

cc = 333
dd = 333
print cc == dd    # True
print cc is dd    # False

Comparing the addresses of these objects makes the results above evident.
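
The same check can be written with id(), which in CPython returns the object's memory address. A sketch in Python 3 syntax; the helper same_address is hypothetical. The integers are parsed from strings so that they are built at run time rather than shared as compile-time constants, and the identity results are deliberately not stated because they vary by implementation and version:

```python
def same_address(make):
    """Build two objects with the same factory and compare value and id."""
    x = make()
    y = make()
    # Both objects are still alive here, so comparing their ids is meaningful.
    return x == y, id(x) == id(y)


print(same_address(lambda: int("12")))   # integer inside the small range
print(same_address(lambda: int("333")))  # integer outside the small range
print(same_address(lambda: str(11)))     # two-character string
```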

Sucy answered 7/9, 2018 at 8:25 Comment(1)
Please don't insert screenshots of the code. Copy the text instead.Serosa
