What are the rules for cpython's string interning?

2

13

In Python 3.5, is it possible to predict when we will get an interned string and when we will get a copy? After reading a few Stack Overflow answers on this issue I've found this one the most helpful, but still not comprehensive. Then I looked at the Python docs, but interning is not guaranteed by default:

Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

So, my question is about the internal intern() conditions, i.e. the decision-making (whether to intern a string literal or not): why does the same piece of code work on one system and not on another, and what rules did the author of the answer in the mentioned topic mean when saying

the rules for when this happens are quite convoluted

Ardolino answered 4/3, 2016 at 20:50 Comment(12)
Just use == and forget about it. It's an implementation detail anyway. - Rhizoid
@erip I believe OP is aware of that. After getting through the boilerplate, this question seems to be asking about the interning rules. - Overelaborate
If you really want to know the differences in implementation, it would probably make sense to specify the Python versions installed on both systems. - Emmalynne
@Rhizoid I don't want to forget, I want to learn and understand. - Ardolino
@LevLevitsky Thanks for editing the question to make it more relevant. - Ardolino
Then could you clarify your question and remove all the irrelevant preamble about ==? Is your question "when will a string be interned in cpython?" Note that this is no longer a Python question, because Python the language may not even have string interning. - Rhizoid
@Rhizoid I'd love to, but I don't have much experience with Python, so you're welcome to edit the question yourself as you see fit. - Ardolino
OK, I will edit it. But I'm not sure exactly what your question is, because it's rambling a bit. Are you asking "when will a string be interned in cpython?" Note: you should add your specific version, because there are many builds of Python 3. - Rhizoid
@Rhizoid Yes, when will a string be interned in cpython. - Ardolino
The only rule is that the return value of intern is interned. Everything else is a morass of implementation details, inconsistent because there's little point to being consistent. - Lillylillywhite
I've edited the content to discourage the kind of useless answers this question was attracting (the ones which don't tell you anything you don't already know). If you don't think it's an improvement, feel free to roll back. - Rhizoid
@Rhizoid Thanks for refactoring, I appreciate your help. - Ardolino
9

You think there are rules?

The only rule for interning is that the return value of intern is interned. Everything else is up to the whims of whoever decided some piece of code should or shouldn't do interning. For example, "left" gets interned by PyCode_New:

/* Intern selected string constants */
for (i = PyTuple_GET_SIZE(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!all_name_chars(v))
        continue;
    PyUnicode_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}

The "rule" here is that a string object in the co_consts of a Python code object gets interned if it consists purely of ASCII characters that are legal in a Python identifier. "left" gets interned, but "as,df" wouldn't be, and "1234" would be interned even though an identifier can't start with a digit. While identifiers can contain non-ASCII characters, such characters are still rejected by this check. Actual identifiers don't ever pass through this code; they get unconditionally interned a few lines up, ASCII or not. This code is subject to change, and there's plenty of other code that does interning or interning-like things.
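
As a rough illustration of the check described above, here is a minimal Python sketch. The helper name looks_like_name is hypothetical (it is not CPython's code); it only mirrors the stated condition that every character be an ASCII letter, digit, or underscore. It predicts which constants this particular code path would intern, and nothing more.

```python
import string

# Hypothetical helper: an approximation of the condition described above,
# not CPython's actual all_name_chars implementation.
_NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def looks_like_name(s):
    """Return True if every character is an ASCII letter, digit, or underscore."""
    return all(ch in _NAME_CHARS for ch in s)


# "left" and "1234" pass the check; "as,df" and the non-ASCII "caf\u00e9" do not.
for candidate in ("left", "as,df", "1234", "caf\u00e9"):
    print(repr(candidate), looks_like_name(candidate))
```

Passing this check says nothing about strings created at run time or by other parts of the interpreter, and the check itself may differ between CPython versions.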

Asking us for the "rules" for string interning is like asking a meteorologist what the rules are for whether it rains on your wedding. We can tell you quite a lot about how it works, but it won't be much use to you, and you'll always get surprises.

Lillylillywhite answered 4/3, 2016 at 22:3 Comment(0)
-4

From what I understood from the post you linked:

When you use if a == b, you are checking if the value of a is the value of b, whereas when you use if a is b, you are checking if a and b are the same object (or share the same spot in the memory).

Now, Python interns constant strings (literals written like "blabla"). So:

>>> a = "abcdef"
>>> a is "abcdef"
True

But when you do:

>>> a = "".join([chr(i) for i in range(ord('a'), ord('g'))])
>>> a
'abcdef'
>>> a is "abcdef"
False

In the C programming language, using a string with "" will make it a const char *. I think this is what is happening here.
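
If identity really matters, one thing that can be relied on is sys.intern, which returns the canonical copy of a string. A minimal sketch, where the variable names are only illustrative:

```python
import sys

# Build "abcdef" at run time, as in the session above.
built = "".join(chr(i) for i in range(ord("a"), ord("g")))

# sys.intern returns the canonical interned copy for a given value.
canonical = sys.intern("abcdef")
interned = sys.intern(built)

print(built == "abcdef")      # equality compares values
print(interned is canonical)  # both names now refer to the one interned object
```

Whether built itself is the same object as the literal is left to the implementation; only the values returned by sys.intern are sure to be identical for equal strings.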

Balsamiferous answered 4/3, 2016 at 21:16 Comment(0)
