Python3 multiple assignment and memory address [duplicate]

B

3

10

After reading this and this, which are pretty similar to my question, I still cannot understand the following behaviour:

a = 257
b = 257
print(a is b) #False
a, b = 257, 257
print(a is b) #True

When I print id(a) and id(b), I can see that variables assigned on separate lines have different ids, whereas with multiple assignment both variables have the same id:

a = 257
b = 257
print(id(a)) #139828809414512
print(id(b)) #139828809414224
a, b = 257, 257
print(id(a)) #139828809414416
print(id(b)) #139828809414416

But this behaviour can't be explained by saying that multiple assignment of equal values always binds both names to the same object, since:

a, b = -1000, -1000  
print(id(a)) #139828809414448
print(id(b)) #139828809414288

Is there a clear rule that explains when the variables get the same id and when they don't?

edit

Relevant info: the code in this question was run in interactive mode (ipython3).

Butchery answered 8/2, 2016 at 16:56 Comment(3)

Check out [id(i) for i in (1000,1000,1000,1000)] :-) – Hestia 8/2, 2016 at 17:28

Note: the behavior is inconsistent because it never matters. If there were any reason to care whether two 257s were the same object, there would be a simple, sensible pattern to it. – Spaniard 8/2, 2016 at 17:32

peephole optimizations, irrelevant how it works as it is never something you would or should ever rely on. – Precipitant 8/2, 2016 at 18:1

S

2

This is due to a constant-merging optimization in the bytecode compiler (loosely called constant folding below). When the bytecode compiler compiles a batch of statements, it uses a dict to keep track of the constants it has seen. This dict automatically merges any equivalent constants.

Here's the routine responsible for recording and numbering constants (as well as a few related responsibilities):

static int
compiler_add_o(struct compiler *c, PyObject *dict, PyObject *o)
{
    PyObject *t, *v;
    Py_ssize_t arg;

    t = _PyCode_ConstantKey(o);
    if (t == NULL)
        return -1;

    v = PyDict_GetItem(dict, t);
    if (!v) {
        arg = PyDict_Size(dict);
        v = PyInt_FromLong(arg);
        if (!v) {
            Py_DECREF(t);
            return -1;
        }
        if (PyDict_SetItem(dict, t, v) < 0) {
            Py_DECREF(t);
            Py_DECREF(v);
            return -1;
        }
        Py_DECREF(v);
    }
    else
        arg = PyInt_AsLong(v);
    Py_DECREF(t);
    return arg;
}

You can see that it only adds a new entry and assigns a new number if it doesn't find an equivalent constant already present. (The _PyCode_ConstantKey bit makes sure things like 0.0, -0.0, and 0 are considered inequivalent.)
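
As a rough illustration from the Python side (a sketch, not part of the CPython source quoted above; `constants_of` is a made-up helper name), you can compile a batch of statements yourself and inspect `co_consts` to see that equal constants of the same type share one slot:

```python
def constants_of(source):
    """Compile `source` as one batch and return the constants the compiler recorded."""
    code = compile(source, "<demo>", "exec")
    return code.co_consts


# Two assignments of 257 in one batch: the compiler records 257 only once.
merged = constants_of("a = 257\nb = 257\n")
print(merged.count(257))

# 1000 and 1000.0 compare equal, but the compiler's key tells them apart
# by type, so both survive as separate constants.
mixed = constants_of("a = 1000\nb = 1000.0\n")
print(sorted(type(c).__name__ for c in mixed if c is not None))

# Running the first batch binds both names to the single recorded object (CPython).
namespace = {}
exec(compile("a = 257\nb = 257\n", "<demo>", "exec"), namespace)
print(namespace["a"] is namespace["b"])
```

The exact layout of `co_consts` is a CPython implementation detail, so treat this as a way to peek at the behaviour rather than something to depend on.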

In interactive mode, a batch ends every time the interpreter has to actually run your command, so constant folding mostly doesn't happen across commands:

>>> a = 1000
>>> b = 1000
>>> a is b
False
>>> a = 1000; b = 1000 # 1 batch
>>> a is b
True
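
To mimic this outside a REPL (a sketch, assuming that `compile(..., "single")` is a fair stand-in for one interactive command; `run_batches` is a hypothetical helper), each string below is compiled as its own batch:

```python
def run_batches(*batches):
    """Compile and run each string as a separate batch, sharing one namespace."""
    namespace = {}
    for source in batches:
        exec(compile(source, "<demo>", "single"), namespace)
    return namespace


# One batch: both 1000 literals are merged into a single constant.
together = run_batches("a = 1000; b = 1000")
print(together["a"] is together["b"])

# Two batches: each compile call builds its own 1000, so on CPython
# the two names usually end up bound to different objects.
apart = run_batches("a = 1000", "b = 1000")
print(apart["a"] is apart["b"])
```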

In a script, all top-level statements are one batch, so more constant folding happens:

a = 257
b = 257
print(a is b)

In a script, this prints True.

A function's code gets its constants tracked separately from code outside the function, which limits constant folding:

a = 257

def f():
    b = 257
    print(a is b)

f()

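Even in a script, this prints False on the interpreter this was tested with. Newer CPython releases may share constants between the code objects produced by a single compilation, so the result can vary by version.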
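
One way to see the separate bookkeeping (a sketch; the `source` string and the variable names are made up for illustration) is to compile a module and look at the nested code object for `f`, which carries its own `co_consts`:

```python
import types

source = "a = 257\n\ndef f():\n    b = 257\n    return b\n"
module_code = compile(source, "<demo>", "exec")

# The function body is compiled to its own code object, stored among
# the module's constants.
function_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

# Each code object lists 257 in its own constants tuple.
print(257 in module_code.co_consts)
print(257 in function_code.co_consts)
```

Whether the two 257 entries are the same object depends on the CPython version, so the sketch deliberately does not print that.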

Spaniard answered 8/2, 2016 at 17:53 Comment(5)

thank you. I will edit my question to make clear it was tested in interactive mode. – Butchery 8/2, 2016 at 17:55

So what about integers smaller than -5 in python 3? for example a, b = -6, -6 – Grove 8/2, 2016 at 18:5

@Kasramvd: The compiler compiles -6 into a LOAD_CONST and a UNARY_NEGATIVE; there's a separate optimization responsible for converting that into a LOAD_CONST that loads a -6. That optimization doesn't use the dict. – Spaniard 8/2, 2016 at 18:14

@Spaniard Indeed, but I didn't find any restriction for numbers smaller than -5 in the C code! :( Is it related to the fact that negative numbers are represented in a variant of 2’s complement, which gives the illusion of an infinite string of sign bits extending to the left? – Grove 8/2, 2016 at 18:22

@Kasramvd: -5 gets handled by yet another optimization, the small integer pool. All -5s are the same object, no matter how or where you create them. – Spaniard 8/2, 2016 at 18:27

G

3

That's because of an optimization in Python's interpreter at UNPACK_SEQUENCE time, while loading the constant values. When Python encounters an iterable during unpacking, it doesn't load duplicate objects multiple times; instead it keeps the first object and points all the duplicate variable names at it (in the CPython implementation). Therefore, all your variables become references to one object. At the Python level, you can think of this behaviour as using a dictionary as the namespace, which doesn't keep duplicate keys.

In other words, your unpacking would be equivalent to the following command:

a = b = 257

As for negative numbers, in Python 2.x it doesn't make any difference, but in Python 3.x it seems that for numbers smaller than -5 Python creates a new object during unpacking:

>>> a, b = -6, -6
>>> a is b
False
>>> a, b = -5, -5
>>> a is b
True

Grove answered 8/2, 2016 at 17:12 Comment(6)

How does your answer explain the last part of my question, with a,b=-1000, -1000 ? – Butchery 8/2, 2016 at 17:26

@isternberg In the last part you created your variables separately, but there is another point about unpacking negative numbers in Python 3; I'll update the answer with the reason as soon as possible. – Grove 8/2, 2016 at 17:33

"one LOAD_CONSTANT and 2 STORE_FAST" -- While you're correct, the LOAD_CONSTANT is loading a constant tuple which holds 2 items. I'm not sure that it's clear to me how that explains the two values having the same IDs (when they are outside the normal range of python's interned integers). – Fabianfabianism 8/2, 2016 at 17:38

No, no, no! This is completely wrong. UNPACK_SEQUENCE has nothing to do with it. Decompile a, b = 257, 258 and you'll see the same 1 LOAD_CONST, 1 UNPACK_SEQUENCE behavior, but a and b are clearly different. You can also build a (257, 257) tuple in a way that avoids the optimization, and UNPACK_SEQUENCE won't merge the constants. – Spaniard 8/2, 2016 at 17:41

@Kasramvd, thx for letting me know about the mistake in my question. I fixed it. – Butchery 8/2, 2016 at 17:41

@Spaniard Yes, I'll update the answer to remove this illusion, but it still happens during sequence unpacking. I think the dis module is not helpful here. Thanks for your attention and for noting that. – Grove 8/2, 2016 at 17:45

B

0

Any such rule is implementation-specific. CPython, for example, pre-allocates int objects for small integers (-5 through 256) as a performance optimization.

The only general rule is to assume any use of a literal will generate a new object.
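
A small sketch of the cache boundaries, assuming CPython (`same_object` is a hypothetical helper). The integers are built at run time with `int()` from a string, so the compiler's handling of literals cannot interfere:

```python
def same_object(text):
    """Build the same integer twice at run time and report whether one object is reused."""
    first = int(text)
    second = int(text)
    return first is second


for text in ("-6", "-5", "256", "257"):
    print(text, same_object(text))
```

On CPython this reports True only for -5 and 256, the two values that fall inside the pre-allocated range.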

Bashee answered 8/2, 2016 at 17:10 Comment(3)

The number is 257 and you can see this behaviour by using larger integers too. – Grove 8/2, 2016 at 17:14

I always forget the exact range. – Bashee 8/2, 2016 at 17:36

not "will generate", but "might generate". – Easeful 8/2, 2016 at 18:4
