Does Slicing `a` (e.g. `a[1:] == a[:-1]`) create copies of the `a`?
Asked Answered
A

4

11

A friend of mine showed me the following Python code:

a[1:] == a[:-1]

Which returns True iff all the items in a are identical.

I argued the code is hard to understand from first sight, and moreover - it is inefficient in memory usage, because two copies of a will be created for the comparison.

I've used Python's dis to see what happens behind the hood at a[1:]==a[:-1]:

>>> def stanga_compare(a):
...     return a[1:]==a[:-1]
...
>>> a=range(10)
>>> stanga_compare(a)
False
>>> a=[0 for i in range(10)]
>>> stanga_compare(a)
True

>>> dis.dis(stanga_compare)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               1 (1)
              6 SLICE+1
              7 LOAD_FAST                0 (a)
             10 LOAD_CONST               2 (-1)
             13 SLICE+2
             14 COMPARE_OP               2 (==)
             17 RETURN_VALUE

It boils down to two slicing commands - SLICE+1 and SLICE+2. The documentation is unclear as to whether these opcodes actually create a new copy of a, or just a reference to it.

  • Does the SLICE commands copy a?
  • Does the answer varies between Python implementations (Cython, Jython)?

Update

This snippet is clearly unreadable and confusing, and I'm not going to ever use it in actual code. My interest is purely technical - whether slicing copies the list, and whether the answer varies under different circumstances.

Alienage answered 8/12, 2013 at 6:44 Comment(2)
The "varies" comment doesn't make much sense IMO. Even x == y can do anything for user defined classes (including opening pop-up windows or connecting to database servers) so?Withal
It is a feature of python, not just Cpython. I've linked the section in the python tutorial.Mashe
M
11

The documentation is unclear because slicing different objects does different things. In the case of a list, slicing does make a (shallow) copy1. Note that this is a feature of python lists independent of python implementation. In the case of other objects (like numpy arrays), it might not create a copy.

If you want a better way to check that all the elements in a list are the same, I would probably recommend:

 all(lst[0] == item for item in lst)

From a performance standpoint, your friend's code might actually outperform this for small lists since list slicing is so optimized. But, this is IMHO much easier to tell what is going on and has the opportunity to "short circuit" as soon as it finds a non-match.

1The actual function to look at is list_subscript, but for most cases, it just calls list_slice

Mashe answered 8/12, 2013 at 6:52 Comment(5)
See my comment below - it seems that the list is copied only once for two slices.Alienage
@AdamMatan -- No. I've answered your comment below.Mashe
Or for arbitrary iterables: i = iter(lst); all(next(i, object()) == item for item in i) or somethingOrigami
@JonClements -- I don't think that quite works either. You'd need to factor the next out of the loop. (I thought about this already). Yours would inconveniently work out for [1,1,2,2,3,3]. but the idea's right.Mashe
@Mashe yup... looks like I never got around to actually updating that comment :)Origami
C
3

If a is a list or tuple or string, len(a) is n, and n > 0, then each slice creates (at C level) a new array of length n-1. At C level, all objects in CPython are implemented as pointers, so these new arrays contain n-1 pointers copied from a (well, not really for strings - the string representation is more frugal).

But, as @mgilson said, what slicing returns is up to a's type. Some types may return a compact descriptor instead of copying anything. And a type may even implement slicing in such a way that the code shown wouldn't work as intended.

But you really had a list in mind ;-)

Copaiba answered 8/12, 2013 at 6:59 Comment(5)
Numpy is a perfect example where the code above would not work as intended. you'd get back a boolean array which has no truth value at all :-).Mashe
Why there is only one copy when we use slicing twice? In my example, first two items print the same addressRufina
@mgilson: numpy redefines any/all too. See https://mcmap.net/q/1016417/-why-do-numpy-all-and-any-give-wrong-results-if-you-use-generator-expressions/320726Withal
n > 0 is not a necessary condition: try a=[]; b=[:]; b.append(1); print(a).Syst
@larsmans, n > 0 is necessary in what I wrote, because of the claim that "each slice creates (at C level) a new array of length n-1". It's hard to create a new array of length -1 ;-)Copaiba
W
3

Yes, with list objects Python creates shallow copies when slicing, however the loop is made in C (for cpython) and for that reason is going to be much faster than anything you can write in Python for doing the same. Looping two times in C for the shallow copy and looping again for comparison is going to be faster than just looping in Python once.

Please remember than cpython is quite often fast enough, but that Python code is still about 100 times slower than C code. Leaving cpython doing the loops for you is therefore better in general if you want a bit of speed. Note that even things like c = a + b in Python mean the execution of a lot of logic (including branches and memory allocations).

On the other side however if for your code this kind of micro optimization is fundamental then probably Python is not the right tool for the problem you are fighting and you should consider other options (like writing a small C++ extension interfaced with sip, using Cython, PyPy ...).

For sure that code is not readable for the untrained eye and in case the list is long and often not constant then all(y == x[0] for y in x) is going to be faster because of short circuiting (even if the loop is in Python and an extra subscript operation is done for every element).

Readability counts. A lot.

EDIT

Another interesting option for having C code looping over the elements is

x and x.count(x[0]) == len(x)

This doesn't offer short-circuiting, but on my pc is about 75 times faster than the all-based solution for a list for 1000 elements all of them equal and about 6 times faster than x[1:] == x[:-1].

I find it also a bit more readable than x[1:] == x[:-1], but it's probably a matter of taste.

Withal answered 8/12, 2013 at 7:29 Comment(0)
V
0

For vanilla lists, slicing does create a copy. You can prevent the copy using iteration instead:

import itertools
a1 = iter(a)
a2 = iter(a)
a2.next() # start a2 iterator one removed
all_are_identical = all((i1 == i2 for i1, i2 in itertools.izip(a1, a2)))

The line (i1 == i2 for i1, i2 in itertools.izip(a1, a2)) creates a generator that will return whether each element in a is equal to the next, one at a time, to all. The results are evaluated one by one, instead of being put in a list first, so you save memory at the cost of some performance.

Vocalist answered 8/12, 2013 at 7:17 Comment(3)
@Mashe Can you give an example where the above is not equivalent to a[1:] == a[:-1], (the formula from the question the code is meant to replicate)?Vocalist
@Mashe all_are_identical = False for a = [1, 2, 3, 4, 5], as expected.Vocalist
Oh, my fault. I missed the a2.next() call. Although, I would probably write that as next(a2) to future proof it.Mashe

© 2022 - 2025 — McMap. All rights reserved.