Concatenate tuples using sum()

Asked 6/2, 2017 at 2:38 Answered 27/10, 2021 at 4:20

Solved python sum tuples python-itertools

From this post I learned that you can concatenate tuples with sum():

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')

Which looks pretty nice. But why does this work? And, is this optimal, or is there something from itertools that would be preferable to this construct?

Petrarch asked 6/2, 2017 at 2:38

Why shouldn't it work? It's just adding the tuples together, but it's not particularly efficient. Take a look at itertools.chain, e.g. tuple(chain(*tuples)) – Urano 6/2, 2017 at 2:57

@PM2Ring. Avoid using chain like that as it's even more inefficient than sum (unless the collection of tuples is very small). Use chain.from_iterable instead. – Chitter 6/2, 2017 at 3:16

@Chitter Oops! Yes, chain.from_iterable is better. And as Boud's answer shows, it's actually slower than sum for small collections of tuples. – Urano 6/2, 2017 at 3:22
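The difference between the two chain spellings can be sketched as follows (`make_tuples` is a hypothetical generator used only for illustration): `chain(*tuples)` unpacks the whole collection into call arguments before iteration starts, while `chain.from_iterable` pulls one inner tuple at a time from any iterable, including a lazy generator.

```python
import itertools

tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

# chain(*tuples): every inner tuple is unpacked into a separate argument up front.
a = tuple(itertools.chain(*tuples))

# chain.from_iterable(tuples): the outer iterable is consumed lazily, one tuple at a time.
b = tuple(itertools.chain.from_iterable(tuples))

assert a == b == ('hello', 'these', 'are', 'my', 'tuples!')


def make_tuples(n):
    """Hypothetical generator yielding n one-element tuples."""
    for i in range(n):
        yield (i,)


# from_iterable works directly on a lazy source and can stop early,
# without ever materialising the remaining tuples.
first_five = tuple(itertools.islice(itertools.chain.from_iterable(make_tuples(10**9)), 5))
print(first_five)  # (0, 1, 2, 3, 4)
```

Calling `itertools.chain(*make_tuples(10**9))` instead would have to build all of the arguments before producing anything.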

The addition operator concatenates tuples in Python:

('a', 'b')+('c', 'd')
Out[34]: ('a', 'b', 'c', 'd')

From the docstring of sum:

Return the sum of a 'start' value (default: 0) plus an iterable of numbers

It means sum doesn't start with the first element of your iterable, but rather with an initial value that is passed through the start argument.

By default sum is used with numbers, so the default start value is 0. Summing an iterable of tuples therefore requires starting with an empty tuple, and () is an empty tuple:

type(())
Out[36]: tuple

Hence the concatenation works: starting from (), each + joins the next tuple onto the running result.
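A minimal demonstration of the start-value point: with the default start of 0 the very first step is `0 + ('hello',)`, which raises, while `()` as the start makes every step a tuple-plus-tuple concatenation.

```python
tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

# With the default start of 0, the first step is 0 + ('hello',), which fails.
default_start_failed = False
try:
    sum(tuples)
except TypeError as exc:
    default_start_failed = True
    print("default start fails:", exc)

# With start=(), the first step is () + ('hello',), and tuple + tuple concatenates.
result = sum(tuples, ())
print(result)  # ('hello', 'these', 'are', 'my', 'tuples!')

# An empty iterable simply returns the start value unchanged.
print(sum((), ()))  # ()
```

On Python 3.8 and later, start can also be passed as a keyword, `sum(tuples, start=())`.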

As for performance, here is a comparison (with itertools imported as it):

%timeit sum(tuples, ())
The slowest run took 9.40 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 285 ns per loop


%timeit tuple(it.chain.from_iterable(tuples))
The slowest run took 5.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 625 ns per loop

Now with a larger input, t2, of size 10000:

%timeit sum(t2, ())
10 loops, best of 3: 188 ms per loop

%timeit tuple(it.chain.from_iterable(t2))
1000 loops, best of 3: 526 µs per loop

So if your collection of tuples is small, don't bother. If it's medium-sized or larger, you should use itertools.
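The %timeit magic above only exists in IPython. A rough equivalent with the standard-library timeit module might look like this; the input shape (n two-element tuples) and the sizes are assumptions chosen for illustration, and actual numbers will depend on machine and Python version, so none are quoted here.

```python
import itertools
import timeit


def make_input(n):
    # Hypothetical test data: n two-element tuples.
    return tuple((i, i) for i in range(n))


def with_sum(tuples):
    return sum(tuples, ())


def with_chain(tuples):
    return tuple(itertools.chain.from_iterable(tuples))


for n in (3, 100, 1000):
    data = make_input(n)
    # Both approaches must produce identical results before timing them.
    assert with_sum(data) == with_chain(data)
    for func in (with_sum, with_chain):
        # Best of 3 repeats, each calling the function `number` times.
        number = 10
        best = min(timeit.repeat(lambda: func(data), number=number, repeat=3))
        print(f"n={n:>5} {func.__name__:>10}: {best / number * 1e6:.1f} µs per call")
```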

Unworthy answered 6/2, 2017 at 2:48

Interesting timings. Which Python version did you use? – Urano 6/2, 2017 at 3:04

@PM2Ring Python 3.5, 64-bit – Unworthy 6/2, 2017 at 3:08

Regarding "best of 3": kindly refer to the %timeit documentation in IPython. – Unworthy 6/2, 2017 at 3:22

It works because addition is overloaded (on tuples) to return the concatenated tuple:

>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')

That's basically what sum is doing: you give it an empty tuple as the initial value, and it then adds each tuple to that.

However, this is generally a bad idea because each tuple addition creates a new tuple, so you create several intermediate tuples only to copy them into the next, larger one:

()
('hello',)
('hello', 'these', 'are')
('hello', 'these', 'are', 'my', 'tuples!')

Because every intermediate result is copied again into the next one, this implementation has quadratic runtime behavior. That can be avoided by not building the intermediate tuples at all.
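To see where the quadratic cost comes from, here is a simplified cost model (not a measurement, and ignoring any interpreter-level shortcuts) that counts element copies when every + builds a fresh tuple containing everything accumulated so far.

```python
def copies_for_repeated_concat(lengths):
    """Count element copies made by folding tuples with +, as sum(tuples, ()) does.

    Simplified model: each step builds a brand-new tuple, copying every element
    accumulated so far plus the elements of the next tuple.
    """
    total_copied = 0
    accumulated = 0
    for length in lengths:
        accumulated += length          # size of the new intermediate tuple
        total_copied += accumulated    # all of it is copied into that tuple
    return total_copied


# The question's example: intermediate sizes 1, 3, 5 -> 9 copies for 5 elements.
print(copies_for_repeated_concat([1, 2, 2]))  # 9

# n one-element tuples: 1 + 2 + ... + n = n * (n + 1) / 2 copies, i.e. quadratic.
for n in (10, 100, 1000):
    print(n, copies_for_repeated_concat([1] * n))
# 10 55
# 100 5050
# 1000 500500
```

A single pass that never builds intermediate tuples copies each of the n elements once.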

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

Using a generator expression with nested for clauses:

>>> tuple(tuple_item for tup in tuples for tuple_item in tup)
('hello', 'these', 'are', 'my', 'tuples!')

Or using a generator function:

def flatten(it):
    for seq in it:
        for item in seq:
            yield item


>>> tuple(flatten(tuples))
('hello', 'these', 'are', 'my', 'tuples!')
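On Python 3.3 and later, the inner loop of such a generator function can be written with `yield from`, which for plain iteration behaves the same as the explicit `for item in seq: yield item`. A minimal sketch:

```python
def flatten(it):
    # `yield from seq` delegates to each inner sequence in turn.
    for seq in it:
        yield from seq


tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
print(tuple(flatten(tuples)))  # ('hello', 'these', 'are', 'my', 'tuples!')
```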

Or using itertools.chain.from_iterable:

>>> import itertools
>>> tuple(itertools.chain.from_iterable(tuples))
('hello', 'these', 'are', 'my', 'tuples!')

And if you're interested in how these perform (using my simple_benchmark package):

import itertools
import simple_benchmark

def flatten(it):
    for seq in it:
        for item in seq:
            yield item

def sum_approach(tuples):
    return sum(tuples, ())

def generator_expression_approach(tuples):
    return tuple(tuple_item for tup in tuples for tuple_item in tup)

def generator_function_approach(tuples):
    return tuple(flatten(tuples))

def itertools_approach(tuples):
    return tuple(itertools.chain.from_iterable(tuples))

funcs = [sum_approach, generator_expression_approach, generator_function_approach, itertools_approach]
arguments = {2**i: tuple((1,) for _ in range(2**i)) for i in range(1, 13)}  # size n -> n one-element tuples
b = simple_benchmark.benchmark(funcs, arguments, argument_name='number of tuples to concatenate')

b.plot()

(Python 3.7.2 64bit, Windows 10 64bit)

So while the sum approach is very fast if you concatenate only a few tuples, it becomes really slow if you try to concatenate lots of them. The fastest of the tested approaches for many tuples is itertools.chain.from_iterable.

Meningitis answered 26/1, 2019 at 16:18

That's clever, and I had to laugh, because help(sum) expressly forbids strings, which are also immutable, yet this works:

sum(...)
    sum(iterable[, start]) -> value
    
    Return the sum of an iterable of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the iterable is
    empty, return start.

You can add tuples to get a new, bigger tuple. And since you gave a tuple as a start value, the addition works.
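A small sketch of where the string restriction actually applies: tuples that happen to contain strings are fine, because the objects being added are tuples, but passing a string as start is rejected by sum() itself, and str.join is the usual way to concatenate strings.

```python
tuples = (('hello',), ('world',))

# Tuples of strings are fine: the items being added are tuples, not strings.
print(sum(tuples, ()))  # ('hello', 'world')

# A string start value is rejected explicitly by sum().
string_start_rejected = False
try:
    sum(['hello', 'world'], '')
except TypeError as exc:
    string_start_rejected = True
    print(exc)

# The idiomatic way to concatenate strings is str.join.
print(''.join(['hello', 'world']))  # helloworld
```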

Browne answered 6/2, 2017 at 2:49

In this example, sum hasn't summed strings: there are no two strings that were separate in the input that have been joined here. (E.g. there's no way to turn hello and world into helloworld using sum.) – Silvanasilvano 6/2, 2017 at 5:03

IMO what Python does is just stupid. sum should be able to sum anything that supports the + operator, and strings do. Explicitly disallowing strings in the name of performance and good convention (while Python has plenty of other antipatterns that are not disallowed) is just not good design. – Termination 6/2, 2017 at 6:53

@Silvanasilvano I am quite aware of that. The help mentions strings, but I went on to state that this is really adding tuples. I just thought it was amusing and assumed people can read. – Browne 6/2, 2017 at 7:05

@progo I'm not sure why it's forbidden, but I agree: sum should do what plus does. Maybe it's to catch the common error where strings are mistaken for ints. But still... – Browne 6/2, 2017 at 7:08

For the strings part, see https://mcmap.net/q/322152/-python-sum-why-not-strings-closed. It's mainly efficiency. – Inflated 6/2, 2017 at 9:32

"help expressly forbids strings, but it works" This can be misunderstood as "it also works for strings", which is not true. Also, how does this answer the question? The quoted help does not even mention that the values or start can be anything other than numbers, let alone tuples. – Harald 6/2, 2017 at 9:55

Just to complement the accepted answer with some more benchmarks:

import functools, operator, itertools
import numpy as np
N = 10000
M = 2

ll = tuple(tuple(x) for x in np.random.random((N, M)).tolist())

%timeit functools.reduce(operator.add, ll)
# 407 ms ± 5.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit functools.reduce(lambda x, y: x + y, ll)
# 425 ms ± 7.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sum(ll, ())
# 426 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit tuple(itertools.chain(*ll))
# 601 µs ± 5.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit tuple(itertools.chain.from_iterable(ll))
# 546 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT: the code has been updated to actually use tuples. And, as per the comments, the last two options are now wrapped in tuple() constructors, and all the timings have been updated for consistency. The itertools.chain* options are still the fastest, but the margin is now smaller.
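The first three timings are presumably close because all three perform the same left fold with +, building intermediate tuples, while the two chain variants walk the items once. A quick equivalence check on a tiny hypothetical input (instead of the random N×M data) might look like this:

```python
import functools
import itertools
import operator

ll = ((1, 2), (3, 4), (5, 6))

# All of these fold the tuples together with +, building intermediate tuples.
via_reduce = functools.reduce(operator.add, ll)
via_lambda = functools.reduce(lambda x, y: x + y, ll)
via_sum = sum(ll, ())

# These walk the items once without intermediate tuples.
via_chain = tuple(itertools.chain(*ll))
via_from_iterable = tuple(itertools.chain.from_iterable(ll))

assert via_reduce == via_lambda == via_sum == via_chain == via_from_iterable
print(via_sum)  # (1, 2, 3, 4, 5, 6)

# Unlike sum(ll, ()), reduce without an initializer raises TypeError on empty input;
# passing () as the initializer mirrors sum's start value.
print(functools.reduce(operator.add, (), ()))  # ()
```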

Synchronous answered 21/9, 2017 at 23:45

Your last two timings are not representative. The itertools.chain and itertools.chain.from_iterable return iterators. For fair timings you need to iterate these using tuple(itertools.chain...). – Meningitis 26/1, 2019 at 16:37

The second argument, start, where you put (), is the initial object that the items are added to; it defaults to 0 for numeric addition.

Here is a sample implementation of sum (how I expect it to work):

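def sum(iterable, /, start=0):  # "/" makes iterable positional-only (Python 3.8+)
    for element in iterable:
        start += element  # for tuples, this rebinds start to a new, longer tuple
    return start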

Example:

>>> sum([1, 2, 3])
6
>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples)
TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')
>>>

It works since tuple concatenation with + is supported.

In effect, this translates to:

>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')
>>>
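One subtlety of the sketch above, which uses `start += element`: for immutable values such as ints and tuples, `+=` just rebinds start, but for a mutable start value such as a list it extends that object in place, so the caller's argument gets modified. The builtin sum leaves its start argument untouched (in CPython it uses plain addition, which is also why its error message for `sum(tuples)` reads `for +:` rather than `for +=:`). The names `sum_with_iadd` and `sum_with_add` below are hypothetical, for illustration only; the `/` syntax needs Python 3.8+.

```python
def sum_with_iadd(iterable, /, start=0):
    # Hypothetical sketch using augmented assignment.
    for element in iterable:
        start += element
    return start


def sum_with_add(iterable, /, start=0):
    # Hypothetical sketch using plain addition, creating a new object each step.
    for element in iterable:
        start = start + element
    return start


base_iadd = [0]
result_iadd = sum_with_iadd([[1], [2]], base_iadd)
print(result_iadd, base_iadd)  # [0, 1, 2] [0, 1, 2]  <- the caller's list was modified

base_add = [0]
result_add = sum_with_add([[1], [2]], base_add)
print(result_add, base_add)  # [0, 1, 2] [0]

base_builtin = [0]
result_builtin = sum([[1], [2]], base_builtin)
print(result_builtin, base_builtin)  # [0, 1, 2] [0]

# For tuples, += cannot mutate, so both sketches agree with the builtin.
tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
assert sum_with_iadd(tuples, ()) == sum_with_add(tuples, ()) == sum(tuples, ())
```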

Cattier answered 27/10, 2021 at 4:20
