Which is more pythonic in a for loop: zip or enumerate?
S

6

9

Which one of these is considered the more pythonic, taking into account scalability and readability? Using enumerate:

group = ['A','B','C']
tag = ['a','b','c']

for idx, x in enumerate(group):
    print(x, tag[idx])

or using zip:

for x, y in zip(group, tag):
    print(x, y)

The reason I ask is that I have been using a mix of both. I should keep to one standard approach, but which should it be?

Soni answered 27/11, 2015 at 10:52 Comment(3)
zip is designed exactly for tasks like this. Your task is to iterate over each pair, not to iterate over numbers. And Python syntax allows you exactly that.Domiciliate
Yes. zip is more pythonic.Canfield
Or you could even use map(lambda x, y:sys.stdout.write(x+" "+y+"\n"),group,tag) provided the lists are of same length.Ferry
C
13

No doubt, zip is more pythonic. It doesn't require that you use a variable to store an index (which you don't otherwise need), and using it allows handling the lists uniformly, while with enumerate, you iterate over one list, and index the other list, i.e. non-uniform handling.

However, you should be aware of the caveat that zip runs only up to the shorter of the two lists. To avoid duplicating someone else's answer I'd just include a reference here: someone else's answer.
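To make the caveat concrete, here is a minimal sketch of the truncation, assuming Python 3. The shortened tag list is made up for illustration, and itertools.zip_longest is shown as one possible way to keep the unmatched items, not as the only fix.

```python
from itertools import zip_longest

group = ['A', 'B', 'C']
tag = ['a', 'b']  # deliberately one item short (illustrative data)

# zip stops as soon as the shorter input is exhausted, so 'C' is silently dropped.
truncated = list(zip(group, tag))

# zip_longest keeps going and pads the shorter input with fillvalue.
padded = list(zip_longest(group, tag, fillvalue=None))

print(truncated)
print(padded)
```

Whether dropping or padding is the right behaviour depends on what a length mismatch means in your data.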

@user3100115 aptly points out that in Python 2 you should prefer itertools.izip over zip because it is lazy (faster and more memory-efficient). In Python 3, zip already behaves like Python 2's izip.

Canfield answered 27/11, 2015 at 11:6 Comment(0)
S
9

While others have pointed out that zip is in fact more pythonic than enumerate, I came here to see if it was any more efficient. According to my tests, zip is around 10 to 20% faster than enumerate when simply accessing and using items from multiple lists in parallel.

Here I have three lists of the same, increasing length being accessed in parallel. When the lists are more than a couple of items long, the zip/enumerate time ratio is below one, meaning zip is faster.

Graphed in R-Studio

Code I used:

import timeit

setup = \
"""
import random
size = {}
a = [ random.randint(0,i+1) for i in range(size) ]
b = [ random.random()*i for i in range(size) ]
c = [ random.random()+i for i in range(size) ]
"""
code_zip = \
"""
data = []
for x,y,z in zip(a,b,c):
    data.append(x+z+y)
"""
code_enum = \
"""
data = []
for i,x in enumerate(a):
    data.append(x+c[i]+b[i])
"""
runs = 10000
sizes = [ 2**i for i in range(16) ]
data = []

for size in sizes:
    formatted_setup = setup.format(size)
    time_zip = timeit.timeit(code_zip, formatted_setup, number=runs)
    time_enum = timeit.timeit(code_enum, formatted_setup, number=runs)
    ratio = time_zip/time_enum
    row = (size,time_zip,time_enum,ratio)
    data.append(row)

with open("testzipspeed.csv", 'w') as csv_file:
    csv_file.write("size,time_zip,time_enumerate,ratio\n")

    for row in data:
        csv_file.write(",".join([ str(i) for i in row ])+"\n")
Strabismus answered 19/9, 2018 at 21:26 Comment(0)
D
3

The answer to the question asked in your title, "Which is more pythonic; zip or enumerate...?" is: they both are. enumerate is just a special case of zip.
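One way to read the "special case" remark, sketched under the assumption that it means pairing a running counter with the items. The use of itertools.count here is my illustration, not necessarily how enumerate is implemented.

```python
from itertools import count

group = ['A', 'B', 'C']

# Pair each item with a running counter, which is what enumerate yields.
via_enumerate = list(enumerate(group))
via_zip = list(zip(count(), group))

# enumerate's optional start argument corresponds to the counter's first value.
via_enumerate_from_1 = list(enumerate(group, start=1))
via_zip_from_1 = list(zip(count(1), group))

print(via_enumerate == via_zip)
print(via_enumerate_from_1 == via_zip_from_1)
```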

The answer to your more specific question about that for loop is: use zip, but not for the reasons you've seen so far.

The biggest advantage of zip in that loop has nothing to do with zip itself. It has to do with avoiding the assumptions made in your enumerate loop. To explain, I'll make two different generators based on your two examples:

def process_items_and_tags(items, tags):
    "Do something with two iterables: items and tags."
    for item, tag in zip(items, tags):
        yield process(item, tag)

def process_items_and_list_of_tags(items, tags_list):
    "Do something with an iterable of items and an indexable collection of tags."
    for idx, item in enumerate(items):
        yield process(item, tags_list[idx])

Both generators can take any iterable as their first argument (items), but they differ in how they handle their second argument. The enumerate-based approach can only process tags in a list-like collection with [] indexing. That rules out a huge number of iterables, like file streams and generators, for no good reason.
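A small sketch of that restriction in action. The helper names pair_with_zip and pair_with_index are hypothetical stand-ins for the two generators, and they simply yield tuples instead of calling a real process function.

```python
def pair_with_zip(items, tags):
    """Pair items with tags; both arguments may be any iterable."""
    for item, tag in zip(items, tags):
        yield (item, tag)


def pair_with_index(items, tags_list):
    """Pair items with tags; tags_list must support [] indexing."""
    for idx, item in enumerate(items):
        yield (item, tags_list[idx])


group = ['A', 'B', 'C']

# A generator expression is iterable but not indexable.
zip_result = list(pair_with_zip(group, (t.lower() for t in group)))

try:
    list(pair_with_index(group, (t.lower() for t in group)))
    index_failure = None
except TypeError as exc:
    # Generators do not support tags_list[idx].
    index_failure = exc

print(zip_result)
print(type(index_failure).__name__)
```

The zip-based version accepts the generator without complaint, while the index-based one fails on the first lookup.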

Why is one parameter more tightly constrained than the other? The restriction isn't inherent in the problem the user is trying to solve, since the generator could just as easily have been written the other way 'round:

def process_list_of_items_and_tags(items_list, tags):
    "Do something with an indexable collection of items and an iterable of tags."
    for idx, tag in enumerate(tags):
        yield process(items_list[idx], tag)

Same result, different restriction on the inputs. Why should your caller have to know or care about any of that?

As an added penalty, anything of the form some_list[some_index] could raise an IndexError, which you would have to either catch or prevent in some way. That's not normally a problem when your loop both enumerates and accesses the same list-like collection, but here you're enumerating one and then accessing items from another. You'd have to add more code to handle an error that could not have happened in the zip-based version.
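Here is a rough illustration of that extra failure mode, using a made-up tag list that is one item short. Note that zip does not report the mismatch either; it just stops early, so this only shows that the IndexError path is specific to the indexing version.

```python
group = ['A', 'B', 'C']
tag = ['a', 'b']  # shorter than group (illustrative data)

# zip-based loop: stops quietly after the last complete pair.
zipped = [(x, y) for x, y in zip(group, tag)]

# enumerate-plus-index loop: works for the first two items, then fails.
indexed = []
try:
    for idx, x in enumerate(group):
        indexed.append((x, tag[idx]))
    failure = None
except IndexError as exc:
    failure = exc

print(zipped)
print(indexed)
print(type(failure).__name__)
```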

Avoiding the unnecessary idx variable is also nice, but hardly the deciding difference between the two approaches.

For more on the subject of iterables, generators, and functions that use them, see Ned Batchelder's PyCon US 2013 talk, "Loop Like a Native" (text, 30-minute video).

Doublejointed answered 27/11, 2015 at 12:16 Comment(2)
The other answer mentions this: "iterate over one list, and index the other list". But it's nice to highlight the difference.Ulmaceous
@arekolek: True, it's mentioned in the accepted answer, but more as an aesthetic consideration. That answer doesn't say that the [] notation restricts the type of the iterable used, or that indexing introduces a new failure mode.Doublejointed
F
0

zip is more pythonic, as already said, because you don't need an extra variable. That said, you could also use

import sys
from collections import deque
deque(map(lambda x, y: sys.stdout.write(x + " " + y + "\n"), group, tag), maxlen=0)

Since we are only printing here, the values returned by the write calls have to be thrown away, and this also assumes your lists are the same length.

Update: in this case it may not be as good, because you are only printing the group and tag values, and sys.stdout.write leaves you with a list of return values you don't need. In practice, if you needed to collect the values, it would be a better fit.

Ferry answered 27/11, 2015 at 11:19 Comment(7)
How is that simpler than for x, y in zip(group, tag): print(x, y)? Also, see https://mcmap.net/q/75357/-non-lazy-evaluation-version-of-map-in-python3.Ulmaceous
Also Is it Pythonic to use list comprehensions for just side effects?. Hint: this question also asks what is more Pythonic.Ulmaceous
@Ulmaceous This is a native c implementation of zip https://hg.python.org/cpython/file/57c157be847f/Python/bltinmodule.c which may involve more iterations than a map I guess.Ferry
I've measured time of both solutions in ipython using %timeit on two lists with 1 million random letters each. Yours was 3 times faster, but only because it used write instead of print. When using write in both, there's no difference whatsoever.Ulmaceous
BTW, you could make this more concise by writing _ = deque(map(print, group, tag), 0). But it's still not more Pythonic, faster, simpler, readable or better in any other sense.Ulmaceous
@Ulmaceous Agreed. Thanks for correcting. Though it's faster, it's not better.Ferry
No, it's not faster. Only write seems to be faster than print. Your solution seemed to be about map instead of zip, not write instead of print.Ulmaceous
A
0

zip might be more Pythonic, but it has a gotcha. If you want to change elements in place, you need to use indexing. Iterating over the elements will not work. For example:

x = [1,2,3]
for elem in x:
    elem *= 10
print(x)

Output: [1, 2, 3]

y = [1,2,3]
for index in range(len(y)):
    y[index] *= 10
print(y)

Output: [10, 20, 30]
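For what it's worth, the index does not have to come from range(len(...)). A sketch of the same in-place update with enumerate, plus a variant that combines it with zip, follows. The second list b and the name scale are made up for illustration.

```python
# In-place update using enumerate instead of range(len(...)).
y = [1, 2, 3]
for index, elem in enumerate(y):
    y[index] = elem * 10

# In-place update of one list driven by a second, parallel list.
a = [1, 2, 3]
b = [10, 20, 30]  # illustrative scale factors
for index, (x, scale) in enumerate(zip(a, b)):
    a[index] = x * scale

print(y)
print(a)
```

Assigning to existing positions while iterating is fine here because the length of the list never changes.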

Aliment answered 7/10, 2019 at 16:5 Comment(0)
B
0

This is a basic beginner question. I think range(len(some_list)) isn't pythonic; it is the kind of solution you reach for when you are not yet thinking in Python.

After thinking about it and reading the excellent Python documentation (I really like NumPy-style docs in simple pythonic code), my view is that enumerate is the solution for iterables when you need a for loop. For a simple one-liner, a comprehension is one form:

list_a = ['a', 'b', 'c']
list_2 = ['1', '2', '3']

[print(a) for a in list_a]

runs the print for each element, though a generator is perhaps better:

generator_item = (print(i, a) for i, a in enumerate(list_a) if a.find('a') == 0)
next(generator_item)

For multi-line and more complex for loops, we can use enumerate(zip(...)):

for i, (arg1, arg2) in enumerate(zip(list_a, list_2)):
    print('multiline')  # do complex code

Perhaps in more elaborate pythonic code we can use another, more complex form with itertools. Note that the counter (imported as idx) goes last in the zip call; it is unbounded, so zip stops when the lists run out:

from itertools import count as idx
for arg1, arg2, i in zip(list_a, list_2, idx(start=1)):
    print(f'multiline {i}: {arg1}, {arg2}')  # do complex code
Bedspring answered 7/4, 2022 at 9:39 Comment(0)
