Which is generally faster, a yield or an append?
I am currently working on a personal learning project in which I read in an XML database. I find myself writing functions that gather data, and I'm not sure what the fastest way to return it would be.

Which is generally faster:

  1. yields, or
  2. several append()s within the function then return the ensuing list?

I would also be happy to know in what situations yield would be faster than append() or vice versa.

Brockway answered 15/8, 2010 at 14:34 Comment(0)
yield has the huge advantage of being lazy, and speed is usually not the best reason to use it. But if it works in your context, then there is no reason not to use it:

# yield_vs_append.py
data = range(1000)

def yielding():
    def yielder():
        for d in data:
            yield d
    return list(yielder())

def appending():
    lst = []
    for d in data:
        lst.append(d)
    return lst

This is the result:

python2.7 -m timeit -s "from yield_vs_append import yielding,appending" "yielding()"
10000 loops, best of 3: 80.1 usec per loop

python2.7 -m timeit -s "from yield_vs_append import yielding,appending" "appending()"
10000 loops, best of 3: 130 usec per loop

At least in this very simple test, yield is faster than append.
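To make the laziness point concrete: a generator lets the caller stop early and never pay for the values it does not consume. A minimal sketch (the function names here are my own, not from the answer above):

```python
def squares(n):
    """Yield squares one at a time instead of building a list up front."""
    for i in range(n):
        yield i * i

def first_square_over(limit, n=10**6):
    # The generator produces values on demand, so this loop stops
    # after a handful of items even though n is a million.
    for s in squares(n):
        if s > limit:
            return s

print(first_square_over(100))  # 121 (stops after 12 items, not 1,000,000)
```

With an append-and-return version, all one million squares would be computed and stored before the caller could look at the first one.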

Sarabia answered 15/8, 2010 at 14:46 Comment(3)
Does lazy mean low memory requirement?Brockway
I wrote a compressor/decompressor for the WKdm algorithm. After profiling, one function that unpacks the bits into a list was the slowest. I converted it to a generator and it was even slower: the yield version managed about 22MB/s, while the append version managed about 38MB/s. So it really depends on what you are doing.Beffrey
lst.append lookup might slow down the appending(). You could try it with append = lst.append outside of the loop.Flavoring
I recently asked myself a similar question while exploring ways of generating all permutations of a list (or tuple), either by appending to a list or via a generator, and found (for permutations of length 9, which take about a second or so to generate):

  • The naive approach (permutations are lists; append to a list, return the list of lists) takes about three times as long as itertools.permutations
  • Using a generator (i.e. yield) reduces this by approximately 20%
  • Using a generator and generating tuples is the fastest, at about twice the time of itertools.permutations.

Take this with a grain of salt! Timing and profiling were very useful:

if __name__ == '__main__':
    import cProfile
    cProfile.run("main()")
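The three approaches described in the bullets could look roughly like the following sketch. This is my reconstruction for illustration, not the original benchmark code; the recursion scheme is an assumption.

```python
import itertools

def perms_append(seq):
    """Naive approach: permutations are lists, appended to a result list."""
    if len(seq) <= 1:
        return [list(seq)]
    out = []
    for i, x in enumerate(seq):
        # Recurse on the remaining elements, prepending x to each result.
        for rest in perms_append(seq[:i] + seq[i + 1:]):
            out.append([x] + rest)
    return out

def perms_yield(seq):
    """Generator variant: same recursion, but yields tuples lazily."""
    if len(seq) <= 1:
        yield tuple(seq)
        return
    for i, x in enumerate(seq):
        for rest in perms_yield(seq[:i] + seq[i + 1:]):
            yield (x,) + rest

# Both variants agree with itertools.permutations on the same input.
seq = list(range(7))
assert sorted(map(tuple, perms_append(seq))) == sorted(itertools.permutations(seq))
assert sorted(perms_yield(seq)) == sorted(itertools.permutations(seq))
```

Timing these with timeit or cProfile on length-9 inputs should reproduce the rough ordering described above, with itertools.permutations (implemented in C) well ahead of both.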
Thitherto answered 15/8, 2010 at 14:49 Comment(0)
There is an even faster alternative to TH4Ck's yielding(): the list comprehension.

In [245]: def list_comp():
   .....:     return [d for d in data]
   .....:

In [246]: timeit yielding()
10000 loops, best of 3: 89 us per loop

In [247]: timeit list_comp()
10000 loops, best of 3: 63.4 us per loop

Of course it is rather silly to micro-benchmark these operations without knowing the structure of your code; each is useful in a different situation. For example, a list comprehension is handy when you want to apply a simple operation that can be expressed as a single expression, while yield has the significant advantage of letting you isolate the traversal code into a generator method. Which one is appropriate depends a lot on the usage.
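For completeness, a generator expression combines the laziness of yield with the brevity of a list comprehension. A small illustrative sketch (the doubling operation is arbitrary):

```python
data = range(1000)

as_list = [d * 2 for d in data]   # list comprehension: built eagerly, all at once
as_gen = (d * 2 for d in data)    # generator expression: built lazily, one item at a time

# Both produce the same values; only the evaluation strategy differs.
# Note that sum() consumes the generator, which is then exhausted.
print(sum(as_list) == sum(as_gen))  # True
```

This is often the middle ground between the two answers being compared here: no explicit generator function, but no second list either.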

Earthiness answered 15/8, 2010 at 16:26 Comment(1)
I actually wanted to include list comprehensions, but I'm choosing between these two: [n for n in func_that_yields()] or [n for n in func_that_returns_an_iterable()]. Note that n can be a simple element unpack, or a complex element-by-element operation. Anyway, good point you have in there :)Brockway
First, decide whether you actually need a list at all. There is also an improved list-building form, the list comprehension `[elem for elem in something]`, and a generator is the better choice when you only iterate over the values once. If you need to modify the result repeatedly, or work with many elements at the same time, then it has to be a list. (In a large share of the cases where a typical programmer reaches for a list, a generator would do the job with less memory; many people simply never consider the alternative, and good optimization often gets neglected in favor of code that merely works.)

If you use a list comprehension to speed up the return, it is only fair to give the same treatment to yield. In simple cases like this, yield can indeed be faster than building and returning a list. Compare these three variants:

data = range(1000)

def yielder():
    yield from data

def appending():
    L = []
    app = list.append
    for i in data:
        app(L, i)
    return L

def list_gen():
    return [i for i in data]

Of course appending() will be the slowest of the three, because it creates the list and then grows it one element at a time inside an explicit for loop: on every iteration the loop fetches the next element, binds it to a variable, and calls append (binding app = list.append outside the loop is itself a well-known micro-optimization). By the time the function returns, we are holding roughly 2000 elements' worth of references: the original data plus the new list.

list_gen() is lighter: the list comprehension avoids the per-iteration method lookup and runs on optimized bytecode, but it still produces a second list, so again we end up with the original data and a full copy.

yielder() uses the least memory of all, because it never builds a second list; it just hands out one value from data at a time, avoiding a whole layer of references. For example:

data = range(1000)

def yielder():
    yield from data

def list_gen():
    return [i for i in data]

# list_gen() materializes the whole list before the loop starts
for i in list_gen():
    pass  # some instruction on one element

# yielder() hands over one value per iteration; no copy is made
for i in yielder():
    pass  # some instruction on one element

The loop body only ever needs one element at a time, so the yielder version never holds all 1000 elements in a second list; each next value is produced on the next iteration.
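The memory claim can be checked directly with sys.getsizeof: a generator object stays small no matter how many values it will produce, while a list grows with its contents. A quick illustrative check (exact byte counts vary by Python version):

```python
import sys

data = range(1000)

def yielder():
    yield from data

lst = [i for i in data]
gen = yielder()

# The list stores all 1000 references; the generator object is a
# small fixed-size frame regardless of how many items it will yield.
print(sys.getsizeof(lst) > sys.getsizeof(gen))  # True
```

Note that sys.getsizeof only measures the container itself, not the objects it refers to, but the difference is already dramatic at this size.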

Sorry for digging up an old topic; I only came across it accidentally via a Google search, and other beginner Python programmers will land here the same way.

Conciliator answered 20/6, 2017 at 17:30 Comment(0)
A quick note on yield: a generator is not persistent, i.e. it is exhausted after one pass. That lack of persistence is confusing to new (or tired) developers, which may make your coding slower:

def count_to_five():
    for i in range(5):
        yield i + 1

five_count = count_to_five()
print(list(five_count)) # [1, 2, 3, 4, 5]

# What? I didn't do anything. I just printed the result again!
print(list(five_count)) # []

I realize this is not the "speed" you are curious about, but development speed is a metric to consider when using lists or generators. This is not a pathological example either; someone may use print statements to debug a result and not realize that they're changing the result in doing so.
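If you do need to iterate over the result more than once, either materialize the generator once with list(), or call the generator function again to get a fresh iterator. A short sketch building on the example above:

```python
def count_to_five():
    for i in range(5):
        yield i + 1

# Option 1: materialize once, then reuse the list freely.
nums = list(count_to_five())
print(nums, nums)  # the same list both times

# Option 2: each call to the generator function returns a fresh iterator.
print(list(count_to_five()))  # [1, 2, 3, 4, 5]
print(list(count_to_five()))  # [1, 2, 3, 4, 5]
```

Which option is better depends on whether the values are cheap to recompute and how much memory the materialized list would take.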

Racing answered 1/3, 2024 at 23:15 Comment(0)
