What can you use generator functions for?

I'm starting to learn Python and I've come across generator functions, the ones that have a yield statement in them. I want to know what types of problems these functions are really good at solving.

Levey answered 19/9, 2008 at 14:58 Comment(3)
maybe a better question would be when we should not use 'em – Knorring
Real world example here – Instil
Well, you can use it to generate Fibonacci numbers, process large data or log files, pipe multiple generators together, and handle multiple tasks. – Ez

Generators give you lazy evaluation. You use them by iterating over them, either explicitly with 'for' or implicitly by passing one to any function or construct that iterates. You can think of generators as returning multiple items, as if they return a list, but instead of returning them all at once they return them one by one, and the generator function is paused until the next item is requested.
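
For instance, a minimal sketch of that pause-and-resume behaviour (squares is a name made up for this illustration):

def squares(n):
    for i in range(n):
        yield i * i          # execution pauses here until the next item is requested

gen = squares(3)
print(next(gen))             # 0 -- runs up to the first yield, then pauses
print(next(gen))             # 1 -- resumes right after the yield
print(list(gen))             # [4] -- iteration consumes whatever remains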

Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don't know if you are going to need all the results, or where you don't want to allocate the memory for all the results at the same time. They also suit situations where the generator uses another generator, or consumes some other resource, and it's more convenient if that happens as late as possible.

Another use for generators (that is really the same) is to replace callbacks with iteration. In some situations you want a function to do a lot of work and occasionally report back to the caller. Traditionally you'd use a callback function for this: you pass the callback to the work-function, which calls it periodically. The generator approach is that the work-function (now a generator) knows nothing about the callback, and merely yields whenever it wants to report something. The caller, instead of writing a separate callback and passing that to the work-function, does all the reporting work in a little 'for' loop around the generator.
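
A minimal sketch of the two styles (do_work, on_progress, and report are names made up for this illustration):

# Callback style: the work-function has to know about the callback.
def do_work_with_callback(on_progress):
    for step in range(3):
        on_progress(step)        # report progress via the callback

def report(step):
    print("step %d done" % step)

do_work_with_callback(report)

# Generator style: the work-function just yields; the caller loops.
def do_work():
    for step in range(3):
        yield step               # report progress by yielding

for step in do_work():
    print("step %d done" % step)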

For example, say you wrote a 'filesystem search' program. You could perform the search in its entirety, collect the results and then display them one at a time. All of the results would have to be collected before you showed the first, and all of the results would be in memory at the same time. Or you could display the results while you find them, which would be more memory efficient and much friendlier towards the user. The latter could be done by passing the result-printing function to the filesystem-search function, or it could be done by just making the search function a generator and iterating over the result.

If you want to see an example of the latter two approaches, see os.path.walk() (the old filesystem-walking function with callback) and os.walk() (the new filesystem-walking generator). Of course, if you really wanted to collect all results in a list, the generator approach is trivial to convert to the big-list approach:

big_list = list(the_generator)
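
And to see the streaming behaviour, iterating over os.walk() prints each directory as soon as the walk reaches it, with nothing accumulated in memory:

import os

for dirpath, dirnames, filenames in os.walk('.'):
    print(dirpath)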
Erlking answered 19/9, 2008 at 15:9 Comment(3)
Does a generator such as one that produces filesystem lists perform actions in parallel to the code that runs that generator in a loop? Ideally the computer would run the body of the loop (processing the last result) while concurrently doing whatever the generator must do to obtain the next value. – Celloidin
@StevenLu: Unless it goes to the trouble to manually launch threads before the yield and join them after to get the next result, it does not execute in parallel (and no standard library generator does this; secretly launching threads is frowned upon). The generator pauses at each yield until the next value is requested. If the generator is wrapping I/O, the OS might be proactively caching data from the file on the assumption it will be requested shortly, but that's the OS; Python isn't involved. – Alarcon
#19845601 An example of the callback vs generator design can be seen here – Usually

One reason to use a generator is to make the solution clearer for some kinds of problems.

The other is to treat results one at a time, avoiding building huge lists of results that you would process separately anyway.

If you have a fibonacci-up-to-n function like this:

# function version
def fibon(n):
    a = b = 1
    result = []
    for i in xrange(n):
        result.append(a)
        a, b = b, a + b
    return result

You can write the same function more simply like this:

# generator version
def fibon(n):
    a = b = 1
    for i in xrange(n):
        yield a
        a, b = b, a + b

The function is clearer. And if you use the function like this:

for x in fibon(1000000):
    print x,

then with the generator version, the whole 1,000,000-item list is never created at all; only one value exists at a time. That would not be the case with the list version, where the entire list would be built first.

Amateurism answered 19/9, 2008 at 15:9 Comment(2)
and if you need a list, you can always do list(fibon(5)) – Daggerboard
I wanted to add that if you try running the function with n = 1,000,000, then your computer will have a very hard time. Running it with the generator is perfectly fine though. – Kilowatthour

Real World Example

Let's say you have 100 million domains in your MySQL table, and you would like to update Alexa rank for each domain.

First thing you need is to select your domain names from the database.

Let's say your table name is domains and column name is domain.

If you use SELECT domain FROM domains, it's going to return 100 million rows, which will consume a lot of memory, and your server might crash.

So you decide to run the program in batches. Let's say our batch size is 1000.

In our first batch we will query the first 1000 rows, check Alexa rank for each domain and update the database row.

In our second batch we will work on the next 1000 rows. In our third batch it will be from 2001 to 3000 and so on.

Now we need a generator function which generates our batches.

Here is our generator function:

def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)   # fetch one batch of rows
        if not results:                         # no rows left: stop
            break
        for result in results:
            yield result                        # hand rows out one at a time

As you can see, our function keeps yielding results. If you used the keyword return instead of yield, the whole function would end as soon as it reached return.

return - returns only once
yield - returns multiple times

If a function uses the keyword yield, then it's a generator function.

Now you can iterate like this:

import MySQLdb

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")
for result in ResultGenerator(cursor):
    doSomethingWith(result)
db.close()
Instil answered 7/5, 2014 at 23:20 Comment(1)
It would be more practical if yield could be explained in terms of recursive/dynamic programming! – Cephalothorax

I found this explanation, and it cleared up my doubt, because there's a good chance that a person who doesn't know generators doesn't know about yield either.

Return

The return statement is where all the local variables are destroyed and the resulting value is given back (returned) to the caller. Should the same function be called some time later, the function will get a fresh new set of variables.

Yield

But what if the local variables weren't thrown away when we exited a function? That would imply we could resume the function where we left off. This is where the concept of generators is introduced, and the yield statement is what resumes the function where it left off.

def generate_integers(N):
    for i in xrange(N):
        yield i

In [1]: gen = generate_integers(3)
In [2]: gen
<generator object at 0x8117f90>
In [3]: gen.next()
0
In [4]: gen.next()
1
In [5]: gen.next()
2
So that's the difference between return and yield statements in Python.

The yield statement is what makes a function a generator function.

So generators are a simple and powerful tool for creating iterators. They are written like regular functions, but they use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed).

Tropho answered 18/1, 2013 at 8:17 Comment(0)

See the "Motivation" section in PEP 255.

A non-obvious use of generators is creating interruptible functions, which lets you do things like update the UI or run several jobs "simultaneously" (interleaved, actually) without using threads.
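
A minimal sketch of that interleaving idea (countdown and the scheduling loop are made up for this illustration):

# Each job is a generator that yields whenever it is willing to pause.
def countdown(name, n):
    while n > 0:
        print("%s: %d" % (name, n))
        n -= 1
        yield                    # hand control back to the scheduler

def run_round_robin(tasks):
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)           # resume the task until its next yield
            tasks.append(task)   # still alive: put it back in the queue
        except StopIteration:
            pass                 # this task is finished

run_round_robin([countdown("A", 3), countdown("B", 2)])
# Prints A: 3, B: 2, A: 2, B: 1, A: 1 -- the two jobs interleave.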

Afterglow answered 19/9, 2008 at 15:7 Comment(1)
The Motivation section is nice in that it has a specific example: "When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list ... For example, tokenize.py in the standard library takes this approach" – Ailbert

Buffering. When it is efficient to fetch data in large chunks but process it in small chunks, a generator might help:

def bufferedFetch():
    while True:
        buffer = getBigChunkOfData()   # fetch efficiently, in big chunks
        if not buffer:                 # end of data: stop iterating
            break
        for i in buffer:
            yield i                    # hand items out one at a time

The above lets you easily separate buffering from processing. The consumer function can now just get the values one by one without worrying about buffering.
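
For instance, the consuming side might just be (process is a hypothetical stand-in for the per-item work):

for item in bufferedFetch():
    process(item)                # one item at a time; the buffering is invisible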

Alesha answered 19/9, 2008 at 15:14 Comment(2)
If getBigChunkOfData isn't lazy, then I don't understand what benefit yield has here. What is a use case for this function? – Chloras
But the point is that, IIUC, bufferedFetch is lazifying the call to getBigChunkOfData. If getBigChunkOfData were lazy already, then bufferedFetch would be useless. Each call to bufferedFetch() will return one buffer element, even though a BigChunk was already read in. And you don't need to explicitly keep count of the next element to return, because the mechanics of yield do just that implicitly. – Palaeontology

I have found that generators are very helpful for cleaning up your code, and they give you a unique way to encapsulate and modularize it. In a situation where you need something to constantly spit out values based on its own internal processing, and when that something needs to be called from anywhere in your code (not just within a loop or a block, for example), generators are the feature to use.

An abstract example would be a Fibonacci number generator that does not live within a loop and, when called from anywhere, will always return the next number in the sequence:

def fib():
    first = 0
    second = 1
    yield first
    yield second

    while 1:
        next = first + second
        yield next
        first = second
        second = next

fibgen1 = fib()
fibgen2 = fib()

Now you have two Fibonacci number generator objects which you can call from anywhere in your code and they will always return ever larger Fibonacci numbers in sequence as follows:

>>> fibgen1.next(); fibgen1.next(); fibgen1.next(); fibgen1.next()
0
1
1
2
>>> fibgen2.next(); fibgen2.next()
0
1
>>> fibgen1.next(); fibgen1.next()
3
5

The lovely thing about generators is that they encapsulate state without having to go through the hoops of creating objects. One way of thinking about them is as "functions" which remember their internal state.

I got the Fibonacci example from Python Generators - What are they? and with a little imagination, you can come up with a lot of other situations where generators make for a great alternative to for loops and other traditional iteration constructs.

Backwoodsman answered 11/4, 2009 at 20:55 Comment(0)

The simple explanation: Consider a for statement

for item in iterable:
    do_stuff()

A lot of the time, all the items in iterable don't need to be there from the start; they can be generated on the fly, as they're required. This can be a lot more efficient in both

  • space (you never need to store all the items simultaneously) and
  • time (the iteration may finish before all the items are needed).

Other times, you don't even know all the items ahead of time. For example:

for command in user_input():
    do_stuff_with(command)

You have no way of knowing all the user's commands beforehand, but you can use a nice loop like this if you have a generator handing you commands:

def user_input():
    while True:
        wait_for_command()
        cmd = get_command()
        yield cmd

With generators you can also have iteration over infinite sequences, which is of course not possible when iterating over containers.
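
For example, here's a minimal infinite sequence built on itertools.count; stopping is the consumer's job:

import itertools

for n in itertools.count(1):     # 1, 2, 3, ... forever
    if n * n > 25:
        break
    print(n * n)                 # 1 4 9 16 25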

Humor answered 19/9, 2008 at 15:15 Comment(2)
...and an infinite sequence could be one generated by repeatedly cycling over a small list, returning to the beginning after the end is reached. I use this for selecting colors in graphs, or producing busy throbbers or spinners in text. – Modulation
@mataap: There's an itertool for that – see itertools.cycle. – Towns

My favorite uses are "filter" and "reduce" operations.

Let's say we're reading a file, and only want the lines which begin with "##".

def filter2sharps( aSequence ):
    for l in aSequence:
        if l.startswith("##"):
            yield l

We can then use the generator function in a proper loop

source = file( ... )
for line in filter2sharps( source.readlines() ):
    print line
source.close()
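
Note that file objects are themselves line iterators, so you can also pass the file directly and have the lines read lazily (example.txt is a placeholder name):

with open('example.txt') as source:
    for line in filter2sharps(source):   # no readlines(): one line at a time
        print line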

The reduce example is similar. Let's say we have a file where we need to locate blocks of <Location>...</Location> lines. [Not HTML tags, but lines that happen to look tag-like.]

def reduceLocation( aSequence ):
    keep = False
    block = None
    for line in aSequence:
        if line.startswith("</Location"):
            block.append( line )
            yield block
            block = None
            keep = False
        elif line.startswith("<Location"):
            block = [ line ]
            keep = True
        elif keep:
            block.append( line )
        else:
            pass
    if block is not None:
        yield block # A partial block, icky

Again, we can use this generator in a proper for loop.

source = file( ... )
for b in reduceLocation( source.readlines() ):
    print b
source.close()

The idea is that a generator function allows us to filter or reduce a sequence, producing another sequence one value at a time.

Outrange answered 19/9, 2008 at 15:13 Comment(4)
fileobj.readlines() would read the entire file to a list in memory, defeating the purpose of using generators. Since file objects are already iterable you can use for b in your_generator(fileobject): instead. That way your file will be read one line at a time, avoiding reading the whole file. – Amateurism
reduceLocation is pretty weird, yield'ing a list; why not just yield each line? Also filter and reduce are builtins with expected behaviours (see help in ipython etc.); your usage of "reduce" is the same as filter. – Gewirtz
Good point on the readlines(). I usually realize that files are first-class line iterators during unit testing. – Outrange
Actually, the "reduction" is combining multiple individual lines into a composite object. Okay, it's a list, but it's still a reduction taken from the source. – Outrange

A practical example where you could make use of a generator is if you have some kind of shape and you want to iterate over its corners, edges or whatever. For my own project (source code here) I had a rectangle:

class Rect():

    def __init__(self, x, y, width, height):
        self.l_top  = (x, y)
        self.r_top  = (x+width, y)
        self.r_bot  = (x+width, y+height)
        self.l_bot  = (x, y+height)

    def __iter__(self):
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot

Now I can create a rectangle and loop over its corners:

myrect = Rect(50, 50, 100, 100)
for corner in myrect:
    print(corner)

Instead of __iter__ you could have a method iter_corners and call that with for corner in myrect.iter_corners(). It's just more elegant to use __iter__ since then we can use the class instance name directly in the for expression.
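
For comparison, a minimal sketch of that named-method variant is just another generator method on the class:

    def iter_corners(self):
        # Same four corners, but requested explicitly by name.
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot

for corner in myrect.iter_corners():
    print(corner)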

Peterec answered 27/9, 2014 at 12:40 Comment(1)
I adored the idea of passing similar class fields as a generator – Nascent

Basically: avoiding callback functions when iterating over input while maintaining state.

See here and here for an overview of what can be done using generators.

Metonym answered 19/9, 2008 at 15:9 Comment(0)

Since the send method of a generator has not been mentioned, here is an example:

def test():
    for i in xrange(5):
        val = yield
        print(val)

t = test()

# Proceed to 'yield' statement
next(t)

# Send value to yield
t.send(1)
t.send('2')
t.send([3])

It shows the possibility of sending a value to a running generator. For a more advanced course on generators, see the video below (including an explanation of yield from, generators for parallel processing, escaping the recursion limit, etc.):

David Beazley on generators at PyCon 2014
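
As a taste of the yield from syntax (Python 3.3+) that the talk covers, a generator can delegate to a sub-generator directly:

def inner():
    yield 1
    yield 2

def outer():
    yield 0
    yield from inner()   # inner()'s values flow straight through outer()
    yield 3

print(list(outer()))     # [0, 1, 2, 3]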

Appendicitis answered 28/4, 2014 at 7:21 Comment(0)

Some good answers here; however, I'd also recommend a complete read of the Python Functional Programming tutorial, which helps explain some of the more potent use cases of generators.

Arborization answered 16/3, 2015 at 14:17 Comment(0)

Piles of stuff. Any time you want to generate a sequence of items, but don't want to have to 'materialize' them all into a list at once. For example, you could have a simple generator that returns prime numbers:

import itertools

def primes():
    primes_found = set()
    primes_found.add(2)
    yield 2
    for i in itertools.count(1):
        candidate = i * 2 + 1
        # candidate is prime iff no known prime divides it evenly
        if all(candidate % prime for prime in primes_found):
            primes_found.add(candidate)
            yield candidate

You could then use that to generate the products of subsequent primes:

def prime_products():
    primeiter = primes()
    prev = primeiter.next()
    for prime in primeiter:
        yield prime * prev
        prev = prime
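
For instance, the first few products (2*3, 3*5, 5*7, 7*11):

pp = prime_products()
print([next(pp) for _ in range(4)])   # [6, 15, 35, 77]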

These are fairly trivial examples, but you can see how it can be useful for processing large (potentially infinite!) datasets without generating them in advance, which is only one of the more obvious uses.

Eskimoaleut answered 19/9, 2008 at 15:14 Comment(3)
if not any(candidate % prime for prime in primes_found) should be if all(candidate % prime for prime in primes_found) – Korwin
Yes, I meant to write "if not any(candidate % prime == 0 for prime in primes_found)". Yours is slightly neater, though. :) – Eskimoaleut
I guess you forgot to delete the 'not' from if not all(candidate % prime for prime in primes_found) – Rusty

I use generators when our web server is acting as a proxy:

  1. The client requests a proxied url from the server
  2. The server begins to load the target url
  3. The server yields to return the results to the client as soon as it gets them (a sketch of this step is below)
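
A minimal sketch of that streaming step, assuming Python 2's urllib2 and a server (e.g. WSGI) that accepts any iterable of byte chunks as a response body; proxy_body and chunk_size are names made up for this illustration:

import urllib2

def proxy_body(target_url, chunk_size=8192):
    upstream = urllib2.urlopen(target_url)
    try:
        while True:
            chunk = upstream.read(chunk_size)
            if not chunk:            # upstream finished
                break
            yield chunk              # stream each chunk to the client right away
    finally:
        upstream.close()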
Evoke answered 19/9, 2008 at 15:17 Comment(0)

Also good for printing the prime numbers up to n:

def genprime(n=10):
    for num in range(2, n + 1):
        for factor in range(2, num):
            if num % factor == 0:
                break
        else:               # no factor divided num, so it's prime
            yield num

for prime_num in genprime(100):
    print(prime_num)
Shrift answered 22/9, 2017 at 14:6 Comment(0)
