how generators work in python
Asked Answered
C

1

12

I am novice in Python and programming. Generators are a bit too complicated to understand for new programmers. Here's my theory on generator functions in Python:

  1. Any function contains a yield statement will return a generator object

  2. A generator object is a stack contains state

  3. Each time I call .next method Python extracts the function's state and when it finds another yield statement it'll bind the state again and deletes the prior state:

Example:

 [ 
  [state1] # Stack contains states and states contain info about the function
  [state2] # State1 will be deleted when python finds the other yield? 
 ] 

This is of course might be like the stupidest theory on earth, but forgive me I am just new in the coding word.

My Questions:

  1. What Python internally makes to store the states ?

  2. Does yield statement adds a state to a stack if it exists ?

  3. What yield creates internally ? I understand yield creates a generator object, however, I wonder what generator objects contain that makes them work ? are they just a stack/list of states and we you use .next method to extract each state and Python will call the function with the indexed state automatically for instance ?

Cohobate answered 10/8, 2014 at 19:43 Comment(9)
Since this question seems about the internals, please see this question if you want to know what generators are and how to use them on a user level.Eagre
Also, #8390312Schoolmistress
Some useful reading: The PEP that introduced generators and the source for generator objects.Chelonian
@Schoolmistress IMO the question you marked as a dupe does not answer the OP's question. The OP is asking about the internals of generators, which that question really doesn't cover. Actually, I think the other question you provided a link to in the comments is closer to an accurate dupe (jsbueno's answer in particular).Chelonian
@dano: the OP says they have trouble understanding generators in general. The dupe question addresses exactly this issue.Schoolmistress
@Schoolmistress He does say that, but his specific questions are about what Python does internally (where is it storing the generator's state, what happens to the stack, etc). None of that is covered in the question you duped. It's odd, really; those aren't the kind of questions I'd expect a novice to be asking, but that's what he wants to know about.Chelonian
The implementation of a generator could vary from implementation to implementation; the source code for a particular implementation would be a good place to start. As is, the question is quite broad.Fissirostral
@dano: well, someone reopened the question, so if you have anything useful to contribute, feel free to post.Schoolmistress
I reopened the question after someone else voted to open it, and since I agreed that this was not a duplicated of the linked question. My whole point in my initial comment here was that this is not a duplicate of that question but is about the internals of generators instead. And that topic isn’t covered there at all.Eagre
C
23

Any function contains a yield statement will return a generator object

This is correct. The return value of a function containing a yield is a generator object. The generator object is an iterator, where each iteration returns a value that was yielded from the code backing the generator.

A generator object is a stack contains state

A generator object contains a pointer to a the current execution frame, along with a whole bunch of other stuff used to maintain the state of the generator. The execution frame is what contains the call stack for the code in the generator.

Each time I call .next method Python extracts the function's state and when it finds another yield statement it'll bind the state again and deletes the prior state

Sort of. When you call next(gen_object), Python evaluates the current execution frame:

gen_send_ex(PyGenObject *gen, PyObject *arg, int exc) {  // This is called when you call next(gen_object)
    PyFrameObject *f = gen->gi_frame;
    ...
    gen->gi_running = 1;
    result = PyEval_EvalFrameEx(f, exc);  // This evaluates the current frame
    gen->gi_running = 0; 

PyEval_EvalFrame is highest-level function used to interpret Python bytecode:

PyObject PyEval_EvalFrameEx(PyFrameObject f, int throwflag)

This is the main, unvarnished function of Python interpretation. It is literally 2000 lines long. The code object associated with the execution frame f is executed, interpreting bytecode and executing calls as needed. The additional throwflag parameter can mostly be ignored - if true, then it causes an exception to immediately be thrown; this is used for the throw() methods of generator objects.

It knows that when it hits a yield while evaluating the bytecode, it should return the value being yielded to the caller:

TARGET(YIELD_VALUE) {
    retval = POP();
    f->f_stacktop = stack_pointer;
    why = WHY_YIELD;
    goto fast_yield;
}

When you yield, the current value of the frame's value stack is maintained (via f->f_stacktop = stack_pointer), so that we can resume where we left off when next is called again. All non-generator functions set f_stacktop to NULL after they're done evaluating. So when you call next again on the generator object, PyEval_ExvalFrameEx is called again, using the same frame pointer as before. The pointer's state will be exactly the same as it was when it yielded during the previous, so execution will continue on from that point. Essentially the current state of the frame is "frozen". This is described in the PEP that introduced generators:

If a yield statement is encountered, the state of the function is frozen, and the value [yielded] is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call.

Here is most of the state a generator object maintains (taken directly from its header file):

typedef struct {
    PyObject_HEAD
    /* The gi_ prefix is intended to remind of generator-iterator. */

    /* Note: gi_frame can be NULL if the generator is "finished" */
    struct _frame *gi_frame;

    /* True if generator is being executed. */
    char gi_running;

    /* The code object backing the generator */
    PyObject *gi_code;

    /* List of weak reference. */
    PyObject *gi_weakreflist;

    /* Name of the generator. */
    PyObject *gi_name;

    /* Qualified name of the generator. */
    PyObject *gi_qualname;
} PyGenObject;

gi_frame is the pointer to the current execution frame.

Note that all of this is CPython implementation-specific. PyPy/Jython/etc. could very well be implementing generators in a completely different way. I encourage you to read through the source for generator objects to learn more about CPython's implementation.

Chelonian answered 10/8, 2014 at 21:14 Comment(2)
That's a very interesting answer. I would love to know the process you went through to endup with such a detailed answer. Any tools/resources/methodology you could suggest for source code inspection?Handmade
@Handmade I believe I just spent a couple of hours reading through the Python source, documentation, and PEPs. I can't remember if I checked out the source code locally and used vim to go through it, or just used the web-viewer of the old Mercurial repo. I think it was the web viewer. A good IDE probably would have made it easier to jump through the source, though. I don't really have any special tricks, I just searched the code until I found the stuff that was related to generators, and then read/re-read, and consulted the docs until I could understand what it was doing.Chelonian

© 2022 - 2024 — McMap. All rights reserved.