Any function that contains a yield statement will return a generator object
This is correct. The return value of a function containing a yield is a generator object. The generator object is an iterator, where each iteration returns a value that was yielded from the code backing the generator.
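As a quick check of that claim, here is a minimal sketch. The function name countdown is made up for illustration; only types.GeneratorType and collections.abc.Iterator come from the standard library.

```python
import types
from collections.abc import Iterator


def countdown(n):
    """Hypothetical generator function: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)  # calling the function does not run its body yet

is_generator = isinstance(gen, types.GeneratorType)
is_iterator = isinstance(gen, Iterator)
values = list(gen)  # iterating drives the body from yield to yield

print(is_generator, is_iterator, values)  # True True [3, 2, 1]
```

Note that calling countdown(3) only builds the generator object; none of the body executes until the first next() (here, triggered by list()).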
A generator object is a stack that contains state
A generator object contains a pointer to the current execution frame, along with a whole bunch of other stuff used to maintain the state of the generator. The execution frame is what holds the local variables, the instruction pointer, and the value stack for the code in the generator.
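You can peek at that frame from Python through the gi_frame attribute. The sketch below assumes CPython; the generator function counter is a made-up example, and the exact contents of a frame object are an implementation detail.

```python
def counter():
    """Hypothetical generator: yields 1, 2, 3, ... forever."""
    total = 0
    while True:
        total += 1
        yield total


gen = counter()

# Before the first next(), the body has not run, so no locals are bound yet.
locals_before = dict(gen.gi_frame.f_locals)

next(gen)
next(gen)

# After two steps, the suspended frame still holds the local variable.
locals_after = dict(gen.gi_frame.f_locals)

print(locals_before)  # {}
print(locals_after)   # {'total': 2}
```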
Each time I call the .next method, Python extracts the function's state, and when it finds another yield statement it binds the state again and deletes the prior state
Sort of. When you call next(gen_object), Python evaluates the current execution frame:
    gen_send_ex(PyGenObject *gen, PyObject *arg, int exc) {  // This is called when you call next(gen_object)
        PyFrameObject *f = gen->gi_frame;
        ...
        gen->gi_running = 1;
        result = PyEval_EvalFrameEx(f, exc);  // This evaluates the current frame
        gen->gi_running = 0;
PyEval_EvalFrameEx is the highest-level function used to interpret Python bytecode:
PyObject* PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
This is the main, unvarnished function of Python interpretation. It is
literally 2000 lines long. The code object associated with the
execution frame f is executed, interpreting bytecode and executing
calls as needed. The additional throwflag parameter can mostly be
ignored - if true, then it causes an exception to immediately be
thrown; this is used for the throw() methods of generator objects.
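The throw() path mentioned in that description is visible from Python as well. This is a small sketch using a made-up generator named worker; it shows the exception being raised inside the generator at the point where it was suspended.

```python
def worker():
    """Hypothetical generator that recovers from a ValueError thrown into it."""
    try:
        yield "waiting"
    except ValueError as exc:
        # The exception surfaces at the suspended yield expression.
        yield f"handled: {exc}"


gen = worker()
first = next(gen)                       # run up to the first yield
second = gen.throw(ValueError("boom"))  # resume by raising at that yield

print(first)   # waiting
print(second)  # handled: boom
```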
PyEval_EvalFrameEx knows that when it hits a yield while evaluating the bytecode, it should return the value being yielded to the caller:
    TARGET(YIELD_VALUE) {
        retval = POP();
        f->f_stacktop = stack_pointer;
        why = WHY_YIELD;
        goto fast_yield;
    }
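If you want to see that opcode without reading the C source, the dis module can list the bytecode of a generator function. This is only a sketch: the function pair is made up, and opcode names are a CPython detail that can differ between versions, so treat the YIELD_VALUE name as an assumption about your interpreter.

```python
import dis


def pair():
    """Hypothetical generator function with two yield statements."""
    yield 1
    yield 2


# Collect the opcode names the compiler emitted for the function body.
opnames = [ins.opname for ins in dis.get_instructions(pair)]
has_yield_opcode = "YIELD_VALUE" in opnames

print(has_yield_opcode)
```

Running dis.dis(pair) instead prints the full disassembly, which makes it easier to see where each yield sits in the instruction stream.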
When you yield, the current top of the frame's value stack is saved (via f->f_stacktop = stack_pointer), so that execution can resume where it left off when next is called again. All non-generator functions set f_stacktop to NULL after they're done evaluating. So when you call next again on the generator object, PyEval_EvalFrameEx is called again, using the same frame pointer as before. The frame's state will be exactly the same as it was when it yielded during the previous call, so execution will continue on from that point. Essentially, the current state of the frame is "frozen". This is described in PEP 255, the PEP that introduced generators:
If a yield statement is encountered, the state of the function is
frozen, and the value [yielded] is returned to .next()'s
caller. By "frozen" we mean that all local state is retained,
including the current bindings of local variables, the instruction
pointer, and the internal evaluation stack: enough information is
saved so that the next time .next() is invoked, the function can
proceed exactly as if the yield statement were just another external
call.
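A small sketch of what "frozen" means in practice. The generator function running_total is a made-up example; the point is that each generator object keeps its own local variables and position between next() calls, independently of any other generator created from the same function.

```python
def running_total():
    """Hypothetical generator: yields a cumulative sum after each step."""
    total = 0
    for step in (10, 20, 30):
        total += step
        yield total  # total and the loop position are retained here


a = running_total()
b = running_total()

# Interleave calls: each generator resumes from its own frozen state.
trace = [next(a), next(a), next(b), next(a)]
print(trace)  # [10, 30, 10, 60]

# Once the body finishes, further calls raise StopIteration.
try:
    next(a)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```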
Here is most of the state a generator object maintains (taken directly from its header file):
    typedef struct {
        PyObject_HEAD
        /* The gi_ prefix is intended to remind of generator-iterator. */

        /* Note: gi_frame can be NULL if the generator is "finished" */
        struct _frame *gi_frame;

        /* True if generator is being executed. */
        char gi_running;

        /* The code object backing the generator */
        PyObject *gi_code;

        /* List of weak reference. */
        PyObject *gi_weakreflist;

        /* Name of the generator. */
        PyObject *gi_name;

        /* Qualified name of the generator. */
        PyObject *gi_qualname;
    } PyGenObject;
gi_frame is the pointer to the current execution frame.
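Several of these fields are exposed as attributes on the generator object, so you can inspect them from Python. The sketch below assumes CPython and uses a made-up generator function named ticker; inspect.getgeneratorstate is the standard-library helper that summarizes the same state.

```python
import inspect


def ticker():
    """Hypothetical one-shot generator."""
    yield "tick"


gen = ticker()

name = gen.__name__                            # backed by gi_name
code_matches = gen.gi_code is ticker.__code__  # gi_code is the function's code object
running = gen.gi_running                       # only True while the body is executing
state_before = inspect.getgeneratorstate(gen)
frame_before = gen.gi_frame is not None

list(gen)  # exhaust the generator

state_after = inspect.getgeneratorstate(gen)
frame_after = gen.gi_frame is not None  # the frame is released once finished

print(name, code_matches, running)  # ticker True False
print(state_before, frame_before)   # GEN_CREATED True
print(state_after, frame_after)     # GEN_CLOSED False
```

The last line matches the note in the header: gi_frame is NULL (None from Python) once the generator is finished.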
Note that all of this is CPython implementation-specific. PyPy/Jython/etc. could very well be implementing generators in a completely different way. I encourage you to read through the source for generator objects to learn more about CPython's implementation.