yield statement implementation

I want to know everything about the yield statement, in an easy to understand form.

I have read about the yield statement and how it eases implementing the iterator pattern. However, most of that material is very dry. I would like to get under the covers and see how Microsoft handles yield return.

Also, when do you use yield break?

Modulate answered 12/4, 2009 at 21:50 Comment(0)

yield works by building a state machine internally. It stores the current state of the routine when it exits and resumes from that state next time.
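The suspend-and-resume behaviour can be observed directly. The following is a minimal sketch; the class name LazyDemo and its Log list are hypothetical and exist only to make the execution order visible.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Hypothetical log, used only to record when each part of the body runs.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add("start");    // runs on the first MoveNext(), not when Numbers() is called
        yield return 1;      // the routine is suspended here
        Log.Add("after 1");  // runs on the second MoveNext()
        yield return 2;
        Log.Add("end");      // runs on the third MoveNext(), which returns false
    }
}
```

Calling LazyDemo.Numbers().GetEnumerator() leaves Log empty. Each subsequent MoveNext() call runs the body only up to the next yield return, which is the "resumes from that state" part described above.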

You can use Reflector to see how it's implemented by the compiler.

yield break is used when you want to stop returning results. If you don't have a yield break, the compiler assumes one at the end of the function (just like a return; statement in a normal function).
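As an illustration, here is a small sketch of an early exit with yield break. The method TakeUntilNegative is a made-up example, not something from the framework.

```csharp
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Yields items from the source until the first negative value is seen.
    public static IEnumerable<int> TakeUntilNegative(IEnumerable<int> source)
    {
        foreach (var value in source)
        {
            if (value < 0)
            {
                yield break; // ends the sequence; MoveNext() returns false from now on
            }

            yield return value;
        }
        // Falling off the end of the method ends the sequence in the same way.
    }
}
```

For the input 1, 2, -1, 3 this yields 1 and 2 and then stops; the 3 is never reached.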

Wenn answered 12/4, 2009 at 21:53 Comment(3)
What does "current state of the routine" mean: processor register values, frame pointer, etc.? - Marijuana
Take a look at coroutines. - Loran
@Tcraft Microsoft's canonical implementation does not use different stacks/segmented stacks/etc. They use a heap-allocated object to store the state. - Wenn

As Mehrdad says, it builds a state machine.

As well as using Reflector (another excellent suggestion) you might find my article on iterator block implementation useful. It would be relatively simple if it weren't for finally blocks - but they introduce a whole extra dimension of complexity!
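To give a feel for why finally blocks matter, here is a minimal sketch (FinallyDemo and its Log list are hypothetical names). If the consumer stops early, the finally block still has to run, and it does so when the enumerator is disposed.

```csharp
using System.Collections.Generic;

public static class FinallyDemo
{
    // Hypothetical log, used only to observe when the finally block runs.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Guarded()
    {
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {
            // Runs when iteration completes, or when the enumerator is disposed
            // while it is suspended inside the try block.
            Log.Add("cleanup");
        }
    }
}
```

A foreach over FinallyDemo.Guarded() that breaks after the first item still records "cleanup", because foreach disposes the enumerator on exit. Driving the enumerator by hand and never calling Dispose() would leave the finally block unexecuted.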

Cinchonidine answered 12/4, 2009 at 21:56 Comment(0)

Let's rewind a little bit: the yield keyword is translated, as many others have said, into a state machine.

More precisely, there is no built-in implementation working behind the scenes; instead, the compiler rewrites the yield-related code into a state machine by implementing one of the relevant interfaces (the return type of the method containing the yield keywords).

A (finite) state machine is just a piece of code that, depending on where you are in the code (that is, on the previous state and the input), moves to another state and performs its action. This is pretty much what happens when you use yield in a method whose return type is IEnumerator<T> / IEnumerator. Each yield creates a transition from the previous state to the next one, so the state management ends up in the MoveNext() implementation.

This is exactly what the C# compiler / Roslyn does: it checks for the presence of a yield keyword and for the return type of the containing method (IEnumerator<T>, IEnumerable<T>, IEnumerator or IEnumerable), then creates a private class reflecting that method, with the necessary variables and states.

If you are interested in the details of the state machine and of how iterators are rewritten by the compiler, you can look at the corresponding rewriter classes in the Roslyn source on GitHub.

Trivia 1: the AsyncRewriter (used when you write async/await code) also inherits from StateMachineRewriter, since it also relies on a state machine behind the scenes.

As mentioned, the state machine shows up mostly in the generated bool MoveNext() implementation, which contains a switch (and sometimes some old-fashioned goto) on a state field representing the different execution paths through your method.

The code that the compiler generates from the user code does not look that "good", mostly because the compiler adds some weird prefixes and suffixes here and there. Names such as <>1__state are deliberately not valid C# identifiers, so they cannot clash with anything you write yourself.

For example, the code:

public class TestClass 
{
    private int _iAmAHere = 0;
    
    public IEnumerator<int> DoSomething()
    {
        var start = 1;
        var stop = 42;
        var breakCondition = 34;
        var exceptionCondition = 41;
        var multiplier = 2;
        // Rest of the code... with some yield keywords somewhere below...

The variables and types related to that piece of code above will after compilation look like:

public class TestClass
{
    [CompilerGenerated]
    private sealed class <DoSomething>d__1 : IEnumerator<int>, IDisposable, IEnumerator
    {
        // Always present
        private int <>1__state;
        private int <>2__current;

        // Containing class
        public TestClass <>4__this;

        private int <start>5__1;
        private int <stop>5__2;
        private int <breakCondition>5__3;
        private int <exceptionCondition>5__4;
        private int <multiplier>5__5;

Regarding the state machine itself, let's take a look at a very simple example with a dummy branching for yielding some even / odd stuff.

public class Example
{
    public IEnumerator<string> DoSomething()
    {
        const int start = 1;
        const int stop = 42;

        for (var index = start; index < stop; index++)
        {
            yield return index % 2 == 0 ? "even" : "odd";
        }
    }
} 

This is translated into the following MoveNext:

private bool MoveNext()
{
    switch (<>1__state)
    {
        default:
            return false;
        case 0:
            <>1__state = -1;
            <start>5__1 = 1;
            <stop>5__2 = 42;
            <index>5__3 = <start>5__1;
            break;
        case 1:
            <>1__state = -1;
            goto IL_0094;
        case 2:
            {
                <>1__state = -1;
                goto IL_0094;
            }
            IL_0094:
            <index>5__3++;
            break;
    }
    if (<index>5__3 < <stop>5__2)
    {
        if (<index>5__3 % 2 == 0)
        {
            <>2__current = "even";
            <>1__state = 1;
            return true;
        }
        <>2__current = "odd";
        <>1__state = 2;
        return true;
    }
    return false;
} 

As you can see this implementation is far from being straightforward but it does the job!
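For comparison, here is a hand-written approximation of the same even/odd iterator using legal identifiers. It is only a sketch of the idea: the class name, field names and state numbering are my own and do not match what the compiler actually emits.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written approximation of the generated class (hypothetical names).
public sealed class EvenOddEnumerator : IEnumerator<string>
{
    // 0 = not started, 1 = suspended after a yield, -1 = running or finished
    private int _state;
    private string _current = string.Empty;
    private int _index;

    public string Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:          // first call: run the loop initializer
                _index = 1;
                break;
            case 1:          // resuming after a yield: run the loop increment
                _index++;
                break;
            default:         // already finished
                return false;
        }

        _state = -1;
        if (_index < 42)     // loop condition
        {
            _current = _index % 2 == 0 ? "even" : "odd";
            _state = 1;      // remember where to resume on the next call
            return true;
        }

        return false;
    }

    // Compiler-generated iterators do not support Reset() either.
    public void Reset() => throw new NotSupportedException();

    public void Dispose() { }
}
```

The local variable index has become a field, and the position inside the loop is encoded in _state, which is the same trick the generated code uses with <index>5__3 and <>1__state.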

Trivia 2: What happens with the IEnumerable / IEnumerable<T> method return type?
Well, instead of just generating a class implementing IEnumerator<T>, the compiler generates a class that implements both IEnumerable<T> and IEnumerator<T>, so that the implementation of IEnumerator<T> GetEnumerator() can reuse the same generated class.

A quick reminder of the interfaces that are implemented automatically when you use the yield keyword:
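One visible consequence of the IEnumerable<T> form is that every enumeration starts the method body afresh. A small sketch, with ReEnumerationDemo and its BodyRuns counter being hypothetical names:

```csharp
using System.Collections.Generic;

public static class ReEnumerationDemo
{
    // Hypothetical counter, used only to count how often the body starts.
    public static int BodyRuns;

    public static IEnumerable<int> Pair()
    {
        BodyRuns++;      // executes once per enumeration, not once per call to Pair()
        yield return 1;
        yield return 2;
    }
}
```

Storing the result of ReEnumerationDemo.Pair() in a variable and iterating it twice with foreach increments BodyRuns twice, and two enumerators obtained from the same sequence advance independently of each other.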

public interface IEnumerable<out T> : IEnumerable
{
    new IEnumerator<T> GetEnumerator();
}

public interface IEnumerator<out T> : IDisposable, IEnumerator
{
    T Current { get; }
}

public interface IEnumerator
{
    bool MoveNext();

    object Current { get; }

    void Reset();
}

You can also check out this example with different paths / branching and the full implementation by the compiler rewriting.

This has been created with SharpLab, you can play with that tool to try different yield related execution paths and see how the compiler will rewrite them as a state machine in the MoveNext implementation.

As for the second part of the question, i.e. yield break, it has been answered here:

It specifies that an iterator has come to an end. You can think of yield break as a return statement which does not return a value.

Chastitychasuble answered 12/1, 2019 at 19:46 Comment(0)
