Why does IEumerator<T> affect the state of IEnumerable<T> even the enumerator never reached the end?
Asked Answered
T

1

11

I am curious why the following throws an error message (text reader closed exception) on the "last" assignment:

IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);
IEnumerator<string> textEnumerator = textRows.GetEnumerator();

string first = textRows.First();
string last = textRows.Last();

However the following executes fine:

IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);

string first = textRows.First();
string last = textRows.Last();

IEnumerator<string> textEnumerator = textRows.GetEnumerator();

What is the reason for the different behavior?

Timekeeper answered 26/12, 2012 at 10:53 Comment(8)
Actually, both codes crash on my machine...Willettewilley
@digEmAll, the second works fine for me, the first code breaks when I try to determine the last line in the text file.Timekeeper
@digEmAll: That's strange - the second code works fine for me, and I understand why it works fine. What problem are you seeing, and where?Praline
@JonSkeet, second code fails for me as well, with the same error and on the same line.Verret
@Andrei: Hmm. Which version of .NET are you using? Is this within a debugger?Praline
@JonSkeet, framework version is 4. Client profile if it matters. Error appears both with and without debugging.Verret
@Andrei: Odd. I'm using .NET 4.5, but I'm surprised to see a difference. Hmm.Praline
@JonSkeet: I'm using .net 4 as well (4.5 not installed on my PC), and looking at the decompiled code I'm not surprised that both do not work... maybe something has changed in 4.5 ...Willettewilley
P
12

You've discovered a bug in the framework, as far as I can tell. It's reasonably subtle, because of the interaction of a few things:

  • When you call ReadLines(), the file is actually opened. Personally, I think of this as a bug in itself; I'd expect and hope that it would be lazy - only opening the file when you try to start iterating over it.
  • When you call GetEnumerator() the first time on the return value of ReadLines, it will actually return the same reference.
  • When First() calls GetEnumerator(), it will create a clone. This will share the same StreamReader as textEnumerator
  • When First() disposes its clone, it will dispose of the StreamReader, and set its variable to null. This doesn't affect the variable within the original, which now refers to a disposed StreamReader
  • When Last() calls GetEnumerator(), it will create a clone of the original object, complete with disposes StreamReader. It then tries to read from that reader, and throws an exception.

Now compare this with your second version:

  • When First() calls GetEnumerator(), the original reference is returned, complete with open reader.
  • When First() then calls Dispose(), the reader will be disposed and the variable set to null
  • When Last() calls GetEnumerator(), a clone will be created - but because the value it's cloning has a null reference, a new StreamReader is created, so it's able to read the file with no problems. It then disposes of the clone, which closes the reader
  • When GetEnumerator() is called, a second clone of the original object, opening yet another StreamReader - again, no problems there.

So basically, the problem in the first snippet is that you're calling GetEnumerator() a second time (in First()) without having disposed of the first object.

Here's another example of the same problem:

using System;
using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = File.ReadLines("test.txt");
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;
        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }
}

You could fix this by calling File.ReadLines twice - or by using a genuinely lazy implementation of ReadLines, like this:

using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = ReadLines("test.txt");
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;
        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }

    static IEnumerable<string> ReadLines(string file)
    {
        using (var reader = File.OpenText(file))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}

In the latter code, a new StreamReader is opened each time GetEnumerator() is called - so the result is each pair of lines in test.txt.

Praline answered 26/12, 2012 at 11:11 Comment(18)
Thanks for the detailed explanation. Quite interesting, motivates me to dig a lot deeper under the hood as I did not know that First() or Last() disposes the internal iterator/enumerator.Timekeeper
@Freddy: Everything which uses an IEnumerator<T> should dispose of it.Praline
I see your point but in this case calling First() and Last() I wished it would not dispose of the underlying text reader. In that I am still a bit confused, on one side one has to manually dispose, on the other hand the object is disposed itself after calling First() or Last()...Timekeeper
@JonSkeet: I just had a look at .net 4.5 source code. They changed the yield-based code with an IEnumerator implementation trying to keep the same behaviour. Anyway, I've copied the code in a .net 4.0 project and the 2nd case works, while using .net 4 File.ReadLines it does not...Willettewilley
@Freddy: The problem isn't the disposal - it's that calling GetEnumerator should always create a new reader IMO.Praline
@JonSkeet, yes in that case everything would be clear. A new enumerator would create a new reader, while calling First(), Last() would not create nor dispose.Timekeeper
@Freddy: No, calling First() or Last() would create (via GetEnumerator()), then read, then dispose.Praline
@JonSkeet, ok got it, Last() or First() are methods after all, if they were properties or fields I would find it odd that a new reader was created. Would it not be much better to create an enumerator and hence a reader when an IEnumerable is instantiated, given the enumerator enumerates a stream? Then accessing the enumerator through Last() or First() or whatever would not interfere with the underlying stream.Timekeeper
@Freddy: Well it would have to interfere with the underlying stream - otherwise it couldn't get the data. But I would say that the enumerator enumerates the file, not a stream. This is the approach that LINQ takes all over the place - the query is just the representation of the query; no data is fetched (and GetEnumerator isn't called) until it's required. The whole problem here is that the reader is being created earlier than it should, IMO. If each call to GetEnumerator created a new reader, all would be well.Praline
@JonSkeet, fair points, though would not concurrent readers pointing to the same open file cause problems?Timekeeper
@Freddy: Nope - see the code at the end of my answer for an example which works fine.Praline
@JonSkeet, sorry about my wording, I was aware that it can work I guess I wanted to say that I would find a design where the reader sits with IEnumerable would make a lot more sense to me. The reader would only be created when GetEnumerator is called for the first time. Subsequent usage of IEnumerator would then not cause the creation of another reader. If I compared that with moving through a binaryStream just because I would set the stream position with Seek() does not mean I need to create a new binary reader or writer. Similarly with getting the first and last element of a IEnumerable...Timekeeper
...it's hard to comprehend for me why we would need several readers. At least I don't see the benefit because GetEnumerator() is sharing a relationship with IEnumerable. I am obviously aware the current design is different. Apologies for messing up the lingo at times I am a self taught programmer and that just since 1.5 years nowTimekeeper
@JonSkeet, thanks for the explanations, I can learn a lot from you.Timekeeper
@Freddy: No, each call to GetEnumerator() should create an independent iterator. So if you call GetEnumerator() multiple times, it should create multiple "cursors" through the data, whatever the data source is. That's why IEnumerator<T> exists in the first place.Praline
@JonSkeet, I see that but I have a hard time equating a text reader with a simple construct such as a "cursor"/iterator. To set up a comparison, a cursor/iterator for me is like a bookmark, a book is a data source. Why cloning/duplicating the books when a book can be read by one entity at a time anyway (due to I/O constraints here). Just because I tell the cursor/iterator to go somewhere why does that require setting up a new reader, makes very little to no sense to me.Timekeeper
@JonSkeet, yes I agree with your points but what I disagree with is why a new reader is created for each new enumerator. "Whatever the data source" -> Fully concur, then why having to create a full reading device (reader) when just trying to iterate over the same data source. But it gets into a discussion, it remains my opinion that one and exactly one reader should be created regardless of how many Iterators I "get" , how it could be implemented and whether it will eliminate all the pitfalls, one of which I fell into, I dont know. Thanks again for your deep analysis and code. Happy New Year.Timekeeper
@Freddy: To conform with normal expected behavior, something like File.ReadLines() should either return an object that creates a new reader with each call to GetEnumerator(), or else it should create an object whose enumerators have the logic to share a reader in thread-safe fashion and clean it up when everyone is done with it. The former approach is much simpler.Hastate

© 2022 - 2024 — McMap. All rights reserved.