TextReader.Peek behaviour and detecting end of stream/reader
I

4

9

When I'm using a TextReader, what is the best way to detect that I am actually at the end of my data? The usual way to do this is something like the following:

    while(reader.Peek() != -1)
    {
        // do stuff
    }

However, the MSDN documentation for TextReader.Peek states the following:

An integer representing the next character to be read, or -1 if no more characters are available or the reader does not support seeking.

So my question is: how do you tell whether you are really at the end of the reader's data, or whether the reader/underlying stream simply doesn't support seeking? The return value seems to be ambiguous. For example, suppose I have the following,

    public void Parse(TextReader reader)
    {
         while(reader.Peek() != -1) //am I really at the end
         {
            //do stuff
         }
    }

    Parse(new StreamReader(new NetworkStream(....)));

where NetworkStream does not support seeking.

Or have I missed something?

Edit:

Just to clarify, I can easily implement this using the more specific StreamReader class, as I can check for the end of the stream (EndOfStream). However, to keep things more general, I wanted to use TextReader so that I am not tied to just StreamReader. The semantics of Peek seem a little odd, though: why does it not just throw if seeking isn't supported, and, to that end, why isn't there an EoF property on TextReader?

Involucel answered 30/11, 2012 at 18:4 Comment(8)
Is there any reason why you need to use Peek instead of Read? - Grussing
It is part of a state machine, so I may or may not want to consume that byte in the current state. So is Peek the only option I have here, without maintaining a separate stack of unconsumed bytes? - Involucel
How about checking reader.BaseStream.CanSeek then? - Grussing
May I ask: why do so many people use .Peek and not .EndOfStream to check whether the end of the stream has been reached? Is there any advantage to using .Peek? - Gurnard
TextReader doesn't expose BaseStream, as it may not have a base stream (in the case of StringReader). I think this is what really motivates this question. - Involucel
@ColinBull: I meant streams that support EOS. I often see code where people read from a stream and use .Peek to determine whether they have reached the end. Is this just some kind of historical copy-and-paste, or is there an idea behind it? - Gurnard
@Gurnard - I think one scenario would be an interactive console application that scans input from Console.In. If you wrap Console.In in a StreamReader, it will change the UI behavior, and you could get into some messy coding just to get back to the default Console.In behavior. It's easier to just test for -1 in this case. But as you said, .EndOfStream should be preferred in most situations. I suspect there is some mental scarring in people who come from a Unix background. - Callaghan
If you are curious what this was used for, it is actually an HTML parser for an F# type provider: github.com/fsharp/FSharp.Data/blob/HtmlTypeProvider/src/Html/… I ended up keeping the -1 check but changing the way I maintained the parse state. - Involucel
M
4

It really depends on what you're doing in the parse.

I would usually just Read, and see how much is read. I would suggest not reading a character at a time though:

    char[] buffer = new char[1024 * 16];
    int charsRead;
    // Read returns 0 once no more characters are available.
    while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Process buffer, only as far as charsRead
    }
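
If the parse does need one character of lookahead (as in the state-machine case from the question's comments), a minimal sketch of a wrapper that builds Peek on top of Read might look like the following. The name LookaheadReader and its members are hypothetical, not framework types. The idea is that it never calls TextReader.Peek, so a result of -1 can only mean that no more characters are available. Note that Peek here may block on something like a network stream until data arrives.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the framework): one-character lookahead
// built only on TextReader.Read(), so -1 always means "no more data".
public sealed class LookaheadReader
{
    private readonly TextReader _inner;
    private int _next;       // buffered character, or -1 at end of data
    private bool _hasNext;   // true once _next holds an unconsumed value

    public LookaheadReader(TextReader inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Returns the next character without consuming it, or -1 at end of data.
    public int Peek()
    {
        if (!_hasNext)
        {
            _next = _inner.Read();   // may block until data is available
            _hasNext = true;
        }
        return _next;
    }

    // Consumes and returns the next character, or -1 at end of data.
    public int Read()
    {
        int c = Peek();
        if (c != -1)
        {
            _hasNext = false;        // consume; a -1 result stays buffered
        }
        return c;
    }

    // True once the underlying reader has reported that it has no more characters.
    public bool EndOfData => Peek() == -1;
}
```

A parser could then take a LookaheadReader instead of a TextReader and loop on `while (!reader.EndOfData)`, deciding per state whether to call Read or only Peek.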
Minaminabe answered 1/12, 2012 at 9:2 Comment(10)
I'm not understanding what the problem with Read() would be, as the provided TextReader implementations are internally buffered. Or so my C# 3.0 book tells me. I do find it annoying that I have to first do tmp = Read() so I can check for -1, and then do c = (char) tmp to get what I want, but that seems safer/cleaner than checking whether c == char.MaxValue. - Cecelia
It may be buffered, yes. It feels cleaner to me to read a buffer at a time though... why rely on buffering when you really don't need to? Unless you explicitly include buffering, you could easily get a nasty performance problem due to an implementation detail changing. - Minaminabe
Hmm, writing all that extra code/logic just to reimplement what the framework already provides would seem to defeat the purpose of using the framework. (Feels like going back to C++.) Inside the outer loop you'd need another loop like this, right? for (int i=0; i < charsRead; i++) { char val = (char)buffer[i]; /* process val... */ } I'd rather focus on writing the program logic; i.e. val = (char) read.Read(); - Cecelia
@JCoombs: Usually not, actually - you'd usually be able to deal with the characters in bulk. It does depend on what you're doing, of course - but my experience is that it's often at least as easy (and sometimes easier) to use the "bulk" Read as the "one character at a time" Read. (Alternatively, read a line at a time via ReadLine - that's often easier than either.) And I'd definitely not just cast Read, as that's removing the information about whether you've actually reached the end of the stream. I'd focus on correctness first. - Minaminabe
Interesting. I usually find I'm either dealing with one line of text or one character at a time. The nice thing about having a buffer would be being able to see ahead (like multiple Peek()), except I don't see how that would work because it would break when you happen to be toward the end of the buffer, right? - Cecelia
@JCoombs: Well if you're reading lines, just use ReadLine. Often when I'm not reading lines, I'm copying the text from one place to another - so it's a matter of reading a chunk, then writing it to a TextWriter. - Minaminabe
Right, my point exactly about ReadLine(). Ah, I see how your approach would work nicely for copying binary files chunk by chunk. I tend to work with text files. [EDIT: Oh sorry, you did say text.] - Cecelia
I think Peek is better because Read advances a character forward in the reader and that is unintended. - Lucianolucias
@trinalbadger587: My point is that instead of using Peek and then reading, you just keep reading until it's done. - Minaminabe
@JonSkeet Ah, ok. I agree. +1 - Lucianolucias
A
6

Unless you are looking for a specific value using Peek(), why not use .Read()?

For example:

    string line;
    // strfn holds the path of the file to read.
    using (var file = new System.IO.StreamReader(strfn))
    {
        while ((line = file.ReadLine()) != null)
        {
            // Replace this line to fit your use case.
            this.richTextBox1.AppendText(line + "\n");
        }
    }

If you want a cleaner example of how this could be done, you could do something like what I have posted below. It is readable, and you can plug in your own text file values and debug it to see that it works. Reading and writing:

    string tempFile = Path.GetTempFileName();
    using (var sr = new StreamReader("file.txt"))
    {
        using (var sw = new StreamWriter(tempFile))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                if (line != "BlaBlaBla")
                    sw.WriteLine(line);
            }
        }
    }

Here is another option you could try:

At the end of the data, Read(buffer, offset, count) returns a non-positive result, and Peek() returns a negative result.

With a BinaryReader, the documentation suggests that PeekChar() should return negative:

Return Value

Type: System.Int32 The next available character, or -1 if no more characters are available or the stream does not support seeking.

Are you sure this isn't a corrupt stream, i.e. that the remaining data cannot form a complete char in the given encoding?

Abrahamsen answered 30/11, 2012 at 18:12 Comment(1)
Although this is a solution, the problem with it is that a line could be arbitrarily long. Imagine a 1 TB file with no carriage returns or line feeds: this would try to load all of it into memory. - Involucel
M
3

It should be reader.Read() == -1: if Read returns -1 there are no more characters; otherwise a character exists.
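
As a rough illustration of that check, here is a minimal sketch that reads one character at a time and tests the int result for -1 before casting it to char. The class and method names (CharLoop, ReadAll) are made up for this example.

```csharp
using System.IO;
using System.Text;

// Hypothetical example type, for illustration only.
public static class CharLoop
{
    // Reads every remaining character from the reader into a string,
    // testing for -1 before casting the result of Read() to char.
    public static string ReadAll(TextReader reader)
    {
        var sb = new StringBuilder();
        int value;
        while ((value = reader.Read()) != -1)
        {
            char c = (char)value;   // value is a real character at this point
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Casting Read() to char before the comparison would discard the information that the end was reached, which is why the int is kept until after the test.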

Martamartaban answered 30/11, 2012 at 18:6 Comment(0)
G
0

If you simply need to read and process all data till the end of your stream, then you should use Read directly, which returns -1 if no more characters are available.

    int nextChar;
    while ((nextChar = reader.Read()) != -1)
    {
        // Process nextChar here.
    }

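Edit: An alternative way of checking specifically whether the reader supports seeking is to check the underlying stream (this works only when the reader is a StreamReader, since TextReader itself does not expose BaseStream):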

bool canSeek = reader.BaseStream.CanSeek;

If this returns true, then Peek should only return -1 when the end of the stream is reached.
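
When the method only receives a TextReader, one way to apply this check is to test at run time whether the reader happens to be a StreamReader. This is a sketch under that assumption; ReaderInfo and TryGetCanSeek are hypothetical names, and the method returns null when the reader has no BaseStream to inspect (a StringReader, for instance).

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
public static class ReaderInfo
{
    // Returns BaseStream.CanSeek when the reader is a StreamReader,
    // or null when seekability cannot be determined from the reader type.
    public static bool? TryGetCanSeek(TextReader reader)
    {
        if (reader is StreamReader sr)
        {
            return sr.BaseStream.CanSeek;
        }
        return null;
    }
}
```

A null result leaves the original ambiguity in place, so this only helps for readers that wrap a stream.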

Grussing answered 30/11, 2012 at 18:11 Comment(1)
He specifically asked about TextReader, not StreamReader. - Lucianolucias
