How do you reset a C# .NET TextReader cursor back to the start point?
I have a method that takes either a StringReader instance (reading from the clipboard) or a StreamReader instance (reading from a file) and, at present, casts either one as a TextReader instance.

I need it to 'pre-read' some of the source input, then reset the cursor back to the start. I do not necessarily have the original filename. How do I do this?

There is mention of the Seek method of System.IO.Stream, but TextReader does not implement it. StreamReader exposes it through its BaseStream property; StringReader, however, has no BaseStream property.

First answered 6/5, 2009 at 19:51 Comment(0)
42

It depends on the TextReader. If it's a StreamReader, you can use:

sr.BaseStream.Position = 0;
sr.DiscardBufferedData();

(Assuming the underlying stream is seekable, of course.)
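For the pre-read case in the question, those two calls could be wrapped roughly as follows. This is only a sketch: `StreamReaderRewind` and `PeekFromStart` are made-up names, and it assumes the reader has not been read from yet and that the underlying stream is seekable.

```csharp
using System;
using System.IO;

static class StreamReaderRewind
{
    // Hypothetical helper: reads up to `count` characters from the reader,
    // then moves the underlying stream back to the start.
    public static string PeekFromStart(StreamReader sr, int count)
    {
        if (!sr.BaseStream.CanSeek)
            throw new NotSupportedException("The underlying stream is not seekable.");

        var buffer = new char[count];
        int read = sr.ReadBlock(buffer, 0, count);

        // Rewind the stream, then drop whatever the reader had already buffered
        // so the next read starts from the new stream position.
        sr.BaseStream.Position = 0;
        sr.DiscardBufferedData();

        return new string(buffer, 0, read);
    }
}
```

After the call, the same `StreamReader` can be handed to the real parsing code and will read from the beginning again.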

Other implementations of TextReader may not have a concept of "rewinding", in the same way that IEnumerable<T> doesn't. In many ways you can think of TextReader as a glorified IEnumerable<char>. It has methods to read whole chunks of data at a time, read lines etc, but it's fundamentally a "forward reading" type.

EDIT: I don't believe StringReader supports any sort of rewinding - you'd be better off recreating the StringReader from the original string, if you can. If that's not feasible, you could always create your own TextReader class which proxies all the "normal" calls to an inner StringReader, but recreates that inner instance when it needs to rewind.
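A minimal sketch of such a proxy might look like this. `RewindableStringReader` and `Rewind` are hypothetical names, not framework types, and only the most common members are forwarded.

```csharp
using System.IO;

// Hypothetical wrapper: forwards reads to an inner StringReader and
// rebuilds that inner reader from the original string on Rewind().
sealed class RewindableStringReader : TextReader
{
    private readonly string _source;
    private StringReader _inner;

    public RewindableStringReader(string source)
    {
        _source = source;
        _inner = new StringReader(source);
    }

    // Throw away the current inner reader and start again from the beginning.
    public void Rewind()
    {
        _inner.Dispose();
        _inner = new StringReader(_source);
    }

    public override int Peek() => _inner.Peek();
    public override int Read() => _inner.Read();
    public override int Read(char[] buffer, int index, int count) => _inner.Read(buffer, index, count);
    public override string ReadLine() => _inner.ReadLine();
    public override string ReadToEnd() => _inner.ReadToEnd();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Code that only knows about `TextReader` keeps working unchanged; only the caller that needs to rewind has to know the concrete type.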

Salientian answered 6/5, 2009 at 19:54 Comment(5)
Thanks for the heads up on BaseStream, I've updated the question so that perhaps the problem can be fully resolved. - First
Thanks for that, looks like I might have to rethink what is passed to the method using an overload. - First
I had issues using this kind of reset: the string that is read after such a reset contains 3 hidden characters at the beginning that probably determine the encoding. The consequence is that comparing the string read from the reset StreamReader with the "expected string" surprisingly returns false. - Prohibit
@sthiers: Not sure what you mean by "hidden characters" - but this is probably a UTF-8 BOM. You might want to post a new question. - Salientian
@Jon: I wrote a unit test that clarifies my question #28858205 - Prohibit
12

If it is a StreamReader, and if that stream supports seeking, then:

        reader.BaseStream.Seek(0, SeekOrigin.Begin);
        reader.DiscardBufferedData();

However, this is not possible on arbitrary TextReaders. You could perhaps read all of it as a string, then you can use StringReader repeatedly?
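As a rough illustration of that last idea (the `BufferedInput` and `PreRead` names are invented for this sketch), the input is drained once into a string, and every later pass gets its own fresh StringReader. It assumes the whole input fits comfortably in memory.

```csharp
using System.IO;

static class BufferedInput
{
    // Hypothetical helper: reads the whole input once, peeks at the first
    // line, and returns the full text so callers can re-read it at will.
    public static (string FirstLine, string FullText) PreRead(TextReader reader)
    {
        string text = reader.ReadToEnd();

        using (var preview = new StringReader(text))
        {
            string firstLine = preview.ReadLine() ?? string.Empty;
            return (firstLine, text);
        }
    }
}
```

The caller can then create `new StringReader(fullText)` as many times as needed, regardless of whether the original input was a StringReader or a StreamReader.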

Caritacaritas answered 6/5, 2009 at 19:55 Comment(3)
Thanks for the heads up on BaseStream, I've updated the question so that perhaps the problem can be fully resolved. - First
If you initialize your text reader like so: TextReader reader = new StreamReader(stream) then you can cast the reader back to the derived class and call the method after seeking: stream.Seek(offset, SeekOrigin.Begin); (reader as StreamReader).DiscardBufferedData(); - Reest
@MarcGravell Absolutely, I was just clarifying for the benefit of future visitors. - Reest
3

I came across your post, and was rather disappointed that this isn't possible. I hacked up something quick which works: just avoid the StringReader altogether:

Stream stream = new MemoryStream((new System.Text.ASCIIEncoding().GetBytes(mystring)), false);
reader = new StreamReader(stream, new System.Text.ASCIIEncoding());

It's not pretty, and it uses ASCII (which was what I needed anyway). Note that this won't work for arbitrary positions with a variable-length encoding such as UTF-8, because the n'th byte doesn't have to be the start of the n'th character. If you do need that, you could do something like:

Stream stream = new MemoryStream((new System.Text.UTF32Encoding().GetBytes(mystring)), false);
reader = new StreamReader(stream, new System.Text.UTF32Encoding());

as UTF-32 is the only fixed-length Unicode encoding.

You can now do

reader.BaseStream.Position = wherever; // or wherever * 4 for the UTF32 variety,
                                       // in your case, the beginning of the string,
                                       // which is always 0 obviously
reader.DiscardBufferedData();
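To make the byte-versus-character point concrete, here is a small sketch (the `EncodingOffsets` helper is hypothetical) that computes where a given character index lands in the encoded bytes. Note that it counts .NET `char` values, i.e. UTF-16 code units, so the "times 4" rule only holds as long as the text before the index contains no surrogate pairs.

```csharp
using System;
using System.Text;

static class EncodingOffsets
{
    // Hypothetical helper: byte offset of the character at `charIndex`
    // when `text` is encoded with `encoding`.
    public static int ByteOffset(string text, int charIndex, Encoding encoding)
        => encoding.GetByteCount(text.Substring(0, charIndex));
}

static class EncodingOffsetsDemo
{
    public static void Run()
    {
        string text = "h\u00E9llo"; // "héllo"

        // 'h' is 1 byte and 'é' is 2 bytes in UTF-8, so index 2 starts at byte 3.
        Console.WriteLine(EncodingOffsets.ByteOffset(text, 2, Encoding.UTF8));  // prints 3

        // Every one of these characters is 4 bytes in UTF-32, so index 2 starts at byte 8.
        Console.WriteLine(EncodingOffsets.ByteOffset(text, 2, Encoding.UTF32)); // prints 8
    }
}
```

Rewinding to position 0 is safe in any encoding; it is only seeking to an arbitrary character that needs this kind of care.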
Dort answered 4/5, 2012 at 13:49 Comment(10)
I was doing a search for "32 unicode seek c# streamreader" and you hit the nail on the head with that second snippet. +1 ... I'll go try it out! - Plumley
Now I just need a 32-bit version of string and StringBuilder so I don't have to mess with surrogate pairs. (Hmm, even that doesn't quite solve parsing a mix of \n and Windows' \r\n newlines, the latter of which is annoyingly similar to a surrogate pair.) - Plumley
Yeah, have fun with that ;) If you need seeking, you're in a bit of pain, so if you can avoid it, work around it. As soon as you have strings again, you are back in sanity-land, and surrogate pairs can be left to the String datatype which handles all the difficulty for you, and seeking a lot in a stream is a smell anyway. As far as \r goes, the best thing you can do with that IMO is to always skip/ignore it in any parsing. - Dort
If you're looking to get the string out of that, you can just read it using the reader's Read methods. - Dort
Well, by "seek" I was also thinking of string methods such as Substring. I'm new to C# and .NET so I'm kind of feeling my way in the dark. Any helpful links to sites explaining how to work effectively with 32-bit strings? - Plumley
Generally, you shouldn't be working with hand rolled 32-bit strings, but just with strings. The String datatype is UTF-16 under the hood, and should just work(tm) with surrogate pairs. Things like substring should treat a surrogate pair as a single character. What is your specific use case? - Dort
Oh, if the string class methods will all treat surrogates as single characters, then I may be ok. Could casting s[i] to char be a problem? I guess I need to do some more reading; maybe 16-bit in .NET is not as bad as in Python. (What I read there is that the whole VM has to be launched with a special parameter in order to handle wide SMP characters as single characters, and that it then takes more memory.) My use case is parsing "SFM" data files (potentially of mixed enc) in which each field begins with \n, \, + ASCII field marker, followed by a space + field data (either legacy or UTF-8). - Plumley
Assuming legacy means ASCII-low, you can just read it as UTF-8. There are no collisions. If it's cp1252 you have to resort to heuristics, and will need to go byte-by-byte, treating it as cp1252 if it's not valid UTF-8. That won't be pretty, but it won't need UTF-32. The field separators have the same byte value in UTF-8 as in ASCII too. - Dort
I'm not worried about ASCII-low. Our legacy stuff is usually cp1252, utf8, or some other known codepage. We know which for each field because the field marker (e.g. the line begins with @"\lx ") is always ASCII and is associated with an encoding. So, my main concern was the surrogates (e.g. whether a pair can be cast to a single char, etc.). I'll test that out and ask a separate question if necessary. Thanks. - Plumley
@Dort Whether seeking is a smell depends on what you do. Try implementing a pull reader for hierarchical data which also allows you to skip nodes (nextSibling) etc. Say you only need a few things from the first node's children, then you skip the rest and skip to the next sibling... For stuff like this you either need a "functional stream", i.e. an immutable one, or you need a position and seek, or you start creating requirements on the underlying text syntax just so your code looks pretty. - Miki
-3

Seek and ye shall find.

Seek(0, SeekOrigin.Begin);

TextReader is derived from StreamReader. StreamReader contains a BaseStream property

Danidania answered 6/5, 2009 at 19:52 Comment(2)
"TextReader is derived from StreamReader" - other way around. - Caritacaritas
Yeah, good point Marc. This goes against most people's common sense, but it's true. - Coming