What character encoding is used by StreamReader.ReadToEnd()?
S

3

6
  • What character encoding is used by StreamReader.ReadToEnd()?
  • What would be the reason to use (b) instead of (a) below?
  • Is there a risk of there being a character encoding problem if (a) is used instead of (b)?
  • Is there another method that is better than (a) and (b)?

(a)

Dim strWebResponse As String
Dim Request As HttpWebRequest = WebRequest.Create(Url)
Using Response As WebResponse = Request.GetResponse()
    Using reader As StreamReader = New StreamReader(Response.GetResponseStream())
        strWebResponse = reader.ReadToEnd()
    End Using
End Using

(b)

Dim encoding As New UTF8Encoding()
Dim strWebResponse As String
Dim Request As HttpWebRequest = WebRequest.Create(Url)
Using Response As WebResponse = Request.GetResponse()
    Dim responseBuffer(Response.ContentLength - 1) As Byte
    ' Request the whole buffer; note that a single Read call may still return fewer bytes.
    Response.GetResponseStream().Read(responseBuffer, 0, responseBuffer.Length)
    strWebResponse = encoding.GetString(responseBuffer)
End Using
Secateurs answered 12/11, 2012 at 4:30 Comment(1)
Have you looked here? msdn.microsoft.com/en-us/library/… Pull this encoding and use it in the StreamReader object. Jaymejaymee
A
13

The default encoding used by StreamReader is Encoding.UTF8. (An earlier version of this answer said Encoding.Default, which varies from machine to machine depending on your version of Windows and the locale that you have set. That was incorrect; see the comments below.)

I have trouble remembering what the defaults are, so I prefer to use the StreamReader constructor that lets me specify the encoding. For example:

Using reader As StreamReader = New StreamReader(Response.GetResponseStream(), Encoding.UTF8)

See the constructor documentation for more info.

If you use that constructor in your example a, the results will be the same as for your example b.
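For illustration, here is a minimal sketch of example (a) with the encoding passed explicitly. The module name DownloadExample and the helper DownloadAsUtf8 are hypothetical names chosen for this sketch, and it assumes the response body really is UTF-8:

```vb
Imports System.IO
Imports System.Net
Imports System.Text

Module DownloadExample
    ' Hypothetical helper: downloads a URL and decodes the body as UTF-8.
    ' Assumes the server actually sends UTF-8 text.
    Function DownloadAsUtf8(url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            ' The (Stream, Encoding) constructor makes the choice of encoding explicit.
            Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

Unlike example (b), this does not depend on ContentLength being set, because ReadToEnd keeps reading until the stream ends.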

Should you use UTF-8? That depends on the page you're downloading. If the page was encoded with UTF-8 then, yes, you should use UTF-8. UTF-8 is a reasonable fallback if no character set is defined in the HTTP headers, although the formal default depends on the media type. But you need to check the Content-Type header to determine whether the page uses some other encoding. For example, the Content-Type header might read:

 application/xml; charset=ISO-8859-2

You would have to examine the ContentType property of the HttpWebResponse, check to see if there is a charset field, and set the encoding properly based on that.
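One way to do that check is sketched below. It is only a sketch: the names CharsetExample, EncodingFromContentType, and DownloadText are hypothetical, the UTF-8 fallback is an assumption rather than something the server guarantees, and on newer .NET runtimes some legacy charsets such as ISO-8859-2 may be unavailable unless a code page provider is registered (in which case this sketch also falls back to UTF-8).

```vb
Imports System.IO
Imports System.Net
Imports System.Net.Mime
Imports System.Text

Module CharsetExample
    ' Hypothetical helper: maps a Content-Type header value to an Encoding.
    ' Falls back to UTF-8 (an assumption) when the charset is missing,
    ' malformed, or not known to this runtime.
    Function EncodingFromContentType(contentTypeHeader As String) As Encoding
        If String.IsNullOrWhiteSpace(contentTypeHeader) Then Return Encoding.UTF8
        Try
            Dim parsed As New ContentType(contentTypeHeader)
            If Not String.IsNullOrEmpty(parsed.CharSet) Then
                Return Encoding.GetEncoding(parsed.CharSet)
            End If
        Catch ex As FormatException
            ' The header could not be parsed; use the fallback.
        Catch ex As ArgumentException
            ' The charset name is not recognized; use the fallback.
        End Try
        Return Encoding.UTF8
    End Function

    ' Hypothetical helper: downloads a URL using the encoding the server declares.
    Function DownloadText(url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Dim bodyEncoding As Encoding = EncodingFromContentType(response.ContentType)
            Using reader As New StreamReader(response.GetResponseStream(), bodyEncoding)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

With the header from the example above, EncodingFromContentType("application/xml; charset=ISO-8859-2") should return the ISO-8859-2 encoding where the runtime supports it. Keeping the header parsing in its own function makes it possible to check that logic without a network call.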

Or, just use UTF-8 and hope for the best.

Amphistylar answered 12/11, 2012 at 4:44 Comment(6)
Do you agree with the first answer that UTF8 encoding should be used? Secateurs
No! The default encoding is not Encoding.Default but UTF-8, as specified in, well, the constructor documentation :). I agree it was confusing that Microsoft named that encoding Default when it's not actually the default in .NET. Presumably the justification is that it's the default for older non-Unicode native Windows programs. Finedraw
@MarkJ: You're right. I was mistaken. An old version of the documentation used to say (incorrectly) that it used Encoding.Default. See informit.com/guides/content.aspx?g=dotnet&seqNum=163 for details. I'll make the correction. Amphistylar
@JimMischel & @MarkJ: where does it say that it defaults to UTF-8? All I can see is: "The character encoding is set by the encoding parameter, and the buffer size is set to 1024 bytes. The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used. See the Encoding.GetPreamble method for more information." Secateurs
@CJ7: See msdn.microsoft.com/en-us/library/yhfzs7at.aspx. "This constructor initializes the encoding to UTF8Encoding..." Amphistylar
@JimMischel: my mistake, I was looking at the (Stream, Encoding) constructor. Secateurs
H
0

Yes, (b) is good because UTF-8 will work with any ASCII document.
UTF-8 is a Unicode encoding.
More importantly, it is backwards compatible with ASCII and is the standard default for XML and HTML.

Hanzelin answered 12/11, 2012 at 4:33 Comment(1)
Can I make StreamReader.ReadToEnd use UTF8? Secateurs
J
0

I found a solution. It's not pretty, but it works.
First, set the StreamReader's detectEncodingFromByteOrderMarks constructor argument to true.
Then put a special character on your page.

StreamReader reader = new StreamReader(responseStream, System.Text.Encoding.Default, true);

<%@ Page Title="FTP - Verificação" Language="C#"

Janson answered 27/1, 2014 at 12:2 Comment(0)
