Why does EscapeDataString behave differently between .NET 4 and 4.5? The outputs are
Uri.EscapeDataString("-_.!~*'()") => "-_.!~*'()"
Uri.EscapeDataString("-_.!~*'()") => "-_.%21~%2A%27%28%29"
The documentation
By default, the EscapeDataString method converts all characters except for RFC 2396 unreserved characters to their hexadecimal representation. If International Resource Identifiers (IRIs) or Internationalized Domain Name (IDN) parsing is enabled, the EscapeDataString method converts all characters, except for RFC 3986 unreserved characters, to their hexadecimal representation. All Unicode characters are converted to UTF-8 format before being escaped.
For reference, unreserved characters are defined as follows in RFC 2396:
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
(" | ")"
And in RFC 3986:
ALPHA / DIGIT / "-" / "." / "_" / "~"
The source code
It looks like whether each character of EscapeDataString is escaped is determined roughly like this
is unicode above \x7F
? PERCENT ENCODE
: is a percent symbol
? is an escape char
? LEAVE ALONE
: PERCENT ENCODE
: is a forced character
? PERCENT ENCODE
: is an unreserved character
? PERCENT ENCODE
It's at that final check "is an unreserved character" where the choice between RFC2396 and RFC3986 is made. The source code of the method verbatim is
internal static unsafe bool IsUnreserved(char c)
{
if (Uri.IsAsciiLetterOrDigit(c))
{
return true;
}
if (UriParser.ShouldUseLegacyV2Quirks)
{
return (RFC2396UnreservedMarks.IndexOf(c) >= 0);
}
return (RFC3986UnreservedMarks.IndexOf(c) >= 0);
}
And that code refers to
private static readonly UriQuirksVersion s_QuirksVersion =
(BinaryCompatibility.TargetsAtLeast_Desktop_V4_5
// || BinaryCompatibility.TargetsAtLeast_Silverlight_V6
// || BinaryCompatibility.TargetsAtLeast_Phone_V8_0
) ? UriQuirksVersion.V3 : UriQuirksVersion.V2;
internal static bool ShouldUseLegacyV2Quirks {
get {
return s_QuirksVersion <= UriQuirksVersion.V2;
}
}
Confusion
It seems contradictory that the documentation says the output of EscapeDataString depends on whether IRI/IDN parsing is enabled, whereas the source code says the output is determined by the value of TargetsAtLeast_Desktop_V4_5
. Could someone clear this up?