Uri.EscapeDataString weirdness

Why does EscapeDataString behave differently between .NET 4 and 4.5? The outputs are

  • .NET 4.0: Uri.EscapeDataString("-_.!~*'()") => "-_.!~*'()"

  • .NET 4.5: Uri.EscapeDataString("-_.!~*'()") => "-_.%21~%2A%27%28%29"

The documentation

By default, the EscapeDataString method converts all characters except for RFC 2396 unreserved characters to their hexadecimal representation. If International Resource Identifiers (IRIs) or Internationalized Domain Name (IDN) parsing is enabled, the EscapeDataString method converts all characters, except for RFC 3986 unreserved characters, to their hexadecimal representation. All Unicode characters are converted to UTF-8 format before being escaped.

For reference, unreserved characters are defined as follows in RFC 2396:

unreserved    = alphanum | mark

mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                "(" | ")"

And in RFC 3986:

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
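
To make the difference between the two grammars concrete, here is a minimal sketch that computes which mark characters are unreserved in RFC 2396 but not in RFC 3986. The class and member names are hypothetical and chosen only for illustration; the two strings are copied from the grammars quoted above.

```csharp
using System;
using System.Linq;

static class UnreservedMarks
{
    // Non-alphanumeric unreserved characters, copied from the grammars above.
    public const string Rfc2396 = "-_.!~*'()";
    public const string Rfc3986 = "-._~";

    // Marks that RFC 2396 treats as unreserved but RFC 3986 does not.
    public static string OnlyIn2396()
    {
        return new string(Rfc2396.Where(c => Rfc3986.IndexOf(c) < 0).ToArray());
    }

    static void Main()
    {
        Console.WriteLine(OnlyIn2396()); // prints !*'()
    }
}
```

These five characters are exactly the ones that differ between the two outputs shown at the top of the question.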

The source code

It looks like whether EscapeDataString escapes a given character is determined roughly like this:

is unicode above \x7F
  ? PERCENT ENCODE
  : is a percent symbol
    ? is an escape char
      ? LEAVE ALONE
      : PERCENT ENCODE
    : is a forced character
      ? PERCENT ENCODE
      : is an unreserved character
        ? LEAVE ALONE
        : PERCENT ENCODE
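
The decision tree above can be approximated in a few lines. This is a simplified model, not the framework's implementation: it ignores the percent-symbol and forced-character branches, and the names `EscapeSketch`, `Escape` and `legacyV2Quirks` are hypothetical. It only shows how the choice of unreserved set changes the result.

```csharp
using System;
using System.Text;

static class EscapeSketch
{
    const string Rfc2396Marks = "-_.!~*'()";
    const string Rfc3986Marks = "-._~";

    static bool IsAsciiLetterOrDigit(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // legacyV2Quirks is an assumed stand-in for the framework's internal switch.
    public static string Escape(string input, bool legacyV2Quirks)
    {
        string marks = legacyV2Quirks ? Rfc2396Marks : Rfc3986Marks;
        var sb = new StringBuilder();

        // Work on UTF-8 bytes so non-ASCII characters are encoded byte by byte.
        foreach (byte b in Encoding.UTF8.GetBytes(input))
        {
            char c = (char)b;
            bool unreserved = b < 0x80 && (IsAsciiLetterOrDigit(c) || marks.IndexOf(c) >= 0);
            if (unreserved)
                sb.Append(c);                            // leave alone
            else
                sb.Append('%').Append(b.ToString("X2")); // percent encode
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Escape("-_.!~*'()", true));  // -_.!~*'()
        Console.WriteLine(Escape("-_.!~*'()", false)); // -_.%21~%2A%27%28%29
    }
}
```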

It's at that final check "is an unreserved character" where the choice between RFC2396 and RFC3986 is made. The source code of the method verbatim is

    internal static unsafe bool IsUnreserved(char c)
    {
        if (Uri.IsAsciiLetterOrDigit(c))
        {
            return true;
        }
        if (UriParser.ShouldUseLegacyV2Quirks)
        {
            return (RFC2396UnreservedMarks.IndexOf(c) >= 0);
        }
        return (RFC3986UnreservedMarks.IndexOf(c) >= 0);
    }

And that code refers to

    private static readonly UriQuirksVersion s_QuirksVersion = 
        (BinaryCompatibility.TargetsAtLeast_Desktop_V4_5
             // || BinaryCompatibility.TargetsAtLeast_Silverlight_V6
             // || BinaryCompatibility.TargetsAtLeast_Phone_V8_0
             ) ? UriQuirksVersion.V3 : UriQuirksVersion.V2;

    internal static bool ShouldUseLegacyV2Quirks {
        get {
            return s_QuirksVersion <= UriQuirksVersion.V2;
        }
    }
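
Because the switch above is keyed on the targeted framework version, one possible way to get the same output under either setting is to post-process the result of Uri.EscapeDataString. The sketch below is an assumption about how that could be done, not something the framework provides; `Rfc3986Escaper` and `EscapeDataStringRfc3986` are hypothetical names.

```csharp
using System;
using System.Text;

static class Rfc3986Escaper
{
    // Characters that are unreserved in RFC 2396 but not in RFC 3986.
    const string Rfc2396OnlyMarks = "!*'()";

    public static string EscapeDataStringRfc3986(string value)
    {
        var sb = new StringBuilder(Uri.EscapeDataString(value));

        // Under the legacy behavior these marks come through literally;
        // under the newer behavior they are already encoded and this is a no-op.
        foreach (char c in Rfc2396OnlyMarks)
        {
            sb.Replace(c.ToString(), "%" + ((int)c).ToString("X2"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(EscapeDataStringRfc3986("-_.!~*'()")); // -_.%21~%2A%27%28%29
    }
}
```

The replacement is safe because the escaped string can only contain those five characters when they appeared literally in the input.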

Confusion

It seems contradictory that the documentation says the output of EscapeDataString depends on whether IRI/IDN parsing is enabled, whereas the source code says the output is determined by the value of TargetsAtLeast_Desktop_V4_5. Could someone clear this up?

Canorous answered 25/7, 2014 at 18:44 Comment(1)
This question helped me figure out why I wasn't getting the expected behavior under .NET 4.5, although I didn't realize it at first. I should have read your comments on the source code more carefully! Thanks! My Stack Overflow question: #41007373 (Ananthous)

A lot of changes were made in 4.5 compared to 4.0 in how system functions behave. You can have a look at this thread:

Why does Uri.EscapeDataString return a different result on my CI server compared to my development machine?

Or you can go directly to the following link:

http://msdn.microsoft.com/en-us/library/hh367887(v=vs.110).aspx

These changes were made in response to input from users around the world.

Presentationism answered 16/11, 2014 at 6:44 Comment(0)
