What's the difference between EscapeUriString and EscapeDataString?

If I'm only dealing with URL encoding, should I use EscapeUriString?

Annelid answered 9/12, 2010 at 9:23 Comment(1)
Always escape each individual value using Uri.EscapeDataString(), as explained in @Livven's answer. With other approaches, the system simply does not have enough information to produce the intended result for every possible input. – Keneth

Use EscapeDataString always (for more on why, see Livven's answer below).

Edit: removed dead link to how the two differ on encoding

Roots answered 9/12, 2010 at 9:29 Comment(11)
I'm not sure that link actually provides more information, as it's about unescaping rather than escaping. – Stroboscope
It's basically the same difference. If you actually read the article, there's a table around the middle that actually escapes (not unescapes) to show the differences (comparing with URLEncode too). – Roots
It's still not clear to me -- what if I'm not escaping a whole URI but only part of it (i.e. the data for a query string parameter)? Am I escaping data for the URI, or does EscapeDataString imply something completely different? – Technocracy
... did some testing; it looks like I want EscapeDataString for a URI parameter. I tested with the string "I heart C++": EscapeUriString did not encode the "+" characters, it just left them as is, while EscapeDataString correctly converted them to "%2B". – Technocracy
@Technocracy yes, that's in the article I linked to. If you want to be more specific, you should be using HttpUtility.UrlEncode if what you are encoding is a URL... that will also change spaces into + (which is correct for a URL, more so than %20, although both will work) and will use the more correct lowercase too. As the documentation states, EscapeUriString does not convert RFC 2396 reserved characters (that includes +, but also others: more info here). – Roots
I'm not encoding URLs or URIs, I'm encoding data that goes into the value of a query string parameter of a URL (again, that data is not a URL or URI). As far as personal preference goes: using "+" for " " in a URL is evil, because some functions (as you mention) will randomly leave them in -- and on the server side, it can be ambiguous -- whereas "%20" and "%2B" are explicit -- there's no chance to get the decoding wrong. – Technocracy
Yeah, well, it's a matter of standards (there are RFCs that define these kinds of encodings). The problem is that browsers have historically been pretty loose in their support of encodings. The functions are not "randomly" encoding or decoding... they follow some standards or not, and it's usually documented :-) – Roots
Here's a sample of running it and the other encoding methods that shows the differences: dotnetfiddle.net/12IFw1 – Unfriended
This is a bad answer. You should never use EscapeUriString; it doesn't make any sense. See Livven's answer below (and upvote it). – Danika
By StackOverflow standards, this is a terrible answer. It doesn't actually explain the difference, gives confusing (and incorrect) advice, and leaves everything up to an external link. If that link becomes dead in the future, this answer will no longer be valid or correct. – Mold
I have updated the answer to link to the obviously more correct answer below. Also removed the dead link. – Roots
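The "I heart C++" experiment described in the comments can be reproduced with a minimal sketch like this one. It assumes top-level C# statements on .NET 6 or later, where Uri.EscapeUriString is marked obsolete; the pragma only silences that warning.

```csharp
#pragma warning disable SYSLIB0013 // Uri.EscapeUriString is obsolete in .NET 6+
using System;

string data = "I heart C++";

// EscapeUriString escapes the illegal spaces but leaves reserved characters such as '+'.
string uriEscaped = Uri.EscapeUriString(data);

// EscapeDataString escapes everything except unreserved characters, so '+' becomes %2B.
string dataEscaped = Uri.EscapeDataString(data);

Console.WriteLine(uriEscaped);  // I%20heart%20C++
Console.WriteLine(dataEscaped); // I%20heart%20C%2B%2B
```

Only the second result survives a round trip as query string data: a server that treats + as a space would read the first one as "I heart C  ".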

I didn't find the existing answers satisfactory, so I decided to dig a little deeper to settle this issue. Surprisingly, the answer is very simple:

There is (almost) no valid reason to ever use Uri.EscapeUriString. If you need to percent-encode a string, always use Uri.EscapeDataString.*

* See the last paragraph for a valid use case.

Why is this? According to the documentation:

Use the EscapeUriString method to prepare an unescaped URI string to be a parameter to the Uri constructor.

This doesn't really make sense. According to RFC 2396:

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.

While the quoted RFC has been obsoleted by RFC 3986, the point still stands. Let's verify it by looking at some concrete examples:

  1. You have a simple URI, like this:

         http://example.org/

     Uri.EscapeUriString won't change it.

  2. You decide to manually edit the query string without regard for escaping:

         http://example.org/?key=two words

     Uri.EscapeUriString will (correctly) escape the space for you:

         http://example.org/?key=two%20words

  3. You decide to manually edit the query string even further:

         http://example.org/?parameter=father&son

     However, this string is not changed by Uri.EscapeUriString, since it assumes the ampersand signifies the start of another key-value pair. This may or may not be what you intended.

  4. You decide that you in fact want the value of parameter to be father&son, so you fix the previous URL manually by escaping the ampersand:

         http://example.org/?parameter=father%26son

     However, Uri.EscapeUriString will escape the percent character too, leading to a double encoding:

         http://example.org/?parameter=father%2526son

As you can see, using Uri.EscapeUriString for its intended purpose makes it impossible to use & as part of a key or value in a query string instead of as a separator between multiple key-value pairs.
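Steps 2 and 4 of the walkthrough, and the alternative of escaping only the value, can be checked with a short sketch. It assumes top-level C# statements on .NET 6 or later (hence the pragma for the obsolete EscapeUriString); the URLs are the ones from the list above.

```csharp
#pragma warning disable SYSLIB0013 // Uri.EscapeUriString is obsolete in .NET 6+
using System;

// Step 2: EscapeUriString escapes the space, which is illegal in a URI.
string twoWords = Uri.EscapeUriString("http://example.org/?key=two words");

// Step 4: the '&' was escaped by hand, but EscapeUriString also escapes
// the '%', so the value ends up double-encoded.
string doubled = Uri.EscapeUriString("http://example.org/?parameter=father%26son");

// Escaping only the value with EscapeDataString keeps '&' as data.
string built = "http://example.org/?parameter=" + Uri.EscapeDataString("father&son");

Console.WriteLine(twoWords); // http://example.org/?key=two%20words
Console.WriteLine(doubled);  // http://example.org/?parameter=father%2526son
Console.WriteLine(built);    // http://example.org/?parameter=father%26son
```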

This is because, in an attempt at making it suitable for escaping full URIs, it ignores reserved characters and only escapes characters that are neither reserved nor unreserved, which, BTW, is contrary to the documentation. This way you don't end up with something like http%3A%2F%2Fexample.org%2F, but you do end up with the issues illustrated above.


In the end, if your URI is valid, it does not need to be escaped to be passed as a parameter to the Uri constructor, and if it's not valid then calling Uri.EscapeUriString isn't a magic solution either. Actually, it will work in many if not most cases, but it is by no means reliable.

You should always construct your URLs and query strings by gathering the key-value pairs, percent-encoding each key and value, and then concatenating them with the necessary separators. You can use Uri.EscapeDataString for this purpose, but not Uri.EscapeUriString, since it doesn't escape reserved characters, as mentioned above.

Only if you cannot do that, e.g. when dealing with user-provided URIs, does it make sense to use Uri.EscapeUriString as a last resort. But the previously mentioned caveats apply – if the user-provided URI is ambiguous, the results may not be desirable.
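A minimal sketch of the gather, escape, then concatenate approach recommended above. BuildQuery is a hypothetical helper name, not a .NET API, and the sketch assumes top-level C# statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: percent-encode each key and each value on its own,
// then join them with '=' and '&', so reserved characters inside the data
// can never be mistaken for separators.
static string BuildQuery(IEnumerable<KeyValuePair<string, string>> pairs) =>
    string.Join("&", pairs.Select(p =>
        Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

string url = "http://example.org/?" + BuildQuery(new[]
{
    new KeyValuePair<string, string>("key", "two words"),
    new KeyValuePair<string, string>("parameter", "father&son"),
});

Console.WriteLine(url); // http://example.org/?key=two%20words&parameter=father%26son
```

Because the separators are added after escaping, a literal & or = in a value always ends up as %26 or %3D.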

Comatose answered 9/12, 2015 at 21:19 Comment(14)
Wow, thank you for finally clarifying this issue. The previous two answers were not very helpful. – Navaho
Exactly right. EscapeUriString (like EscapeUrl's default behavior in Win32) was created by someone who didn't understand URIs or escaping. It's a misguided attempt to create something that takes a malformed URI and sometimes turns it into the intended version. But it doesn't have the information it needs to do this reliably. It also frequently gets used in place of EscapeDataString, which is also very problematic. I wish EscapeUriString did not exist. Every use of it is wrong. – Danika
Nicely explained, +1. It is way better than the accepted link-only answer. – Fosse
This answer needs more attention. It is the correct way to do it. The other answers have scenarios where they do not produce the intended results. – Keneth
I will be an alternate voice of reason here. Coming from JavaScript, where there are two distinct functions encodeURI and encodeURIComponent, this answer and some of the comments like "I wish EscapeUriString did not exist" appear misguided... – Kling
...Sure, encodeURI/Uri.EscapeUriString is not needed as often as encodeURIComponent/Uri.EscapeDataString (since when are you dealing with blind URLs that must be used in a URI context?), but that does not mean it doesn't have its place. – Kling
Point #3: "it assumes the ampersand signifies the start of another key-value pair" is a bit misleading. Key-value pair syntax is a web framework thing, not a URI thing. I think it's more accurate to say spaces are escaped (point #2) because they are illegal in a URI; ampersands are not escaped, because they are not illegal. – Pankey
@CrescentFresh You haven't actually explained where encodeURI/Uri.EscapeUriString are needed. Can you give a single use case where encodeURIComponent/Uri.EscapeDataString are not the best solution for the problem? – Evonevonne
Uri.EscapeDataString worked for me too. I was previously using WebUtility.HtmlEncode(str) to escape form input; however, this was causing exceptions on the server of the form "A potentially dangerous Request.Form value was detected from the client". One example is single quotes: encoded to &#39; by HtmlEncode, but correctly (and safely) encoded to %27 by Uri.EscapeDataString. – Spermophyte
@CrescentFresh You're right, a valid use would be as a best effort when dealing with user-provided URIs. I added that to the answer. Are there any others you could think of? – Comatose
@Comatose - even as a best-effort solution for user-provided URIs, EscapeUriString is probably not a good idea. It's not clearly documented, and whatever processing you need to do for user-provided URIs is likely going to exceed that method anyhow. E.g. let's say your user enters google.com/?q=bla bla: EscapeUriString isn't going to do anything useful, unlike most browsers, which will. The tiny niche of implementing a browser URL bar is so specialized that .NET simply shouldn't have a method for it, and even if you're doing that, don't use Uri.EscapeUriString. It's still not good enough. – Encamp
@Comatose Then there's the fact that even when EscapeUriString does "something", what destination server won't do that better? If it's comprehensible enough to best-effort escape, then let the target server deal with it. Finally, consider that the "real-world" use case for EscapeUriString is simply making a bug by accident. Best be clear about its usefulness, therefore: just don't use it. Ever. – Encamp
@CrescentFresh People don't use EscapeUriString correctly; see github.com/search?p=99&q=EscapeUriString&type=Code for some additional reasons why you should be 100% clear that it's just not a good idea to ever use this. Nobody is using it correctly. Can you find even one case where it's at least clearly harmless and has any reasonable effect whatsoever? I can see a ton that are clearly wrong, and I bet you could find a few exploitable security holes just on the basis of those search results. Don't use it; it's dangerous and useless, even as a best-effort fallback. – Encamp
After wondering why it seemed "&" was not encoded, and then why it seemed like it was being encoded, I think I mixed up these two, causing a bit of panic. Looking closely at the documentation, it seems like Uri.EscapeUriString is now marked Obsolete. @BrandonPaddock, this seems close to what you were hoping for. – Working

The plus (+) characters can reveal a lot about the difference between these methods. In a query string, the plus character is commonly interpreted as a space (the application/x-www-form-urlencoded convention). Consider querying Google for "happy cat":

https://www.google.com/?q=happy+cat

That's a valid URI (try it), and EscapeUriString will not modify it.

Now consider querying Google for "happy c++":

https://www.google.com/?q=happy+c++

That's a valid URI (try it), but it produces a search for "happy c", because the two pluses are interpreted as spaces. To fix it, we can pass "happy c++" to EscapeDataString and voila*:

https://www.google.com/?q=happy+c%2B%2B

* The encoded data string is actually "happy%20c%2B%2B"; %20 is hex for the space character, and %2B is hex for the plus character.

If you're using UriBuilder as you should be, then you'll only need EscapeDataString to properly escape some of the components of your entire URI. @Livven's answer to this question further proves that there really is no reason to use EscapeUriString.
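A sketch of the UriBuilder route, assuming .NET Core or .NET 5+ and top-level C# statements: UriBuilder supplies the scheme, host and default path, and EscapeDataString is applied only to the query value.

```csharp
using System;

// Let UriBuilder assemble scheme, host and path; escape only the data
// that goes into the query value.
var builder = new UriBuilder("https", "www.google.com")
{
    Query = "q=" + Uri.EscapeDataString("happy c++"),
};

Console.WriteLine(builder.Query);           // ?q=happy%20c%2B%2B
Console.WriteLine(builder.Uri.AbsoluteUri); // https://www.google.com/?q=happy%20c%2B%2B
```

Setting Query without a leading '?' avoids the doubled "??" that some older .NET Framework versions produced when the value already started with '?'.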

Bricklayer answered 14/1, 2015 at 12:53 Comment(2)
Thanks. What about when you have an absolute URI string that you need to encode, for example "https://www.google.com/?q=happy c++"? Looks like I manually need to split on "?", or is there a better way? – Abysm
If you're passing the entire URL as a parameter to another URL, then use EscapeDataString. If the URL you provided is the actual URL, then yes, you want to just split on ?. – Bricklayer

Comments in the source address the difference clearly. Why this info isn't brought forward via XML documentation comments is a mystery to me.

EscapeUriString:

This method will escape any character that is not a reserved or unreserved character, including percent signs. Note that EscapeUriString will also do not escape a '#' sign.

EscapeDataString:

This method will escape any character that is not an unreserved character, including percent signs.

So the difference is in how they handle reserved characters. EscapeDataString escapes them; EscapeUriString does not.

According to the RFC, the reserved characters are: :/?#[]@!$&'()*+,;=

For completeness, the unreserved characters are alphanumeric and -._~

Both methods escape characters that are neither reserved nor unreserved.
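The reserved/unreserved split above can be observed directly with EscapeDataString. This sketch assumes .NET Core or .NET 5+ and top-level C# statements; older .NET Framework versions followed RFC 2396 by default and could leave characters such as !*'() unescaped, so the first output may differ there.

```csharp
using System;

// RFC 3986 reserved characters, as listed above.
const string Reserved = ":/?#[]@!$&'()*+,;=";
// A sample of unreserved characters: ALPHA, DIGIT and -._~
const string Unreserved = "ABCXYZabcxyz0189-._~";

string reservedOut = Uri.EscapeDataString(Reserved);
string unreservedOut = Uri.EscapeDataString(Unreserved);

// Every reserved character comes back percent-encoded...
Console.WriteLine(reservedOut);   // %3A%2F%3F%23%5B%5D%40%21%24%26%27%28%29%2A%2B%2C%3B%3D
// ...while unreserved characters pass through unchanged.
Console.WriteLine(unreservedOut); // ABCXYZabcxyz0189-._~
```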

I disagree with the general notion that EscapeUriString is evil. I think a method that escapes only illegal characters (such as spaces) and not reserved characters is useful. But it does have a quirk in how it handles the % character. Percent-encoded characters (% followed by 2 hex digits) are legal in a URI. I think EscapeUriString would be far more useful if it detected this pattern and avoided encoding % when it's immediately followed by 2 hex digits.
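The behavior suggested above can be sketched as a small helper. EscapeIllegalCharacters and IsUnreserved are hypothetical names, not .NET APIs; the sketch assumes the RFC 3986 character classes listed earlier and uses Uri.EscapeDataString only to UTF-8 percent-encode characters that are neither reserved nor unreserved. It is a best-effort illustration, not a drop-in replacement: it still cannot tell whether a literal & was meant as data or as a separator.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of .NET: escapes only characters that are
// neither reserved nor unreserved, and keeps an existing %XX triplet intact
// instead of turning its '%' into %25.
static string EscapeIllegalCharacters(string input)
{
    const string Reserved = ":/?#[]@!$&'()*+,;=";
    var sb = new StringBuilder(input.Length);
    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        if (c == '%' && i + 2 < input.Length
            && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
        {
            sb.Append(input, i, 3); // already percent-encoded: copy as-is
            i += 2;
        }
        else if (IsUnreserved(c) || Reserved.IndexOf(c) >= 0)
        {
            sb.Append(c); // legal in a URI: keep
        }
        else
        {
            // Keep a surrogate pair together so it is UTF-8 encoded as one code point.
            int len = char.IsHighSurrogate(c) && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]) ? 2 : 1;
            sb.Append(Uri.EscapeDataString(input.Substring(i, len)));
            i += len - 1;
        }
    }
    return sb.ToString();
}

static bool IsUnreserved(char c) =>
    (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
    || c == '-' || c == '.' || c == '_' || c == '~';

Console.WriteLine(EscapeIllegalCharacters("http://example.org/?key=two words"));          // ...two%20words
Console.WriteLine(EscapeIllegalCharacters("http://example.org/?parameter=father%26son")); // unchanged
Console.WriteLine(EscapeIllegalCharacters("100%"));                                       // 100%25
```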

Pankey answered 4/12, 2017 at 14:53 Comment(0)

A simple example

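using System;

// "\x20" is an escaped space; the other non-ASCII characters are Japanese hiragana.
var data = "example.com/abc?DEF=あいう\x20えお";

Console.WriteLine(Uri.EscapeUriString(data)); // obsolete since .NET 6 (SYSLIB0013 warning)
Console.WriteLine(Uri.EscapeDataString(data));
Console.WriteLine(System.Net.WebUtility.UrlEncode(data));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(data));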

/*
=>
example.com/abc?DEF=%E3%81%82%E3%81%84%E3%81%86%20%E3%81%88%E3%81%8A
example.com%2Fabc%3FDEF%3D%E3%81%82%E3%81%84%E3%81%86%20%E3%81%88%E3%81%8A
example.com%2Fabc%3FDEF%3D%E3%81%82%E3%81%84%E3%81%86+%E3%81%88%E3%81%8A
example.com%2fabc%3fDEF%3d%e3%81%82%e3%81%84%e3%81%86+%e3%81%88%e3%81%8a
*/
Vogel answered 2/9, 2018 at 4:13 Comment(0)

I was using an encrypted string as a URL parameter (e.g. http://example.com/Test/myencryptedkey/param2/param3), and none of the C# encryption methods produced output that was safe to use as a URL parameter. I ended up using the following pattern:

During Encryption: Uri.EscapeDataString(myencryptedkey).Replace('%', '~');

During Decryption: Uri.UnescapeDataString(myencryptedkey.Replace('~', '%'));

Notice that during encryption the replacement takes place after EscapeDataString(), while during decryption the replacement happens before UnescapeDataString().
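A round-trip sketch of this pattern, assuming top-level C# statements. It assumes the encrypted key is Base64 text (the answer does not say so), and ToPathSegment / FromPathSegment are hypothetical names. The Base64 assumption matters: EscapeDataString leaves '~' unescaped, so the trick is only safe when the input can never contain a literal '~', which holds for the standard Base64 alphabet (A-Z, a-z, 0-9, +, /, =).

```csharp
using System;

// Hypothetical helpers. Assumption: encryptedKey is Base64 text, so it never
// contains '~' and every '~' in a segment must have come from a '%'.
static string ToPathSegment(string encryptedKey) =>
    Uri.EscapeDataString(encryptedKey).Replace('%', '~');

static string FromPathSegment(string segment) =>
    Uri.UnescapeDataString(segment.Replace('~', '%'));

// Two bytes chosen so that the Base64 text contains '+', '/' and '='.
string key = Convert.ToBase64String(new byte[] { 0xFB, 0xFF }); // "+/8="
string segment = ToPathSegment(key);

Console.WriteLine(segment);                  // ~2B~2F8~3D
Console.WriteLine(FromPathSegment(segment)); // +/8=
```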

Flurry answered 1/7, 2021 at 8:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.