TL;DR: do not use accepted version as It's completely broken in relation to handling unicode characters, and never use internal API
I've actually found weird double encoding issue with the accepted solution:
So, If you're dealing with characters which need to be encoded, accepted solution leads to double encoding:
- query parameters are auto encoded by using
NameValueCollection
indexer (and this uses UrlEncodeUnicode
, not regular expected UrlEncode
(!))
- Then, when you call
uriBuilder.Uri
it creates new Uri
using constructor which does encoding one more time (normal url encoding)
- That cannot be avoided by doing
uriBuilder.ToString()
(even though this returns correct Uri
which IMO is at least inconsistency, maybe a bug, but that's another question) and then using HttpClient
method accepting string - client still creates Uri
out of your passed string like this: new Uri(uri, UriKind.RelativeOrAbsolute)
Small, but full repro:
var builder = new UriBuilder
{
Scheme = Uri.UriSchemeHttps,
Port = -1,
Host = "127.0.0.1",
Path = "app"
};
NameValueCollection query = HttpUtility.ParseQueryString(builder.Query);
query["cyrillic"] = "кирилиця";
builder.Query = query.ToString();
Console.WriteLine(builder.Query); //query with cyrillic stuff UrlEncodedUnicode, and that's not what you want
var uri = builder.Uri; // creates new Uri using constructor which does encode and messes cyrillic parameter even more
Console.WriteLine(uri);
// this is still wrong:
var stringUri = builder.ToString(); // returns more 'correct' (still `UrlEncodedUnicode`, but at least once, not twice)
new HttpClient().GetStringAsync(stringUri); // this creates Uri object out of 'stringUri' so we still end up sending double encoded cyrillic text to server. Ouch!
Output:
?cyrillic=%u043a%u0438%u0440%u0438%u043b%u0438%u0446%u044f
https://127.0.0.1/app?cyrillic=%25u043a%25u0438%25u0440%25u0438%25u043b%25u0438%25u0446%25u044f
As you may see, no matter if you do uribuilder.ToString()
+ httpClient.GetStringAsync(string)
or uriBuilder.Uri
+ httpClient.GetStringAsync(Uri)
you end up sending double encoded parameter
Fixed example could be:
var uri = new Uri(builder.ToString(), dontEscape: true);
new HttpClient().GetStringAsync(uri);
But this uses obsolete Uri
constructor
P.S on my latest .NET on Windows Server, Uri
constructor with bool doc comment says "obsolete, dontEscape is always false", but actually works as expected (skips escaping)
So It looks like another bug...
And even this is plain wrong - it send UrlEncodedUnicode to server, not just UrlEncoded what server expects
Update: one more thing is, NameValueCollection actually does UrlEncodeUnicode, which is not supposed to be used anymore and is incompatible with regular url.encode/decode (see NameValueCollection to URL Query?).
So the bottom line is: never use this hack with NameValueCollection query = HttpUtility.ParseQueryString(builder.Query);
as it will mess your unicode query parameters. Just build query manually and assign it to UriBuilder.Query
which will do necessary encoding and then get Uri using UriBuilder.Uri
.
Prime example of hurting yourself by using code which is not supposed to be used like this