.Net Uri Encoding RFC 2396 vs RFC 3986

First, some quick background: as part of an integration with a third-party vendor, I have a C# .Net web application that receives a URL with a bunch of information in the query string. That URL is signed with an MD5 hash and a shared secret key. Basically, I pull in the query string, remove their hash, compute my own hash over the remaining query string, and make sure mine matches the one that was supplied.
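
A minimal C# sketch of that check is below. The vendor's actual scheme isn't given here, so the parameter name `hash`, appending the secret to the end of the remaining query, and hex-encoding the digest are all assumptions, and `VendorSignature` is a hypothetical class. The property that matters is that it hashes the query string exactly as received, so any difference in how a character is percent-encoded changes the digest.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class VendorSignature
{
    // Hypothetical scheme: the query contains "hash=<hex MD5>", where the digest is
    // MD5(remaining query + shared secret). Check the vendor's docs for the real
    // parameter name, where the secret goes, and the hex casing.
    public static bool IsValid(string rawQuery, string sharedSecret, string hashParam = "hash")
    {
        string prefix = hashParam + "=";
        string[] pairs = rawQuery.Split('&');

        string supplied = pairs.FirstOrDefault(p => p.StartsWith(prefix, StringComparison.Ordinal));
        if (supplied == null)
            return false;

        // Hash the query exactly as received (escapes untouched), minus the hash pair.
        string remaining = string.Join("&",
            pairs.Where(p => !p.StartsWith(prefix, StringComparison.Ordinal)));
        string expected = Md5Hex(remaining + sharedSecret);

        return string.Equals(expected, supplied.Substring(prefix.Length),
                             StringComparison.OrdinalIgnoreCase);
    }

    // Lowercase hex MD5 of the text's UTF-8 bytes. Percent-escapes are plain ASCII,
    // so "%FC" and "%C3%BC" hash differently whichever byte encoding is used here.
    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            return string.Concat(digest.Select(b => b.ToString("x2")));
        }
    }
}
```

With this sketch, a query signed over `b=J%FCrgen` fails validation once the same value arrives as `b=J%C3%BCrgen`, which is exactly the symptom described in the question.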

I'm retrieving the Uri in the following way...

Uri uriFromVendor = new Uri(Request.Url.ToString());
string queryFromVendor = uriFromVendor.Query.Substring(1); //Substring to remove question mark

My issue stems from query strings that contain special characters like an umlaut (ü). The vendor calculates their hash over the RFC 2396 representation, which is %FC. My C# .Net app calculates its hash over the RFC 3986 representation, which is %C3%BC. Needless to say, our hashes don't match, and I throw my errors.
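
The two forms differ only in which byte encoding turns ü into octets before percent-encoding: %FC is the single ISO-8859-1 (Latin-1) byte, while %C3%BC is the two-byte UTF-8 sequence. (RFC 3986 recommends UTF-8 for new URI schemes; RFC 2396 did not prescribe a character encoding.) The hypothetical helper below is only a sketch: it escapes every byte outside RFC 3986's unreserved set with uppercase hex, so the encoding argument is the only thing that varies.

```csharp
using System.Text;

public static class PercentEncoding
{
    // Percent-encodes every byte outside the RFC 3986 unreserved set
    // (ALPHA / DIGIT / "-" / "." / "_" / "~") using uppercase hex digits.
    public static string Encode(string text, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (byte b in encoding.GetBytes(text))
        {
            bool unreserved = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                              (b >= '0' && b <= '9') ||
                              b == '-' || b == '.' || b == '_' || b == '~';
            if (unreserved)
                sb.Append((char)b);
            else
                sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

`PercentEncoding.Encode("\u00FC", Encoding.GetEncoding("iso-8859-1"))` returns "%FC", while passing `Encoding.UTF8` returns "%C3%BC".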

Strangely, the documentation for the Uri class in .Net says that it should follow RFC 2396 unless otherwise set to RFC 3986, but I don't have the entry in my web.config file that they say is required for this behavior.

How can I force the Uri constructor to use the RFC 2396 convention?

Failing that, is there an easy way to convert the RFC 3986 octet pairs to RFC 2396 octets?
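
One way to do that conversion, sketched under the assumption that the vendor percent-encodes ISO-8859-1 bytes: find each run of %XX escapes, decode the run as UTF-8, and re-emit each character as its Latin-1 byte. Runs that aren't valid UTF-8, or that contain a character with no Latin-1 equivalent, keep their original bytes (re-emitted with uppercase hex). `EscapeRecoder` is a hypothetical name, not a framework API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EscapeRecoder
{
    // Strict decoder: throws instead of substituting U+FFFD for invalid bytes.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Rewrites each run of UTF-8 escapes (e.g. %C3%BC) as ISO-8859-1 escapes (e.g. %FC).
    // Literal characters, including '&', '=' and '+', are copied through unchanged.
    public static string Utf8ToLatin1Escapes(string query)
    {
        var sb = new StringBuilder(query.Length);
        var run = new List<byte>();
        int i = 0;
        while (i < query.Length)
        {
            if (!IsEscape(query, i))
            {
                sb.Append(query[i++]);
                continue;
            }
            run.Clear();
            while (IsEscape(query, i))
            {
                run.Add(Convert.ToByte(query.Substring(i + 1, 2), 16));
                i += 3;
            }
            AppendRun(sb, run.ToArray());
        }
        return sb.ToString();
    }

    static bool IsEscape(string s, int i) =>
        i + 2 < s.Length && s[i] == '%' && Uri.IsHexDigit(s[i + 1]) && Uri.IsHexDigit(s[i + 2]);

    static void AppendRun(StringBuilder sb, byte[] bytes)
    {
        string decoded;
        try
        {
            decoded = StrictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            AppendEscaped(sb, bytes); // not valid UTF-8 (perhaps already Latin-1)
            return;
        }
        foreach (char c in decoded)
        {
            if (c > '\u00FF')
            {
                AppendEscaped(sb, bytes); // no Latin-1 equivalent: keep the UTF-8 bytes
                return;
            }
        }
        // Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF.
        foreach (char c in decoded)
            sb.Append('%').Append(((int)c).ToString("X2"));
    }

    static void AppendEscaped(StringBuilder sb, byte[] bytes)
    {
        foreach (byte b in bytes)
            sb.Append('%').Append(b.ToString("X2"));
    }
}
```

For example, `EscapeRecoder.Utf8ToLatin1Escapes("name=J%C3%BCrgen&x=1")` returns "name=J%FCrgen&x=1". Because only escape runs are touched, the separators stay exactly where they were, which matters when the result is fed into a hash.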

Pinetum answered 11/8, 2011 at 23:44 Comment(3)
Nothing to do with your question, but why are you creating a new Uri here? You can just do string queryFromVendor = Request.Url.Query.Substring(1); – Polymerization
Which version of .NET are you using? – Eliciaelicit
I'm using .Net 4.0. Also, curiously, when I tried using Request.Url.Query the umlaut was coming through as %ufffd which is the � character. – Pinetum

Nothing to do with your question, but why are you creating a new Uri here? You can just do string queryFromVendor = Request.Url.Query.Substring(1); – atticae

+1 for atticae! I went back to try removing the extraneous Uri I was creating and suddenly, the string had the umlaut encoded as UTF-8 instead of UTF-16.

At first, I didn't think this would work. Somewhere along the line I had tried retrieving the URL using Request.QueryString, but that caused the umlaut to come through as %ufffd, which is the � character. In the interest of taking a fresh perspective, I tried atticae's suggestion and it worked.
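
A likely explanation for the %ufffd, offered as an assumption rather than something verified against this app: ASP.NET decodes query strings as UTF-8 by default (the `requestEncoding` attribute of the `<globalization>` element defaults to UTF-8), and the lone byte 0xFC from the vendor's %FC is not valid UTF-8, so the decoder substitutes U+FFFD, the replacement character. The hypothetical helper below reproduces that with `System.Text.Encoding` alone.

```csharp
using System.Text;

public static class ReplacementCharDemo
{
    // Decodes a single octet (e.g. 0xFC, which the vendor sends as %FC) with the given encoding.
    // Encoding.UTF8 replaces invalid input with U+FFFD instead of throwing.
    public static string DecodeOctet(byte octet, Encoding encoding) =>
        encoding.GetString(new[] { octet });
}
```

`DecodeOctet(0xFC, Encoding.UTF8)` yields "\uFFFD", whereas `DecodeOctet(0xFC, Encoding.GetEncoding("iso-8859-1"))` yields "ü".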

I'm pretty sure the answer has to do with something I read here.

C# uses UTF-16 in all its strings, with tools to encode when it comes to dealing with streams and files that bring us onto...

ASP.NET uses UTF-8 by default, and it's hard to think of a time when it isn't a good choice...

My problems stemmed from here...

Uri uriFromVendor = new Uri(Request.Url.ToString());

By taking the Request.Url Uri and creating another Uri from it, I was getting the query encoded as the C# standard UTF-16. With the original Uri, the query stayed in the .Net standard UTF-8.
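
One plausible mechanism, hedged: `Uri.ToString()` returns an unescaped representation of the URI, so passing it back into `new Uri(...)` lets the constructor re-escape non-ASCII characters by its own rules, and .NET's `Uri` escapes them as UTF-8, which would produce the %C3%BC from the question. Reading the query from the original `Request.Url` avoids that round-trip. A minimal sketch with a hypothetical helper name, which also avoids the `ArgumentOutOfRangeException` that `Query.Substring(1)` throws when the URL has no query:

```csharp
using System;

public static class QueryAccess
{
    // Returns the query of an existing Uri (e.g. Request.Url) without re-parsing its
    // ToString() output, minus the leading '?'. Uri.Query is "" when there is no query.
    public static string QueryWithoutMark(Uri url)
    {
        string query = url.Query;
        return query.Length > 0 ? query.Substring(1) : string.Empty;
    }
}
```

In the ASP.NET page this would be called as `QueryAccess.QueryWithoutMark(Request.Url)`.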

Thanks to all for your help.

Pinetum answered 12/8, 2011 at 16:52 Comment(0)

I'm wondering if this is a bit of a red herring:

I say this because FC is the code point of the u with umlaut (U+00FC), which is both its UTF-16 code unit and its single-byte ISO-8859-1 (Latin-1) encoding; C3 BC is the UTF-8 representation.

I wonder if one of the System.Text.Encoding methods to convert the source data into a normal .Net string might help.
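
For reading the values (as opposed to verifying the hash, which should use the raw string), one such approach is `HttpUtility.ParseQueryString(string, Encoding)` with ISO-8859-1, assuming that is the vendor's encoding. `HttpUtility` lives in `System.Web`; the wrapper class is hypothetical.

```csharp
using System.Collections.Specialized;
using System.Text;
using System.Web;

public static class VendorQueryDecoder
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Decodes the vendor's query (escapes like %FC) into ordinary .NET strings,
    // assuming the vendor percent-encodes ISO-8859-1 bytes.
    public static NameValueCollection Decode(string rawQuery) =>
        HttpUtility.ParseQueryString(rawQuery, Latin1);
}
```

`VendorQueryDecoder.Decode("name=J%FCrgen&x=1")["name"]` returns "Jürgen" as a normal (UTF-16) .NET string.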

This question might be of interest too: Encode and Decode rfc2396 URLs

Perdita answered 12/8, 2011 at 0:18 Comment(0)

I don't know about the standard encoding for Uri constructors, but if all else fails you could always decode the URL yourself and encode it in whatever encoding you like.

The HttpUtility class has UrlDecode() and UrlEncode() methods that let you specify a System.Text.Encoding as the second parameter.

For example:

string decodedQueryString = HttpUtility.UrlDecode(Request.Url.Query.Substring(1));
string encodedQueryString = HttpUtility.UrlEncode(decodedQueryString, System.Text.Encoding.GetEncoding("utf-16"));
// calc hash here
Polymerization answered 12/8, 2011 at 0:34 Comment(4)
It looks like the UrlEncode method gets a little overzealous. The output has encoded everything in the string... c%00o%00n%00f%00i%00r%00m%00a%00t%00i%00o%00... – Pinetum
Note: it did manage to convert the umlaut back to %FC, though. Is there a way to be a bit more selective so it doesn't encode the ampersands, pluses, and %00s? – Pinetum
OK, first of all: the %00 sequences are a result of UTF-16 encoding, which I just used as an example here. Replace it with whatever encoding you want. If you want ü to be converted to %FC, maybe you are looking for Latin-1: System.Text.Encoding.GetEncoding("latin1") – Polymerization
Your problem seems to be solved now, but just for the sake of completeness: you could avoid encoding the & and = characters by splitting the query string, encoding each piece, and putting it back together. But as I said, the whole answer is merely a workaround for the problem. – Polymerization
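
The split-and-reassemble workaround from the last comment might look like the sketch below. The class and method names are hypothetical; `HttpUtility.UrlDecode` and `HttpUtility.UrlEncode` with an `Encoding` argument are the `System.Web` methods mentioned in the answer. Only the pieces between `&` and `=` are decoded and re-encoded, so the separators survive. Note that `HttpUtility.UrlEncode` emits lowercase hex (%fc rather than %FC) and turns spaces into `+`, so the output may still not be byte-for-byte what the vendor hashed.

```csharp
using System.Linq;
using System.Text;
using System.Web;

public static class PieceReencoder
{
    // Splits on '&' and then on the first '=', decodes each piece with 'from',
    // re-encodes it with 'to', and rejoins with the original separators.
    public static string Reencode(string query, Encoding from, Encoding to)
    {
        return string.Join("&", query.Split('&').Select(pair =>
        {
            string[] parts = pair.Split(new[] { '=' }, 2);
            return string.Join("=", parts.Select(p =>
                HttpUtility.UrlEncode(HttpUtility.UrlDecode(p, from), to)));
        }));
    }
}
```

For example, `PieceReencoder.Reencode("name=J%C3%BCrgen&city=K%C3%B6ln", Encoding.UTF8, Encoding.GetEncoding("iso-8859-1"))` yields the equivalent of name=J%FCrgen&city=K%F6ln, with lowercase hex digits.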

© 2022 - 2024 — McMap. All rights reserved.