UTF-8 characters mangled in HTTP Basic Auth username
I'm trying to build a web service using Ruby on Rails. Users authenticate themselves via HTTP Basic Auth. I want to allow any valid UTF-8 characters in usernames and passwords.

The problem is that the browser is mangling characters in the Basic Auth credentials before it sends them to my service. For testing, I'm using 'カタカナカタカナカタカナカタカナカタカナカタカナカタカナカタカナ' as my username (no idea what it means - AFAIK it's some random characters our QA guy came up with - please forgive me if it is somehow offensive).

If I take that as a string and do username.unpack("h*") to convert it to hex, I get: '3e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a8' That seems about right for 32 katakana characters (3 bytes/6 hex digits per).
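(As an aside on that hex dump: Ruby's unpack("h*") emits the low nibble of each byte first, so the output is nibble-swapped relative to the conventional byte order; unpack("H*") gives the familiar form. A quick sketch:

```ruby
# "カ" is U+30AB, which UTF-8 encodes as the bytes E3 82 AB.
bytes = "カ".b                # treat the string as raw bytes
puts bytes.unpack1("H*")     # high nibble first: "e382ab"
puts bytes.unpack1("h*")     # low nibble first:  "3e28ba" (nibble-swapped)
```

That explains why the dump starts with '3e28ba' rather than 'e382ab'.)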

If I do the same with the username that's coming in via HTTP Basic auth, I get: 'bafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaac'. It's obviously much shorter. Using the Firefox Live HTTP Headers plugin, here's the actual header that's being sent:

Authorization: Basic q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o6q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o=

That looks like that 'bafbba...' string, with the high and low nibbles swapped (at least when I paste it into Emacs, base 64 decode, then switch to hexl mode). That might be a UTF16 representation of the username, but I haven't gotten anything to display it as anything but gibberish.

Rails is setting the content-type header to UTF-8, so the browser should be sending in that encoding. I get the correct data for form submissions.

The problem happens in both Firefox 3.0.8 and IE 7.

So... is there some magic sauce for getting web browsers to send UTF-8 characters via HTTP Basic Auth? Am I handling things wrong on the receiving end? Does HTTP Basic Auth just not work with non-ASCII characters?

Turnery answered 31/3, 2009 at 19:19 Comment(2)
Trivia: The "random characters" are not offensive. They are Japanese, and say "katakana" (8 times) in the Katakana script (en.wikipedia.org/wiki/Katakana), which is usually used for spelling non-Japanese words and sounds. (Which is odd, because "katakana" is a Japanese word so isn't usually spelled in katakana :-) – Romanticism
Trivia addendum: I have seen it written in katakana a lot. Initially I put it down to people trying to be poetic, but I just looked it up in Jisho and it says that it's "usually written in kana". – Cottony

I want to allow any valid UTF-8 characters in usernames and passwords.

Abandon all hope. Basic Authentication and Unicode don't mix.

There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:

  • Opera uses UTF-8;
  • IE uses the system's default codepage (which you have no way of knowing, other than that it's never UTF-8), and silently mangles characters that don't fit into it using the Windows ‘guess a random character that looks a bit like the one you wanted or maybe just not’ secret recipe;
  • Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8;
  • Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
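Mozilla's low-byte behaviour is easy to reproduce. The sketch below (an editorial illustration, not browser code) drops each codepoint to its low byte and base64s the result, which matches the 'q7+r…' token in the question:

```ruby
require "base64"

username = "カタカナ"  # U+30AB U+30BF U+30AB U+30CA
# Keep only the low byte of each codepoint, as pre-RFC-7617 Mozilla did:
mangled = username.each_char.map { |c| (c.ord & 0xFF).chr }.join
puts Base64.strict_encode64(mangled)  # => "q7+ryg=="
```

Those AB BF AB CA bytes are exactly the nibble-swapped 'bafbbaac' pattern the questioner saw on the server side.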

*: some people interpret the standard to say that either:

  • it should always be ISO-8859-1, due to that being the default encoding for raw 8-bit characters included directly in headers;
  • it should be encoded using RFC2047 rules, somehow.

But neither of these proposals is on topic for inclusion in a base64-encoded auth token, and the RFC2047 reference in the HTTP spec really doesn't work at all, since all the places it might potentially be used are explicitly disallowed by the ‘atom context’ rules of RFC2047 itself, even if HTTP headers honoured the rules and extensions of the RFC822 family, which they don't.

In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It's just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.

Manton answered 31/3, 2009 at 22:33 Comment(6)
I happen to disagree that Opera does it somehow right. You can't change the encoding unilaterally. – Lair
Not so much ‘right’ as “what the OP wanted it to do”. Although since none of the alternatives are ‘right’, UTF-8 is at least as good as any other possible option. – Manton
At least UTF-8 won't mangle some characters :) Thanks very much for this answer (it expands on Julian's - they both answer the question nicely). I did a lot of Googling and couldn't find a solid discussion of this. Time to go change my specs. – Turnery
There is A New Hope: The new RFC 7617 allows servers to request UTF-8 encoding, resolving the ambiguity. A compliant client will then respond accordingly. – Of course, this doesn’t mean all client software will immediately implement RFC 7617; it’s likely to take years before this issue can be called “mostly resolved”. – Discern
@chirlu: Indeed! We have Julian to thank for that. Crossing fingers for implementation now... – Manton
Oh right, I hadn’t made the connection – thank you, @Julian! – Discern
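For reference, the RFC 7617 mechanism mentioned in the comments above amounts to the server adding a charset parameter to its challenge, after which a compliant client encodes the credentials as UTF-8. A rough sketch of both halves (the realm and credentials are made-up placeholders):

```ruby
require "base64"

# Server side: advertise UTF-8 in the challenge (RFC 7617, section 2.1).
challenge = 'Basic realm="example", charset="UTF-8"'

# Client side: a compliant client then base64s the UTF-8 bytes.
token = Base64.strict_encode64("カタカナ:ひみつ".encode("UTF-8"))

# Server side again: decode and interpret the bytes as UTF-8.
user, pass = Base64.strict_decode64(token).force_encoding("UTF-8").split(":", 2)
puts user  # => カタカナ
```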

It's a known shortcoming that Basic authentication does not provide support for non-ISO-8859-1 characters.

Some UAs are known to use UTF-8 instead (Opera comes to mind), but there's no interoperability for that either.

As far as I can tell, there's no way to fix this, except by defining a new authentication scheme that handles all of Unicode. And getting it deployed.

Lair answered 31/3, 2009 at 20:19 Comment(0)

HTTP Digest authentication is no solution for this problem, either. It suffers from the same problem of the client being unable to tell the server what character set it's using and the server being unable to correctly assume what the client used.

Axinomancy answered 20/8, 2010 at 20:16 Comment(0)

Have you tested using something like curl to make sure it's not a Firefox issue? The HTTP Auth RFC is silent on ASCII vs. non-ASCII, but it does say the value passed in the header is the username and the password separated by a colon, and I can't find a colon in the string that Firefox is reporting sending.

Enlightenment answered 31/3, 2009 at 19:41 Comment(2)
There's a colon there, once you base64 decode it. It ends up being 32 16-bit characters (at least Emacs thinks they're characters), a colon, then the same 16-bit characters (I used the same string for the password). I tried it with IE and got the same thing, so it's not just a Firefox thing. – Turnery
I was just using some OS X dashboard widget to do the conversion, but it definitely wasn't finding a colon after base64 decoding. It must have been trying to use MacRoman or something. – Enlightenment

If you are coding for Windows 8.1, note that the sample in the documentation for HttpCredentialsHeaderValue (wrongly) uses UTF-16 encoding. A reasonably good fix is to switch to UTF-8 (ISO-8859-1 is not supported by CryptographicBuffer.ConvertStringToBinary).

See http://msdn.microsoft.com/en-us/library/windows/apps/windows.web.http.headers.httpcredentialsheadervalue.aspx.

Cosetta answered 23/10, 2013 at 17:30 Comment(0)

Here's a workaround we used today to circumvent the issue of non-ASCII characters in the password of a colleague:

curl -u "USERNAME:`echo -n 'PASSWORD' | iconv -f ISO-8859-1 -t UTF-8`" 'URL'

Replace USERNAME, PASSWORD and URL with your values. This example uses shell command substitution to transform the password's character encoding to UTF-8 before executing the curl command.

Note: I used a ` ... ` evaluation here instead of $( ... ) because it doesn't fail if the password contains a ! character... [shells love ! characters ;-)]

Illustration of what happens with non-ASCII characters:

echo -n 'zz<zz§zz$zz-zzäzzözzüzzßzz' | iconv -f ISO-8859-1 -t UTF-8
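On the Ruby side, the same "just send UTF-8" approach can be sketched with Net::HTTP, which base64s whatever bytes you hand it (the URL and credentials below are placeholders):

```ruby
require "net/http"
require "uri"
require "base64"

uri = URI("https://example.com/")    # placeholder URL
req = Net::HTTP::Get.new(uri)
req.basic_auth("カタカナ", "ひみつ")   # UTF-8 strings pass through byte-for-byte

# The resulting header is base64 of the raw UTF-8 bytes:
puts req["Authorization"]
```

Whether the server decodes those bytes as UTF-8 is, as the accepted answer explains, a separate question.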
Everhart answered 19/11, 2021 at 9:28 Comment(0)

I might be totally ignorant, but I came to this post while looking into a problem sending a UTF-8 string as a header inside an AJAX call.

I could solve my problem by encoding the string in Base64 right before sending it. That means you could, with some simple JS, convert the form to Base64 right before submitting, so that it can be converted back on the server side.

This simple tool allowed me to send UTF-8 strings as plain ASCII. I found it thanks to this simple sentence:

base64 (this encoding is designed to make binary data survive transport through transport layers that are not 8-bit clean). http://www.webtoolkit.info/javascript-base64.html
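On the Rails side, decoding such a pre-encoded value is a one-liner. A sketch (the value below stands in for whatever header or param name you choose; the linked JS helper UTF-8-encodes before base64ing):

```ruby
require "base64"

# What the JS side would send:
encoded = Base64.strict_encode64("カタカナ")

# Server side: decode, then tag the resulting bytes as UTF-8.
decoded = Base64.strict_decode64(encoded).force_encoding("UTF-8")
puts decoded  # => カタカナ
```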

I hope this helps somehow. Just trying to give back a little bit to the community!

Pluviometer answered 29/9, 2011 at 3:1 Comment(0)
