Short answer: iso-8859-1 unless encoded-words are used in accordance with RFC2047 (MIME).
Longer explanation:
RFC2617, section 2 (HTTP Authentication) defines basic-credentials:
basic-credentials = base64-user-pass
base64-user-pass = <base64 encoding of user-pass,
except not limited to 76 char/line>
user-pass = userid ":" password
userid = *<TEXT excluding ":">
password = *TEXT
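To make the grammar concrete, here is a minimal server-side sketch of how a header value could be taken apart along these rules. The function name parse_basic_credentials and the default encoding argument are my own choices for illustration, not anything the RFC defines, and the iso-8859-1 default simply follows the reading of TEXT argued in this answer.

```python
import base64


def parse_basic_credentials(header_value: str, encoding: str = "iso-8859-1"):
    """Split an Authorization header value into (userid, password).

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")

    # base64-user-pass: decode to the raw user-pass octets.
    user_pass = base64.b64decode(token.strip(), validate=True)

    # userid excludes ":", so the first colon is the separator;
    # the password itself may contain further colons.
    userid, sep, password = user_pass.partition(b":")
    if not sep:
        raise ValueError("missing ':' separator in user-pass")

    return userid.decode(encoding), password.decode(encoding)
```

Splitting on the first colon only is what keeps a password such as pä:ss intact.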
This spec has to be read together with RFC2616 (HTTP 1.1), which supplies the definitions used in BNF like the one above:
This specification is a companion to the HTTP/1.1 specification [2].
It uses the augmented BNF section 2.1 of that document, and relies on
both the non-terminals defined in that document and other aspects of
the HTTP/1.1 specification.
RFC2616, section 2.1 defines TEXT (emphasis mine):
The TEXT rule is only used for descriptive field contents and values
that are not intended to be interpreted by the message parser. Words
of *TEXT MAY contain characters from character sets other than
ISO-8859-1 only when encoded according to the rules of RFC 2047.
TEXT = <any OCTET except CTLs, but including LWS>
So it's definitely iso-8859-1 unless you detect some other encoding according to RFC2047 (MIME pt. 3) rules:
// Username: Mike
// Password: T€ST
Mike:=?iso-8859-15?q?T€ST?=
In this case the euro sign in the word would be encoded as 0xA4 according to iso-8859-15. It is my understanding that you should check for these encoded-word delimiters, and then decode the words inside based on the specified encoding. If you don't, you will think the password is =?iso-8859-15?q?T¤ST?= (notice that 0xA4 would be decoded to ¤ when interpreted as iso-8859-1).
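The check described above could be sketched in Python with the standard library's email.header.decode_header, which understands RFC 2047 encoded-words. The helper name decode_password is hypothetical, and the sketch assumes the password octets have already been extracted from the base64 payload. Note that in a strictly Q-encoded word the 0xA4 octet is written as =A4, so that is the form used in the usage lines.

```python
from email.header import decode_header


def decode_password(raw: bytes) -> str:
    """Decode password octets, honouring RFC 2047 encoded-words if present.

    Hypothetical helper: anything outside an encoded-word is treated as
    iso-8859-1, following the reading of TEXT in RFC 2616.
    """
    text = raw.decode("iso-8859-1")  # default charset for TEXT
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            # Encoded-word payload, or plain text next to an encoded-word.
            parts.append(chunk.decode(charset or "iso-8859-1"))
        else:
            # No encoded-word found: decode_header returns the str unchanged.
            parts.append(chunk)
    return "".join(parts)


euro_password = decode_password(b"=?iso-8859-15?q?T=A4ST?=")
plain_password = decode_password("pässword".encode("iso-8859-1"))
```

Without this step, the same octets decoded purely as iso-8859-1 would leave the delimiters in the password.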
This is my understanding; I can't find more explicit confirmation than these RFCs, and some of it seems contradictory. For example, one of the 4 stated goals of RFC2047 (MIME, pt. 3) is to redefine:
the format of messages to allow for ... textual header information in
character sets other than US-ASCII.
But then RFC2616 (HTTP 1.1) defines a header using the TEXT rule which defaults to iso-8859-1. Does that mean that every word in this header should be an encoded-word (i.e. the =?...?= form)?
Also relevant: no current browser does this. They use utf-8 (Chrome, Opera), iso-8859-1 (Safari), the system code page (IE) or something else (like only the most significant bit from utf-8 in the case of Firefox).
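To illustrate why this matters on the server, here is a small sketch of how the same credentials produce different octets depending on the encoding a client picks. The helper basic_token and the sample password are made up for the illustration; it does not reproduce any particular browser's behaviour.

```python
import base64


def basic_token(userid: str, password: str, encoding: str) -> str:
    """Build the base64-user-pass token with an explicit character encoding."""
    user_pass = f"{userid}:{password}".encode(encoding)
    return base64.b64encode(user_pass).decode("ascii")


latin1_token = basic_token("Mike", "Tést", "iso-8859-1")
utf8_token = basic_token("Mike", "Tést", "utf-8")

# A server that assumes iso-8859-1 misreads the utf-8 client's password.
misread = base64.b64decode(utf8_token).decode("iso-8859-1")
```

Here misread comes out as Mike:TÃ©st, so a password comparison against the stored value would fail even though the user typed it correctly.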
Edit: I just realized this answer looks at the issue more from the server-side perspective.