Can punycode-encoded email addresses clash with "real" addresses?
S

5

13

The problem is this: I'm using a third-party email delivery service that doesn't accept email addresses with non-ASCII characters in the name part, like müller@example.com.

Encoding such an address with Punycode:

http://en.wikipedia.org/wiki/Punycode

http://idnaconv.phlymail.de/index.php?decoded=m%C3%BCller%40example.com&idn_version=2008&encode=Encode+%3E%3E&lang=de

yields this address:

xn--mller-kva@example.com

And sending mail to it via the service seems to work.
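For reference, the encoding step can be reproduced locally. This is only a minimal sketch, assuming Python's built-in `idna` codec (which implements IDNA 2003) is an acceptable stand-in for the online converter linked above. `punycode_local_part` is a hypothetical helper name, and no mail standard defines this transformation for the local part.

```python
def punycode_local_part(address: str) -> str:
    """Apply the domain-name (IDNA) encoding to both halves of an address.

    This only mimics what the online converter does. No mail standard
    defines this transformation for the local part.
    """
    local, _, domain = address.rpartition("@")
    # The built-in "idna" codec adds the "xn--" prefix to labels that
    # contain non-ASCII characters and leaves pure-ASCII labels unchanged.
    encoded_local = local.encode("idna").decode("ascii")
    encoded_domain = domain.encode("idna").decode("ascii")
    return f"{encoded_local}@{encoded_domain}"


encoded = punycode_local_part("müller@example.com")
print(encoded)  # xn--mller-kva@example.com
```

The result is a plain ASCII string, so any server will treat it as an ordinary mailbox name rather than as an alias for the umlaut form.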

However, I'm not sure whether someone could register "xn--mller-kva@example.com" directly and thus receive emails meant for "müller@example.com".

Is such a clash possible? Are there other solutions to this problem?

UPDATE

Thanks for the answers. Here's a summary of what we learned:

  • Punycoding the local part of the email address works, and you can send and receive from such an encoded address (of course)
  • However, there are no guarantees at all that providers or mail clients will understand the encoding, or do it automatically. Clashes are therefore possible, and the whole idea not a good one :)
  • One should simply do what everyone else does, which is to not allow or accept non-ASCII name parts, as per specification
  • And finally, it turns out the third-party service prohibits such shenanigans anyway.
Spermatophore answered 21/9, 2011 at 9:25 Comment(9)
Wouldn't this be true for anything punycoded? If someone really wanted the domain xn--stjrt-ira.xxx, for example, it would "clash" with the punydecoded version of the name. What is the actual problem that you perceive? - Flux
Well, I would hope that registrars are aware of this and automatically combine the encoded and non-encoded versions of a domain, so that no spoofing (intentional or not) is possible. I'd like to know if this is true, and if it also applies to email providers. - Spermatophore
How do you see that spoofing comes into this? You mean the case where you register xn--mller-kva@ with the malicious intent of receiving e-mail destined for müller, who does not have this address? Do e-mail clients even punycode addresses? - Flux
I've tested it with a mailinator.com address, and it received a mail sent to the encoded address. And yes, that's what I'm trying to find out: whether or not this is actually a problem. "Spoofing" may also be unintentional, I guess, if someone likes cryptic addresses with lots of dashes... - Spermatophore
You mean it sent a mail to xn--mller-kva@ successfully? Why wouldn't it? :) But did you find an e-mail client that actually punycodes e-mail addresses? Because that's the only way the problem can arise, right? - Flux
According to RFC 5322 you can't have umlauts in the local part of an email address anyway, so the address would ALWAYS need to be "punycoded" to work correctly, wouldn't it? en.wikipedia.org/wiki/Email_address#Syntax - Quadrennial
@bzlm: That doesn't really matter for my use case, as I'm only sending out mails (newsletters and such). - Spermatophore
@Gryphius: Well, some users tried to sign up for a newsletter with an umlaut in their email local part, so I guess at least some providers allow umlauts. Maybe it's all handled encoded by them under the covers... - Spermatophore
...but probably not by punycode. :) - Flux
Q
2

I did a few tests. Umlauts in the local part seem to work in certain setups: neither my MUA (Claws), nor the outbound relay (Exim), nor the receiving MTA (Postfix) complained or did any Punycode conversion. Providers like Gmail and Hotmail, however, don't allow umlauts at all (I tested webmail and direct incoming and outgoing SMTP). I didn't find any documentation that suggests punycoding local parts. So, since it's not documented and no one does it, there is no clashing problem :-)

Conclusion: you probably shouldn't accept umlauts in the local part in the first place, and shouldn't even try to send email to those addresses. (If the big players don't do it and it's not documented or supported by an RFC, why should you?)

Quadrennial answered 21/9, 2011 at 13:4 Comment(0)
H
11

Non-ASCII characters are not allowed in the local part of email addresses. Period. Punycode is ONLY FOR DOMAINS, not for local parts of email addresses.

However, it is very likely that the IETF adopts a standard that makes internationalized local parts possible. This standard, however, will probably not be based on punycode.

Hawkins answered 27/10, 2011 at 19:45 Comment(2)
"Non-ASCII characters are not allowed in the local part of email addresses": since this answer was written the situation has changed, see tools.ietf.org/html/rfc6531 - Melt
Yes. And unless there is a standard way to approach this, there is yet no proper way to approach it! - Susansusana
O
4

I got bored and was researching this tonight, and apparently this is now codified in the Extended SMTP standard, specifically SMTPUTF8 as per RFC 6531. See http://en.wikipedia.org/wiki/Extended_SMTP#SMTPUTF8

My brief experiment using emoji mailbox names returned the following error when sending via Gmail:

local-part of envelope contains utf8 but remote server did not offer SMTPUTF8

This is the same regardless of whether I used the emoji or the punycode version of the address.
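The error above is what a sender reports when the receiving server does not advertise the SMTPUTF8 extension. A rough sketch of checking for this up front with Python's standard `smtplib` (Python 3.7+ for `str.isascii`) might look like the following. The helper names and the host in the comment are hypothetical, and whether a provider accepts a probe connection is an assumption.

```python
import smtplib


def needs_smtputf8(address: str) -> bool:
    """Return True if the local part contains any non-ASCII character.

    A non-ASCII domain does not count here, because it can be sent in
    its ASCII (IDNA) form instead.
    """
    local, _, _domain = address.rpartition("@")
    return not local.isascii()


def server_offers_smtputf8(host: str, port: int = 25) -> bool:
    """Connect, send EHLO, and report whether SMTPUTF8 was advertised."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.ehlo()
        return smtp.has_extn("smtputf8")


# Hypothetical usage ("mx.example.com" is a placeholder host):
# if needs_smtputf8(rcpt) and not server_offers_smtputf8("mx.example.com"):
#     reject_or_queue(rcpt)
print(needs_smtputf8("müller@example.com"))
print(needs_smtputf8("xn--mller-kva@example.com"))
```

Only the first function runs without a network connection. The second one needs a reachable mail server to return anything useful.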

Own answered 26/5, 2015 at 6:12 Comment(0)
D
3

You can encode sections of mail header fields into different character encodings using a format like the following: =?UTF-8?B?w6HDq8O0?= This allows you to embed things like umlauts but I'm pretty sure it doesn't work for the actual address part.
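To make the encoded-word format concrete, here is a small sketch using Python's standard `email` package. It decodes the sample string from the paragraph above and shows that `email.utils.formataddr` encodes only the display name. The display name and the address used are made-up examples.

```python
from email.header import decode_header, make_header
from email.utils import formataddr

# Decode the RFC 2047 encoded-word quoted above.
encoded_word = "=?UTF-8?B?w6HDq8O0?="
decoded = str(make_header(decode_header(encoded_word)))
print(decoded)  # áëô

# The display name may be non-ASCII; it is turned into an encoded-word,
# while the address itself is passed through unchanged.
header_value = formataddr(("Müller", "mueller@example.com"))
print(header_value)

# A non-ASCII address is refused outright rather than encoded.
try:
    formataddr(("Müller", "müller@example.com"))
    address_accepted = True
except UnicodeError:
    address_accepted = False
print(address_accepted)
```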

There's no reason why you cannot use these characters to form your address. RFC 5322 defines the characters that may appear in the address part in Section 3.4, and all the characters you use above are valid. However, as the other comment added, it's all a little fruitless if the mail clients that you are sending to cannot parse this format.

Some SMTP servers might 'accidentally' allow umlauts, but since they're not within the supported character ranges I wouldn't risk it.

Delinquent answered 22/9, 2011 at 8:24 Comment(0)
A
3

The only standard way to send non-US-ASCII characters in the local part of an email address is through RFC 6532 (Internationalized Email Headers) and RFC 6531 (SMTP Extension for SMTPUTF8).

As far as I know there is no standard way to encode non-US-ASCII characters in the local part of an email address. Notably:

  • Punycode is for domain names only, not the local part. You can have a local part which happens to look like the Punycode encoding of some string, but it should be displayed in its encoded form. If a mail program decides to display it after Punycode-decoding it, that is non-standard behavior.

  • The encoded word encoding mechanism mentioned in one of the answers (the =?utf-8?Q?foobar?= thing) is not applicable to the local part of a mail address, only to the display name of a mailbox (which is something different, but related i.e. the thing your mail program might display instead of the mail address).

In the end this means that müller@example.com and xn--mller-kva@example.com are two completely unrelated email addresses. They would be equivalent only if they were domain names, but they are not, so they can collide.
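The difference between domains and local parts can be sketched as a comparison function. This is an illustration only: `same_mailbox` is a hypothetical helper, it assumes Python's built-in `idna` codec for the domain half, and it compares local parts verbatim because no standard defines an equivalence between them.

```python
def same_mailbox(a: str, b: str) -> bool:
    """Compare two addresses: IDNA-aware for the domain, verbatim for the local part."""
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    # Domains are equivalent if their ASCII (IDNA) forms match,
    # ignoring case.
    same_domain = (
        domain_a.encode("idna").lower() == domain_b.encode("idna").lower()
    )
    # Local parts get no decoding: "müller" and "xn--mller-kva" are
    # simply two different strings.
    return same_domain and local_a == local_b


print(same_mailbox("info@müller.example", "info@xn--mller-kva.example"))
print(same_mailbox("müller@example.com", "xn--mller-kva@example.com"))
```

The first pair differs only in the domain's encoding and is treated as the same mailbox. The second pair differs in the local part and is treated as two separate mailboxes.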

Theoretically you could hope that by now (2019) all mail servers support SMTPUTF8 and all clients support internationalized mail, but sadly I would not count on it if it's important.

By the way, the local part of an email address happens to be the only thing in the mail standard(s) where you might want to have non-US-ASCII characters and there is no way to encode them (as far as I know). All other parts have encoded-word, Punycode, percent-encoding, base64, quoted-printable or some other form of encoding mechanism.

Antonio answered 2/4, 2019 at 12:45 Comment(1)
This is no longer true. Please read this: en.wikipedia.org/wiki/International_email - Sapor