Are Chinese characters allowed in URLs?

2

11

Are Chinese characters allowed to be entered in URLs?

As tested, Chinese characters can be entered in URLs: the browser converts them to punycode, sends the request, and reaches the expected page.

But in practice, does anybody currently validate website URLs in a way that allows Chinese characters as well?

Plaster answered 25/8, 2011 at 3:24 Comment(1)
check this out http://🚚.la – Arawn
12

Punycode exists so that non-Latin scripts can be used in software that does not support Unicode domain names. So while I like my site http://見.香港/, I can enter http://xn--nw2a.xn--j6w193g/ wherever I cannot enter the original Unicode form.

Some website developers program overly defensively. For example, with Google Apps you cannot use punycode domains at all, because of aggressive whitelisting that has not been updated to match ICANN standards.

UPDATE: Stack Overflow now supports Unicode domain names, so the comments below are outdated. The unusual-looking domain name is the punycode (i.e. encoded) version of the Unicode name, for systems that do not support Unicode directly.

xn--nw2a = 見
xn--j6w193g = 香港
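
The two mappings above can be reproduced with a short script. This is only a minimal sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules, so results for some other names may differ from what current browsers produce. The helper names `to_punycode` and `to_unicode` are hypothetical and chosen for illustration.

```python
def to_punycode(hostname: str) -> str:
    """Encode a Unicode hostname into its ASCII (punycode) form."""
    # The "idna" codec encodes each dot-separated label separately
    # and adds the "xn--" prefix to labels that contain non-ASCII.
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Decode an ASCII (punycode) hostname back into Unicode."""
    return hostname.encode("ascii").decode("idna")


print(to_punycode("見.香港"))               # xn--nw2a.xn--j6w193g
print(to_unicode("xn--nw2a.xn--j6w193g"))  # 見.香港
```

Plain ASCII hostnames pass through unchanged, so the same helper can be applied to every hostname regardless of script.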

As of 2022/1/1, Stack Overflow has a feature that renders punycode domains in their Unicode form in the preview, but not in the saved post. This is not really appropriate for a code platform, where punycode itself may be the topic under discussion, but it would be fine for other sites in the Stack Exchange network.

[Screenshot: Stack Overflow edit preview showing a punycode domain]

Durware answered 25/8, 2011 at 3:51 Comment(6)
As a perfect example, Stack Overflow itself does not parse Unicode domain names. – Durware
It does not parse Unicode domain names? So does that mean it is not necessary to validate Chinese input for URLs, and I can just validate the normal way, allowing only alphanumerics, hyphens, underscores and dots? – Plaster
@Plaster, you cannot click my Unicode link above. Ideally you should support Unicode URLs. There is no rulebook saying you MUST, but bear in mind that these are new features and it will take time before they are commonly handled correctly. – Durware
Yes, I cannot click on your Unicode link above, but I can click on your punycode link, and it then directs me to 見.香港. – Plaster
Users need to be able to enter Chinese URLs. – Plaster
@Steve-o, why would you choose to use such a weird URL as xn--nw2a.xn--j6w193g when, for the same cost, you can get a "proper" domain name? – Mckeown
1

All non-ASCII characters present in a domain name will be (or should be) converted to punycode. It is the browser's job to display the name in its original characters.
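
Following that logic, a validator may not need a separate rule for Chinese input: it can convert the hostname to punycode first and then apply the usual ASCII check. The sketch below is one possible way to do this, not an authoritative implementation. It assumes Python's built-in `idna` codec (IDNA 2003 rules), and `is_valid_hostname` is a hypothetical name. It also rejects underscores, which are not valid in hostnames under the letter-digit-hyphen rule.

```python
import re

# One DNS label: letters, digits and hyphens, 1 to 63 characters,
# with no hyphen at the start or at the end.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(hostname: str) -> bool:
    """Check a hostname after converting it to its ASCII form."""
    try:
        # Non-ASCII labels become "xn--..." labels; ASCII is unchanged.
        ascii_host = hostname.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty or over-long labels and prohibited characters.
        return False
    ascii_host = ascii_host.rstrip(".").lower()  # allow one trailing dot
    if not ascii_host or len(ascii_host) > 253:
        return False
    return all(_LABEL.match(label) for label in ascii_host.split("."))


print(is_valid_hostname("見.香港"))       # True
print(is_valid_hostname("example.com"))  # True
print(is_valid_hostname("bad_host.com")) # False
```

Only the hostname part of a URL is handled here; the path and query string follow different encoding rules and would need their own checks.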

Ralph answered 25/8, 2011 at 3:27 Comment(1)
The important historical note is that many browsers disable automatic rendering of Unicode because of the security implications of similar-looking Unicode characters. – Durware
