You have two standards concerning URLs in links (<a href
).
The first standard is RFC 1866 (HTML 2.0) where in "3.2.1. Data Characters" you can read the characters which need to be escaped when used as the value for an HTML attribute. (Attributes themselves do not allow special characters at all, e.g. <a hr&ef="http://...
is not allowed, nor is <a hr&ef="http://...
.)
Later this has gone into the HTML 4 standard, the characters you need to escape are:
< to <
> to >
& to &
" to "e;
' to '
The other standard is RFC 3986 "Generic URI standard", where URLs are handled (this happens when the browser is about to follow a link because the user clicked on the HTML element).
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
It is important to escape those characters so the client knows whether they represent data or a delimiter.
Example unescaped:
https://example.com/?user=test&password&te&st&goto=https://google.com
Example, a fully legitimate URL
https://example.com/?user=test&password&te%26st&goto=https%3A%2F%2Fgoogle.com
Example fully legitimate URL in the value of an HTML attribute:
https://example.com/?user=test&password&te%26st&goto=https%3A%2F%2Fgoogle.com
Also important scenarios:
JavaScript code as a value:
<img src="..." onclick="window.location.href = "https://example.com/?user=test&password&te%26st&goto=https%3A%2F%2Fgoogle.com";">...</a>
(Yes, ;;
is correct.)
JSON as a value:
<a href="..." data-analytics="{"event": "click"}">...</a>
Escaped things inside escaped things, double encoding, URL inside URL inside parameter, etc,...
http://x.com/?passwordUrl=http%3A%2F%2Fy.com%2F%3Fuser%3Dtest&password=""123
I am posting a new answer because I find zneak's answer does not have enough examples, does not show HTML and URI handling as different aspects and standards and has some minor things missing.