System.out.println(
new URI("http", "example.com", "/servlet", "a=x%20y", null));
The result is http://example.com/servlet?a=x%2520y, where the query parameter value differs from the supplied one. Strange, but this does follow the Javadoc:
"The percent character ('%') is always quoted by these constructors."
We can pass the decoded string, a=x y, and then we get a reasonable(?) result: a=x%20y.
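The two cases above can be compared side by side. The following is a minimal sketch; the class name QueryQuotingDemo and the helper build are made up for illustration, and only the five-argument constructor from the question is exercised.

```java
import java.net.URI;
import java.net.URISyntaxException;

class QueryQuotingDemo {
    // Hypothetical helper: wraps the five-argument constructor from the question
    // and converts the checked exception into an unchecked one.
    static URI build(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI fromEncoded = build("a=x%20y"); // '%' is quoted again by the constructor
        URI fromDecoded = build("a=x y");   // the space is quoted once

        System.out.println(fromEncoded.getRawQuery()); // a=x%2520y
        System.out.println(fromEncoded.getQuery());    // a=x%20y
        System.out.println(fromDecoded.getRawQuery()); // a=x%20y
        System.out.println(fromDecoded.getQuery());    // a=x y
    }
}
```

In both cases getQuery() returns exactly the string that was passed in, which suggests the constructor treats its argument as the decoded form of the query.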
But what if the query parameter value contains an "&" character? This happens, for example, if the value is itself a URL with query parameters. Look at this (wrong) query string: a=b&c. The ampersand must be escaped here (a=b%26c); otherwise the string can be read as a query parameter a=b followed by some garbage (c). If I pass the escaped form to a URI constructor, it encodes it again and returns a wrong URL: ...?a=b%2526c
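The dilemma can be reproduced with a short sketch. The class name AmpersandDemo and the helper rawQuery are made up for illustration; the host and path are the ones used earlier in the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AmpersandDemo {
    // Hypothetical helper: returns the raw (still encoded) query that the
    // five-argument constructor produces for the given input.
    static String rawQuery(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '&' is a legal URI character, so it is left alone and acts as a separator.
        System.out.println(rawQuery("a=b&c"));   // a=b&c
        // Escaping it beforehand does not help: the '%' itself gets quoted.
        System.out.println(rawQuery("a=b%26c")); // a=b%2526c
    }
}
```

Neither input yields the intended a=b%26c, which is the core of the complaint.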
This issue seems to render java.net.URI useless. Am I missing something here?
Summary of answers
java.net.URI does know about the existence of the query part of a URI, but it does not understand the internals of the query part, which can differ for each scheme. For example, java.net.URI does not understand the internal structure of the HTTP query part. This would not be a problem if java.net.URI treated the query as an opaque string and did not alter it. But its multi-argument constructors apply a generic percent-encoding algorithm, which breaks HTTP URLs.
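For contrast, the single-string entry points (new URI(String) and URI.create) parse an already encoded string and keep its escapes as written. The sketch below assumes the caller has encoded the query beforehand; the class name PreEncodedQueryDemo and the helper parse are made up for illustration.

```java
import java.net.URI;

class PreEncodedQueryDemo {
    // Hypothetical helper: parses a complete, already percent-encoded URL.
    static URI parse(String url) {
        return URI.create(url);
    }

    public static void main(String[] args) {
        URI uri = parse("http://example.com/servlet?a=b%26c");
        System.out.println(uri);               // http://example.com/servlet?a=b%26c
        System.out.println(uri.getRawQuery()); // a=b%26c
        System.out.println(uri.getQuery());    // a=b&c
    }
}
```

This keeps the query intact, but it does not help with assembling a URL from parts: the caller has to build the whole string first.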
Therefore I cannot use the URI class to reliably assemble a URL from its parts, even though there are constructors for it. I would also mention that, as of Java 7, the implementation of the relativize operation is quite limited: it only works if one URL is a prefix of the other. These two features (and the leaner interface for these purposes) were the reason I was interested in java.net.URI, but neither of them works for me.
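The relativize limitation can be seen with two small cases. This is a sketch with made-up paths; the class name RelativizeDemo and its helper are hypothetical. The Javadoc states that the given URI is returned unchanged when the base path is not a prefix of its path.

```java
import java.net.URI;

class RelativizeDemo {
    // Hypothetical helper: relativizes target against base and returns the result as text.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // The base path is a prefix of the target path: a relative URI comes back.
        System.out.println(relativize("http://example.com/a/b/", "http://example.com/a/b/c/d"));
        // prints: c/d

        // Sibling path: one might hope for "../x", but the target is returned unchanged.
        System.out.println(relativize("http://example.com/a/b/", "http://example.com/a/x"));
        // prints: http://example.com/a/x
    }
}
```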
In the end I used java.net.URL for parsing, and wrote my own code to assemble a URL from its parts and to relativize two URLs. I also checked the Apache HttpClient URIBuilder class: although it does understand the internals of an HTTP query string, as of 4.3 it has the same encoding problem as java.net.URI when dealing with the query part as a whole.
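The hand-written assembly code is not shown here, so the following is only a rough sketch of one way such code might look, not the actual implementation. It assumes form-style encoding via URLEncoder is acceptable (note that it encodes a space as "+"), and the names HttpUrlAssembler, encode, and assemble are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class HttpUrlAssembler {
    // Encodes a single parameter name or value; '&', '=', '?' and '%' are all escaped.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Appends the encoded parameters to a base URL that has no query yet.
    static URI assemble(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(separator)
              .append(encode(p.getKey()))
              .append('=')
              .append(encode(p.getValue()));
            separator = '&';
        }
        // The single-string factory parses the text without re-encoding it.
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("return", "mypage?hello=world");
        System.out.println(assemble("http://example.com/some", params));
        // prints: http://example.com/some?return=mypage%3Fhello%3Dworld
    }
}
```

Encoding each name and value separately, before joining them with "&" and "=", is what keeps the structural characters of the query distinct from the data.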
a=b&c is syntactically valid, but it does not mean what is obviously intended: a query parameter named a with a value b&c. The ampersand must be escaped, but then URI returns a messed-up URL in toString(). Let's see a more realistic example: we pass a relative URL mypage?hello=world in the return parameter. The full, valid URL is: http://example.com/some?return=mypage%3Fhello%3Dworld. What should I pass to the java.net.URI multi-argument constructors to get back this full URL? – Zellazelle