java.net.URI and percent in query parameter value
System.out.println(
    new URI("http", "example.com", "/servlet", "a=x%20y", null));

The result is http://example.com/servlet?a=x%2520y, where the query parameter value differs from the supplied one. Strange, but this does follow the Javadoc:

"The percent character ('%') is always quoted by these constructors."

We can pass the decoded string, a=x y, and then we get a reasonable(?) result: a=x%20y.
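Both cases can be reproduced with a small sketch. The class and method names (PercentQuoting, build) are made up for illustration; only the five-argument constructor and the two query strings come from the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PercentQuoting {
    // Builds a URI with the (scheme, authority, path, query, fragment) constructor.
    static String build(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '%' is always quoted, so an already-encoded value is encoded a second time.
        System.out.println(build("a=x%20y")); // http://example.com/servlet?a=x%2520y
        // A raw space is not a legal URI character, so it is quoted exactly once.
        System.out.println(build("a=x y"));   // http://example.com/servlet?a=x%20y
    }
}
```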

But what if the query parameter value contains an "&" character? This happens, for example, if the value is itself a URL with query parameters. Look at this (wrong) query string: a=b&c. The ampersand must be escaped here (a=b%26c); otherwise the string can be read as a query parameter a=b followed by some garbage (c). If I pass the escaped form to a URI constructor, it encodes it again and returns a wrong URL: ...?a=b%2526c
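A minimal sketch of the ampersand problem, assuming the same five-argument constructor as above (AmpersandQuoting and build are hypothetical names). A literal "&" is a legal query character and is left alone, while the "%" of a pre-escaped "%26" is quoted, so neither input produces a=b%26c.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AmpersandQuoting {
    // Returns the raw (still encoded) query as stored by the multi-argument constructor.
    static String rawQuery(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '&' is legal in a query, so it is passed through as a separator.
        System.out.println(rawQuery("a=b&c"));   // a=b&c
        // The '%' of the escaped ampersand is quoted again.
        System.out.println(rawQuery("a=b%26c")); // a=b%2526c
    }
}
```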

This issue seems to render java.net.URI useless. Am I missing something here?

Summary of answers

java.net.URI knows that a URI has a query part, but it does not understand the internals of that part, which can differ for each scheme. For example, java.net.URI does not understand the internal structure of the HTTP query part. This would not be a problem if java.net.URI treated the query as an opaque string and did not alter it. Instead, it tries to apply a generic percent-encoding algorithm, which breaks HTTP URLs.

Therefore I cannot use the URI class to reliably assemble a URL from its parts, even though there are constructors for it. I would also mention that, as of Java 7, the implementation of the relativize operation is quite limited: it only works if one URL is the prefix of the other. These two features (and the class's leaner interface for these purposes) were the reason I was interested in java.net.URI, but neither of them works for me.
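The relativize limitation can be seen in a short sketch (RelativizeLimits and its helper are illustrative names, and the URLs are made up). Per the Javadoc, when the base path is not a prefix of the target path, the target URI is returned unchanged rather than a "../"-style relative reference.

```java
import java.net.URI;

final class RelativizeLimits {
    // Relativizes target against base and returns the result as a string.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // The base path is a prefix of the target path: a relative reference is produced.
        System.out.println(relativize("http://example.com/a/b/", "http://example.com/a/b/c")); // c
        // The base path is not a prefix: the target comes back as-is, not as "../d".
        System.out.println(relativize("http://example.com/a/b/", "http://example.com/a/d"));
        // http://example.com/a/d
    }
}
```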

In the end I used java.net.URL for parsing, and wrote my own code to assemble a URL from parts and to relativize two URLs. I also checked the Apache HttpClient URIBuilder class. Although it does understand the internals of an HTTP query string, as of 4.3 it has the same encoding problem as java.net.URI when dealing with the query part as a whole.
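The author's own assembly code is not shown. As a rough sketch of one way to do it, the following encodes each name and value separately and then hands the finished string to the single-argument factory, which does not re-quote "%". UrlAssembler, assemble and encode are hypothetical names. Note that URLEncoder produces form encoding (a space becomes "+"), and the Charset overload used here assumes Java 10 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class UrlAssembler {
    // Form-encodes a single name or value (assumption: UTF-8 is the wanted charset).
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Appends the encoded parameters to base, which must not already contain a query.
    static URI assemble(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        // URI.create parses the string as given and leaves existing escapes untouched.
        return URI.create(params.isEmpty() ? base : base + "?" + query);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("return", "mypage?hello=world");
        System.out.println(assemble("http://example.com/some", params));
        // http://example.com/some?return=mypage%3Fhello%3Dworld
    }
}
```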

Roxane answered 11/11, 2013 at 22:17 Comment(0)

The query string

a=b&c

is not wrong in a URI. The RFC on URI Generic Syntax (RFC 2396) states

The query component is a string of information to be interpreted by the resource.

  query         = *uric

Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.

The character & in the query string is very much valid (uric represents reserved, mark, and alphanumeric characters). The RFC also states

Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.

Because the & is valid but reserved, it is up to the user to determine if it is meant to be encoded or not.

What you call a query parameter is not a feature of a URI and therefore the URI class has no reason to (and shouldn't) support it.
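To illustrate why the decision is left to the caller, here is a small sketch (QueryViews and the example URL are made up). Once the query is decoded, an escaped data ampersand can no longer be told apart from a separator, which is why the raw form has to be used when splitting parameters.

```java
import java.net.URI;

final class QueryViews {
    // The query exactly as written, escapes intact.
    static String raw(String uri) {
        return URI.create(uri).getRawQuery();
    }

    // The query with all escaped octets decoded.
    static String decoded(String uri) {
        return URI.create(uri).getQuery();
    }

    public static void main(String[] args) {
        String uri = "http://example.com/some?a=b%26c&d";
        System.out.println(raw(uri));     // a=b%26c&d  (one data '&', one separator)
        System.out.println(decoded(uri)); // a=b&c&d    (the distinction is lost)
    }
}
```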


Gassman answered 11/11, 2013 at 22:36 Comment(6)
Yes, a=b&c is syntactically valid, but it does not mean what is obviously intended: a query parameter named a with a value b&c. The ampersand must be escaped, but then URI returns a messed-up URL in toString(). Let's look at a more realistic example: we pass a relative URL, mypage?hello=world, in the return parameter. The full, valid URL is http://example.com/some?return=mypage%3Fhello%3Dworld. What should I pass to the java.net.URI multi-argument constructors to get back this full URL? - Zellazelle
@HontváriJózsefLevente Query parameters are relevant in an HTTP context. But URI is not only relevant in an HTTP context. Query parameters are interpreted by an HTTP server. In a URI they mean nothing and you'll therefore not be able to do any special formatting with the URI class. - Gassman
It is not necessary for java.net.URI to understand the internals of the query part. For example, it would be enough if its multi-argument constructors didn't alter the perfectly valid query string I pass to them. - Zellazelle
@HontváriJózsefLevente Which perfectly valid query string did you pass to it that it changed? a=x%20y is not a valid query string. Note that the RFC states: "Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts." So a=x%20y becomes a=x%2520y. The javadoc states that, aside from some minor deviations, a java.net.URI instance represents a URI reference. - Gassman
& can be both a separator within the query component and a data character. In the latter case it must be percent-encoded. Because URI does not understand the internals of the query component, it cannot decide whether the ampersand is a separator or a data character. Therefore, as you wrote, it is up to the user, i.e. my code, to decide which. Now if I percent-encode the ampersands that are data characters rather than separators, then URI.toString() returns a bad string. I still do not know what I should pass to the URI multi-argument constructors to get back the example URL I wrote above. - Zellazelle
@HontváriJózsefLevente It must only be encoded in the context of an HTTP request. The URI class doesn't know in which context you want to use it so it doesn't encode it, because that is not its job. You cannot use the URI constructor to do what you want. - Gassman

The only workaround I found was to use the single-argument constructors and methods. Note that you must use URI#getRawQuery() to avoid decoding %26. For example:

URI uri = new URI("http://a/?b=c%26d&e");
// uri.getRawQuery() equals "b=c%26d&e"

uri = new URI(new URI(uri.getScheme(), uri.getAuthority(),
        uri.getPath(), null, null) + "?f=g%26h&i");
// uri.getRawQuery() equals "f=g%26h&i"

uri = uri.resolve("?j=k%26l&m");
// uri.getRawQuery() equals "j=k%26l&m"
// uri.toString() equals "http://a/?j=k%26l&m"
Newell answered 27/1, 2016 at 21:41 Comment(0)

The only working solution I know of is reflection (see https://blog.stackhunter.com/2014/03/31/encode-special-characters-java-net-uri/):

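// Needs: import java.lang.reflect.Field; import java.net.URI;
// encodedQueryString is assumed to hold an already percent-encoded query, e.g. "a=b%26c".
URI uri = new URI("http", null, "example.com", -1, "/accounts", null, null);
// Overwrite the private "query" field with the pre-encoded query string.
Field field = URI.class.getDeclaredField("query");
field.setAccessible(true);
field.set(uri, encodedQueryString);
// Clear the cached string representation so that toString() is rebuilt.
field = URI.class.getDeclaredField("string");
field.setAccessible(true);
field.set(uri, null);
// Caveat: on recent JDKs with strong encapsulation, setAccessible may fail
// unless java.net is opened (e.g. --add-opens java.base/java.net=ALL-UNNAMED).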
Toupee answered 11/4, 2016 at 21:16 Comment(0)

Use the URLEncoder.encode() method. In your case, for example:

URLEncoder.encode("a=x%20y", "ISO-8859-1");
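For reference, a sketch of what this call produces (FormEncoding and its method names are illustrative, and the Charset overload assumes Java 10 or later). Applied to the whole query string, URLEncoder also escapes the "=" and the "%", so it is normally applied to each name or value on its own instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FormEncoding {
    // Encodes the entire string, separators included.
    static String encodeWhole(String query) {
        return URLEncoder.encode(query, StandardCharsets.ISO_8859_1);
    }

    // Encodes only the value and keeps '=' as a separator.
    static String encodePair(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(encodeWhole("a=x%20y")); // a%3Dx%2520y
        System.out.println(encodePair("a", "x y")); // a=x+y
    }
}
```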
Diella answered 11/11, 2013 at 22:21 Comment(1)
I want to use some java.net.URI-specific features: the constructors with query and authority parameters, and the relativize operation. - Zellazelle
