Encoding URL query parameters in Java
M

9

132

How does one encode query parameters to go on a url in Java? I know, this seems like an obvious and already asked question.

There are two subtleties I'm not sure of:

  1. Should spaces be encoded in the URL as "+" or as "%20"? In Chrome, if I type in "http://google.com/foo=?bar me", Chrome re-encodes the space as %20.
  2. Is it necessary/correct to encode colons ":" as %3B? Chrome doesn't.

Notes:

  • java.net.URLEncoder.encode doesn't seem to fit: it appears to be meant for encoding data submitted via a form. For example, it encodes a space as + instead of %20, and it encodes the colon, which isn't necessary.
  • java.net.URI doesn't encode query parameters.
Magistrate answered 16/3, 2011 at 18:38 Comment(2)
This question looks useful: #444612Magistrate
the structure of the query part is server-dependent, though most expect application/x-www-form-urlencoded key/value pairs. See here for more: illegalargumentexception.blogspot.com/2009/12/…Siva
L
150

java.net.URLEncoder.encode(String s, String encoding) can help too. It follows the HTML form encoding application/x-www-form-urlencoded.

URLEncoder.encode(query, "UTF-8");

On the other hand, percent-encoding (also known as URL encoding) encodes a space as %20. A colon is a reserved character, but it is permitted unencoded inside the query component, so : remains a colon after encoding.
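To make the form-encoding behaviour concrete, here is a minimal sketch. The class and method names are made up for illustration, and the Charset overload of URLEncoder.encode assumes Java 10 or later (on older versions the "UTF-8" string overload shown above applies).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class FormEncodingDemo {

    // Encodes one value using application/x-www-form-urlencoded rules.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The space becomes "+" and the colon is escaped as %3A.
        System.out.println(formEncode("bar me:x")); // prints bar+me%3Ax
    }
}
```

So URLEncoder is stricter than percent-encoding requires for the colon, and it uses the form-specific "+" for the space.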

Lauber answered 16/3, 2011 at 18:49 Comment(13)
I mentioned that I didn't think that does url encoding, instead it encodes data to be submitted via a form. comments?Magistrate
That's because URLEncoder is conformed to application/x-www-form-urlencoded MIME format (which is a valid HTML form encoding). I'm assuming that's not what you're looking for.Lauber
Right, so doesn't that disqualify your answer? Or, are you saying its output is still valid, just stricter than necessary?Magistrate
@Alex Black, I just updated my comment. I'm assuming you're looking for encoding to conform to URI as specified in RFC2396.Lauber
Btw, if you're using HttpClient 4, you don't need to as HttpClient does it for you.Lauber
Yes, RFC2396 looks like the encoding I want. It looks to me like URLEncoder.encode is for w3.org/TR/html401/interact/forms.html#form-content-typeMagistrate
@Elite: I can't ever seem to figure out what you mean :) Yes, I am using HttpClient 4, so far its not doing it for me. Are you saying there is a method in it that does?Magistrate
No, If you're doing an HttpGet then encoding is necessary, but generally, passing parameters with HttpParams, HttpClient 4 knows how to encode them.Lauber
I ended up using URLEncoder.encode and replacing "+" with "%20"Magistrate
Because one of the (3rd party) sites I am sending HTTP requests to does not decode "+" to " ", but it does decode "%20" to " ".Magistrate
It encodes slashes to "%2F", shouldn't it leave the URL slashes as they are?Umbilicate
@Umbilicate No, it shouldn't. You are supposed to give it parameter value only and not the whole URL. Consider example http://example.com/?url=http://example.com/?q=c&sort=name. Should it encode &sort=name or not? There is no way to distinguish value from the URL. That is the exact reason why you need value encoding in the first place.Bimbo
But actually, slash is a legal character in querystring parameter values.Screed
T
24

Unfortunately, URLEncoder.encode() does not produce valid percent-encoding (as specified in RFC 3986).

URLEncoder.encode() encodes everything just fine, except space is encoded to "+". All the Java URI encoders that I could find only expose public methods to encode the query, fragment, path parts etc. - but don't expose the "raw" encoding. This is unfortunate as fragment and query are allowed to encode space to +, so we don't want to use them. Path is encoded properly but is "normalized" first so we can't use it for 'generic' encoding either.

Best solution I could come up with:

return URLEncoder.encode(raw, StandardCharsets.UTF_8).replaceAll("\\+", "%20");

If replaceAll() is too slow for you, I guess the alternative is to roll your own encoder...
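For what it's worth, rolling your own is not much code. The following is only a sketch under one assumption: that you want every byte outside the RFC 3986 "unreserved" set percent-encoded, which is stricter than the query grammar requires. PercentEncoder and its encode method are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical hand-rolled encoder, not a library class.
final class PercentEncoder {

    // RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes every UTF-8 byte that is not an unreserved character.
    static String encode(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 3);
        for (byte b : raw.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (UNRESERVED.indexOf(v) >= 0) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("bar me:x&y=z")); // prints bar%20me%3Ax%26y%3Dz
    }
}
```

This avoids the regex pass entirely, at the cost of maintaining the character table yourself.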

EDIT: I had this code in here first which doesn't encode "?", "&", "=" properly:

//don't use - doesn't properly encode "?", "&", "="
new URI(null, null, null, raw, null).toString().substring(1);
Tremor answered 30/7, 2015 at 11:49 Comment(4)
+ is a perfectly valid encoding of a space.Rainey
@LawrenceDol it's true but sometimes + may be interpreted incorrectly - take a look at C# blogs.msdn.microsoft.com/yangxind/2006/11/08/…Basis
This. I compared various alternatives against Javascript's encodeURIComponent method output, and this was the only exact match for the ones I tried (queries with spaces, Turkish and German special characters).Outrider
Ahmet+Mehmet Demir => Ahmet%2BMehmet+Demir , According to my understanding the only problem here is MIME type application/x-www-form-urlencoded. In such cases space is encoded to + char, if the intention was searching two entries in a web form, like google search by a GET request. URI RFC allows + char as a valid char. So, it doesn't need to be escaped normally.Vicious
B
16

EDIT: URIUtil is no longer available in more recent versions, better answer at Java - encode URL or by Mr. Sindi in this thread.


URIUtil of Apache httpclient is really useful, although there are some alternatives

URIUtil.encodeQuery(url);

For example, it encodes space as "+" instead of "%20"

Both are perfectly valid in the right context. Although if you really preferred you could issue a string replace.

Borsch answered 16/3, 2011 at 18:41 Comment(7)
I would have to agree. Use HttpClient, you will be much happier.Cutter
That look promising, got a link by chance? I'm googling but finding many.Magistrate
This method doesn't seem to be present in HttpClient 4.1? hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/…Magistrate
@Alex, hmm that's annoying, I've always used that routine with good results. One idea is to grab the source code from the 3 release since they now obviously didn't want to maintain it anymore.Dermatoglyphics
A long time ago I copied the class from the old HTTP commons (and altered it so it was a single class) and put it on gist: gist.github.com/agentgt/3011049Epley
You bet this is annoying. Currently, there is a URLEncodedUtils.encodeFormFields which is a private static method. Wouldn't it be reasonable for this method to be declared public?Malvoisie
URIUtil.encodeWithinQuery is what you would use to encode an individual query parameter, which is what the original question seemed to be asking.Thorax
G
11

It is not necessary to encode a colon as %3B in the query, although doing so is not illegal.

URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

It also seems that only percent-encoded spaces are valid, as I doubt that space is an ALPHA or a DIGIT

See the URI specification (RFC 3986) for more details.
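As a rough check of the grammar against the JDK, the sketch below uses the multi-argument java.net.URI constructor, which quotes only characters that are illegal in a URI. The class name, host and path are made-up placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; example.com and /foo are placeholders.
final class UriQueryDemo {

    // Builds a URI from parts and lets java.net.URI quote illegal characters.
    static String build(String query) {
        try {
            return new URI("http", "example.com", "/foo", query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The space is quoted as %20; the colon and "=" are legal and left alone.
        System.out.println(build("q=bar me:x")); // prints http://example.com/foo?q=bar%20me:x
    }
}
```

Note that this constructor treats the query as a whole, so it will not escape "&" or "=" that appear inside a parameter value.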

Germainegerman answered 16/3, 2011 at 18:50 Comment(4)
But doing so can change the meaning of the URI, since the interpretation of the query string is up to the server. If you are producing a application/x-www-form-urlencoded query string, either is fine. If you are fixing up a URL that the user typed/pasted in, : should be left alone.Benbow
@Benbow You are right, if colon is being used as a general delimiter (page 12 of the RFC); however, if it is not being used as a general delimiter, then both encodings should resolve identically.Germainegerman
You also have to be careful as URLs are not really a subset of URI: adamgent.com/post/25161273526/urls-are-not-a-subset-of-urisEpley
A colon is %3A, not %3B (that's a semicolon), for anybody who is encoding manually.Titbit
D
4

The built in Java URLEncoder is doing what it's supposed to, and you should use it.

A "+" or "%20" are both valid replacements for a space character in a URL. Either one will work.

A ":" should be encoded, as it's a separator character. i.e. http://foo or ftp://bar. The fact that a particular browser can handle it when it's not encoded doesn't make it correct. You should encode them.

As a matter of good practice, be sure to use the method that takes a character encoding parameter. UTF-8 is generally used there, but you should supply it explicitly.

URLEncoder.encode(yourUrl, "UTF-8");
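One caveat worth illustrating: the encoder is meant for individual parameter names and values, not for a whole URL. Below is a sketch of assembling a query string that way. QueryStringBuilder and buildUrl are hypothetical names, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the JDK.
final class QueryStringBuilder {

    // Encodes each key and value separately, then joins the pairs with "&".
    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("url", "http://example.com/?q=c&sort=name"); // a URL passed as a value
        params.put("page", "2");
        System.out.println(buildUrl("http://example.com/", params));
        // prints http://example.com/?url=http%3A%2F%2Fexample.com%2F%3Fq%3Dc%26sort%3Dname&page=2
    }
}
```

Because the inner URL is encoded as a value, its own "&" and "=" cannot be mistaken for separators of the outer query.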
Danit answered 16/3, 2011 at 19:41 Comment(4)
+ is only a representation of space in application/x-www-form-urlencoded; it is not guaranteed to work even when restricted to HTTP. Similarly, : is valid in a query string and should not be converted to %3B; a server can choose to interpret them differently.Benbow
This method also encodes the slashes and other characters that are part of the URL itself, e.g. http:// becomes http%3A%2F%2F, which is not correct.Dipole
@ToKra you are not supposed to encode the http:// part. The method is for query parameters and encoded form data. If, however, you wanted to pass the URL of another website as a query parameter, THEN you would want to encode it to avoid confusing the URL parser.Chavarria
@tc My reading of w3.org/TR/html4/interact/forms.html#h-17.13.3.3 is that all GET form data is encoded as application/x-www-form-urlencoded content type. Doesn't that mean it must work for HTTP?Chavarria
A
3

I just want to add another way to resolve this problem.

If your project depends on spring web, you can use their utils.

import org.springframework.web.util.UriUtils;

import java.nio.charset.StandardCharsets;

// Encodes every character outside the RFC 3986 unreserved set
UriUtils.encode("vip:104534049:5", StandardCharsets.UTF_8);

Output:

vip%3A104534049%3A5

Aphesis answered 13/10, 2021 at 2:51 Comment(0)
P
0
String param="2019-07-18 19:29:37";
param="%27"+param.trim().replace(" ", "%20")+"%27";

I observed that for a datetime (timestamp) value, URLEncoder.encode(param, "UTF-8") does not work.

Psychotomimetic answered 29/11, 2019 at 8:30 Comment(0)
C
0

The space character " " is converted into a + sign by URLEncoder.encode. This differs from other languages such as JavaScript, whose encodeURIComponent encodes a space as %20. It is nevertheless valid, because spaces in form-encoded query string parameters are represented by +. The %20 form is generally used for spaces in the rest of the URI (the part before the ?).
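On the receiving side, a form-style decoder accepts both spellings. This is a minimal sketch using java.net.URLDecoder. The class and method names are hypothetical, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class FormDecodingDemo {

    // Decodes a value using application/x-www-form-urlencoded rules.
    static String formDecode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formDecode("hello+sir"));   // prints hello sir
        System.out.println(formDecode("hello%20sir")); // prints hello sir
        System.out.println(formDecode("a%2Bb"));       // prints a+b
    }
}
```

A server that decodes per RFC 3986 rather than form rules would leave the "+" as a literal plus sign, which is why %20 is the safer choice when you do not control the server.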

Clarino answered 17/2, 2022 at 5:55 Comment(0)
B
-3

If spaces are the only problem in your URL, the code below has worked fine for me:

// new URL(...) requires a protocol such as http://
String url = "http://www.xyz.com?para=hello sir";
URL myUrl = new URL(url.replace(" ", "%20"));

Example: if url is

http://www.xyz.com?para=hello sir

then myUrl is

http://www.xyz.com?para=hello%20sir

Burnet answered 23/9, 2019 at 6:11 Comment(0)
