How to encode URL to avoid special characters in Java? [duplicate]
H

5

41

I need Java code to encode a URL so that special characters such as spaces, %, and & are escaped.

Handpick answered 31/12, 2010 at 17:12 Comment(2)
This kind of issue is precisely the reason why some sites never, ever, use characters that need escaping/encoding in their URLs. StackOverflow seems to be such a site (just look at the clean URL). I think people linking to files in URLs that have, say, spacing characters in their name (and people using spacing characters in files for that matter) deserve to be shot. And those arguing with this fact even more so ;) - Ammeter
(Duplicate suggested by @fmucar in their answer below.) - Affenpinscher
E
75

URL construction is tricky because different parts of the URL have different rules for what characters are allowed: for example, the plus sign is reserved in the query component of a URL because it represents a space, but in the path component of the URL, a plus sign has no special meaning and spaces are encoded as "%20".

RFC 2396 explains (in section 2.4.2) that a complete URL is always in its encoded form: you take the strings for the individual components (scheme, authority, path, etc.), encode each according to its own rules, and then combine them into the complete URL string. Trying to build a complete unencoded URL string and then encode it separately leads to subtle bugs, like spaces in the path being incorrectly changed to plus signs (which an RFC-compliant server will interpret as real plus signs, not encoded spaces).

In Java, the correct way to build a URL is with the URI class. Use one of the multi-argument constructors that takes the URL components as separate strings, and it'll escape each component correctly according to that component's rules. The toASCIIString() method gives you a properly-escaped and encoded string that you can send to a server. To decode a URL, construct a URI object using the single-string constructor and then use the accessor methods (such as getPath()) to retrieve the decoded components.
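
As a minimal sketch of the approach described above, the five-argument URI constructor takes scheme, authority, path, query, and fragment as separate unencoded strings. The class name UriBuildSketch, the host, and the sample values are made up for illustration and are not from the original answer.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriBuildSketch {

    // Builds an encoded URL string from unencoded components.
    static String build(String host, String path, String query) {
        try {
            // URI(scheme, authority, path, query, fragment): illegal characters
            // in each component are percent-escaped by that component's rules.
            URI uri = new URI("http", host, path, query, null);
            // toASCIIString() additionally percent-encodes non-ASCII characters as UTF-8.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parses an encoded URL string and returns the decoded path component.
    static String decodedPath(String encodedUrl) {
        try {
            return new URI(encodedUrl).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00e9" is an accented 'e', written as an escape so the source stays ASCII.
        String encoded = build("www.example.com", "/caf\u00e9 menu/", "q=two words");
        System.out.println(encoded);              // http://www.example.com/caf%C3%A9%20menu/?q=two%20words
        System.out.println(decodedPath(encoded)); // the original, unencoded path
    }
}
```

Note that these constructors only escape characters that are illegal in a component. Characters that are legal in a query, such as & and +, are left untouched, and a literal % is always escaped as %25.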

Don't use the URLEncoder class! Despite the name, that class actually does HTML form encoding, not URL encoding. It's not correct to concatenate unencoded strings to make an "unencoded" URL and then pass it through a URLEncoder. Doing so will result in problems (particularly the aforementioned one regarding spaces and plus signs in the path).
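
To illustrate the problem, here is a small sketch of what happens when a complete URL is passed through URLEncoder. The class name UrlEncoderPitfallSketch and the sample URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncoderPitfallSketch {

    // Runs a string through URLEncoder (HTML form encoding) using UTF-8.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The delimiters ':' and '/' are escaped and the space in the path becomes '+'.
        System.out.println(formEncode("http://www.example.com/my file.txt"));
        // prints http%3A%2F%2Fwww.example.com%2Fmy+file.txt
    }
}
```

The result is no longer usable as a URL, and the space in the path has become a plus sign rather than %20.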

Edbert answered 31/12, 2010 at 17:50 Comment(6)
I am trying to write it myself. Is this correct?
public static String encode(String str) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        switch (str.charAt(i)) {
            case ' ': sb.append("%20"); break;
            case '!': sb.append("%21"); break;
            case '"': sb.append("%22"); break;
            default: sb.append(str.charAt(i));
        }
    }
    return sb.toString();
}
- Handpick
@adham, no, correct URL encoding is much more than just escaping those three characters. That code may produce correct results for some URLs, but there are many that it won't handle correctly. Just use the URI class. - Edbert
Yes, I know there are many other characters, but can you please give a simple example for the URI class? - Handpick
From testing, it seems that you still have to manually replace spaces in the query with '+' signs. Have I made a mistake somewhere? - Jeramie
@EdwardFalk, you're right, URI doesn't seem to convert spaces to '+' in a query string. I could've sworn it did, because I learned about the URI class while resolving space/plus problems in my own code, but that was several years ago and I don't remember the details. Since the plus signs are really part of the application/x-www-form-urlencoded MIME type and not the URI syntax itself, I guess it'd make sense to run (only) the query string through a URLEncoder before putting it into a URI. I'll need to rethink and maybe rewrite my answer, though. - Edbert
I think you're exactly right. - Jeramie
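
A minimal sketch of the idea discussed in these comments, under the assumption that the server expects form-encoded query parameters: the path is escaped by URI, and only the query parameter goes through URLEncoder. The class name QueryEncodingSketch, the host, and the parameter values are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class QueryEncodingSketch {

    // Form-encodes a single query parameter name or value as UTF-8.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Escapes the path with URI and appends one form-encoded query parameter.
    static String buildUrl(String host, String path, String name, String value) {
        try {
            // URI(scheme, host, path, fragment): the path is escaped by URI's rules.
            String base = new URI("https", host, path, null).toASCIIString();
            // The query is appended as already-encoded text. Passing it to a
            // multi-argument constructor would escape its '%' characters again.
            return base + "?" + formEncode(name) + "=" + formEncode(value);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("www.example.com", "/search results", "q", "salt & pepper"));
        // prints https://www.example.com/search%20results?q=salt+%26+pepper
    }
}
```

The space in the path becomes %20, while the space in the query value becomes a plus sign and the ampersand inside the value is escaped as %26.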
G
13

I also spent quite some time on this issue, so here's my solution:

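// Requires java.net.URI, java.net.URL and java.net.URLDecoder
String urlString2Decode = "http://www.test.com/äüö/path with blanks/";
// Decode first so that escapes already present are not encoded twice (this also turns '+' into a space)
String decodedURL = URLDecoder.decode(urlString2Decode, "UTF-8");
URL url = new URL(decodedURL);
// The multi-argument URI constructor escapes each component by its own rules
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
// toASCIIString() returns the encoded form, with non-ASCII characters percent-encoded
String encodedURLAsString = uri.toASCIIString();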
Gushy answered 17/1, 2018 at 23:8 Comment(1)
IntelliJ IDEA suggests converting "UTF-8" to StandardCharsets.UTF_8 and applies the change automatically on code reformat. That breaks the code with the error "java: incompatible types: java.nio.charset.Charset cannot be converted to java.lang.String". I fixed it by using StandardCharsets.UTF_8.name() instead of "UTF-8". - Hobby
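
The error in this comment comes from the two-String overload of URLDecoder.decode, which expects a charset name rather than a Charset object (an overload that accepts a Charset was only added in Java 10). A small sketch of the workaround described above, with a hypothetical class name and sample input:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class CharsetNameSketch {

    // Decodes a form-encoded string as UTF-8 using the String-based overload.
    static String decode(String s) {
        try {
            // decode(String, String) takes a charset *name*; name() returns "UTF-8".
            return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both %20 and '+' decode to a space.
        System.out.println(decode("path%20with+blanks")); // prints path with blanks
    }
}
```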
F
7

If you don't want to do it manually, use the Apache Commons Codec library. The class you are looking for is: org.apache.commons.codec.net.URLCodec

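final String url = "http://www.google.com?....";
// encode() is an instance method and throws EncoderException; it escapes the whole string, scheme included
final String urlSafe = new org.apache.commons.codec.net.URLCodec().encode(url);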
Featurelength answered 31/12, 2010 at 17:17 Comment(3)
Why did this answer get downvoted? - Locular
Just a note. URLCodec is meant as a replacement for URLEncoder/URLDecoder. It is NOT MEANT to encode an entire URL, which is what the example shows. Meaning even http:// turns into http%3A%2F%2F. Unless that is really what you want to do. - Marketplace
Both are giving me the same "wrong" result:
URLCodec urlCodec = new URLCodec();
System.out.println(urlCodec.encode("FOO BAR")); // FOO+BAR
System.out.println(URLEncoder.encode("FOO BAR", "UTF-8")); // FOO+BAR
- Berdichev
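
If the goal is %20 instead of a plus sign (for example in a path segment), one workaround that is not from this thread is to post-process the URLEncoder output. This is only a sketch with a hypothetical class name. It keeps URLEncoder's other form-encoding choices, so it is an approximation rather than a full RFC 3986 path encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PathSegmentSketch {

    // Form-encodes a single path segment, then rewrites '+' as "%20".
    static String encodeSegment(String segment) {
        try {
            // A literal '+' in the input is already escaped as %2B, so every '+'
            // left in the output stands for a space and can be rewritten safely.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("FOO BAR")); // prints FOO%20BAR
        System.out.println(encodeSegment("a+b c"));   // prints a%2Bb%20c
    }
}
```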
T
2

Here is my solution, which is pretty easy:

Instead of encoding the URL itself, I encoded the parameters I was passing. The parameter was user input, and the user could enter any unexpected string of special characters, so this worked fine for me :)

String review = "User input"; // user input that will be passed as a URL parameter
try {
    // URLEncoder already turns spaces into '+', so no extra replace is needed
    review = URLEncoder.encode(review, "UTF-8");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
String url = "www.test.com/test.php" + "?user_review=" + review;
Turaco answered 19/9, 2016 at 7:33 Comment(2)
Sorry, I accidentally did that, but this solution works. - Turaco
I am using the same approach, but it only generates %2C or similar escapes when encoding the string. - Whitten
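
When there is more than one parameter, the same per-parameter encoding can be applied to each name and value before joining them. The following is a sketch under that assumption. The class name QueryStringSketch and the second parameter (lang) are hypothetical, while user_review and the base address come from the answer above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class QueryStringSketch {

    // Form-encodes one name or value as UTF-8.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Joins the parameters as name=value pairs separated by '&', in map order.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joiner.add(formEncode(entry.getKey()) + "=" + formEncode(entry.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("user_review", "Great & cheap, 100%!");
        params.put("lang", "en");
        System.out.println("www.test.com/test.php?" + toQueryString(params));
        // prints www.test.com/test.php?user_review=Great+%26+cheap%2C+100%25%21&lang=en
    }
}
```

Escapes such as %2C for the comma are the expected output here. Only the delimiters between parameters stay unescaped.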
G
1

I would echo what Wyzard wrote but add that:

  • for query parameters, HTML form encoding (application/x-www-form-urlencoded) is often exactly what the server is expecting; outside of query parameters, it is correct that URLEncoder should not be used
  • the most recent URI spec is RFC 3986, so you should refer to that as a primary source

I wrote a blog post a while back about this subject: Java: safe character handling and URL building

Gauthier answered 31/12, 2010 at 20:9 Comment(0)
