How to do URL decoding in Java?
A

11

399

In Java, I want to convert this:

https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type

To this:

https://mywebsite/docs/english/site/mybook.do?request_type

This is what I have so far:

class StringUTF 
{
    public static void main(String[] args) 
    {
        try{
            String url = 
               "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do" +
               "%3Frequest_type%3D%26type%3Dprivate";

            System.out.println(url+"Hello World!------->" +
                new String(url.getBytes("UTF-8"),"ASCII"));
        }
        catch(Exception E){
        }
    }
}

But it doesn't work right. What are these %3A and %2F formats called and how do I convert them?

Anglo answered 26/5, 2011 at 12:0 Comment(4)
@Stephen .. Why can't a url be UTF-8 encoded String .. ?Anglo
The problem is that just because the URL can be UTF-8, the question really has nothing to do with UTF-8. I've edited the question suitably.Shelli
It could be (in theory) but the string in your example is not a UTF-8 encoded String. It is a URL-encoded ASCII string. Hence the title is misleading.Cusec
It is also worth noting that all the characters in the url string are ASCII, and this is also true after the string has been URL decoded. '%' is an ASCII char and %xx represents an ASCII char if xx is less than (hexadecimal) 80.Cusec
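To illustrate the point about %xx escapes: each %xx stands for one byte, and bytes of hex 80 and above only become characters once a charset is chosen. The following is a minimal sketch, assuming Java 10+ for the Charset overload; the class and method names are made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only URLDecoder and StandardCharsets come from the JDK.
class PercentEscapeDemo {

    // Decodes %xx escapes, interpreting the resulting bytes in the given charset.
    static String decode(String s, Charset charset) {
        return URLDecoder.decode(s, charset); // Charset overload requires Java 10+
    }

    public static void main(String[] args) {
        // Escapes below %80 are plain ASCII and decode identically in both charsets.
        System.out.println(decode("%3A%2F", StandardCharsets.UTF_8));      // :/
        System.out.println(decode("%3A%2F", StandardCharsets.ISO_8859_1)); // :/

        // %C3%A9 is two bytes: a single character in UTF-8,
        // but two separate characters in ISO-8859-1.
        System.out.println(decode("%C3%A9", StandardCharsets.UTF_8).length());      // 1
        System.out.println(decode("%C3%A9", StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```

So the charset argument only matters once the escaped bytes leave the ASCII range, which is not the case for the URL in the question.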
D
758

This does not have anything to do with character encodings such as UTF-8 or ASCII. The string you have there is URL encoded. This kind of encoding is something entirely different than character encoding.

Try something like this:

try {
    String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8.name());
} catch (UnsupportedEncodingException e) {
    // not going to happen - value came from JDK's own StandardCharsets
}

Java 10 added direct support for Charset to the API, meaning there's no need to catch UnsupportedEncodingException:

String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8);

Note that a character encoding (such as UTF-8 or ASCII) is what determines the mapping of characters to raw bytes. For a good intro to character encodings, see this article.
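For completeness, here is a small self-contained sketch that applies the Java 10+ overload to the string from the question. The class name UrlDecodeExample and the helper decode are only illustrative.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative wrapper class; the only real API used is URLDecoder.decode.
class UrlDecodeExample {

    // Decodes an application/x-www-form-urlencoded string as UTF-8 (Java 10+).
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do"
                   + "%3Frequest_type%3D%26type%3Dprivate";
        System.out.println(decode(url));
        // prints https://mywebsite/docs/english/site/mybook.do?request_type=&type=private
    }
}
```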

Dabchick answered 26/5, 2011 at 12:4 Comment(10)
The methods on URLDecoder are static so you don't have to create a new instance of it.Fortress
@whataheck URL encoding is used because in some places you can't use all kinds of characters in an URL, so that some characters are escaped using a %xx code as Stephen C explains in a comment on your question above.Dabchick
The method you provided is marked as deprecated. Why is that, and what is the alternative?Helicon
@Helicon Only the version where you don't specify the character encoding (the second parameter, "UTF-8") is deprecated according to the Java 7 API documentation. Use the version with two parameters.Dabchick
If using java 1.7+ you can use the static version of the "UTF-8" string: StandardCharsets.UTF_8.name() from this package: java.nio.charset.StandardCharsets. Relevant to this: linkClyburn
For character encoding, this is a great article too: balusc.blogspot.in/2009/05/unicode-how-to-get-characters-right.htmlAnglo
Be careful with this. As noted here: blog.lunatech.com/2009/02/03/… This is not about URLs, but for HTML form encoding.Merridie
Useful for grails and gsp as well ... <g:message code="yourMessageID" default="" args="${URLDecoder.decode(yourVariable, "UTF-8")}"/>Sideways
this needs to be wrapped in a try/catch block.. read more about checked exceptions (this one) vs unchecked #6116396Trimming
Doesn't work if there is a '+' in url. See bugs.openjdk.java.net/browse/JDK-8179507Diverting
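The '+' caveat follows from the form-encoding rules: URLDecoder turns a literal '+' into a space, while java.net.URI leaves it alone. Below is a rough sketch of the difference; the class and method names are hypothetical, and decodeAsPath assumes the input is a bare path with no literal ':', '?' or '#' in it.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class contrasting form decoding with URI path decoding.
class PlusHandlingDemo {

    // Form decoding: '+' becomes a space, %xx escapes are decoded (Java 10+ overload).
    static String decodeAsForm(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // URI path decoding: only %xx escapes are decoded, '+' is kept as is.
    // Assumes the whole input parses as a path (no literal ':', '?' or '#').
    static String decodeAsPath(String s) {
        return URI.create(s).getPath();
    }

    public static void main(String[] args) {
        String input = "a+b%2Bc%20d";
        System.out.println(decodeAsForm(input)); // a b+c d
        System.out.println(decodeAsPath(input)); // a+b+c d
    }
}
```

If a literal plus sign has to survive URLDecoder, it needs to arrive escaped as %2B.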
H
75

The string you've got is in application/x-www-form-urlencoded encoding.

Use URLDecoder to convert it to Java String.

URLDecoder.decode( url, "UTF-8" );
Homolographic answered 26/5, 2011 at 12:2 Comment(0)
S
56

This has been answered before (although this question was first!):

"You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data)."

As URL class documentation states:

The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().

The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC2396.

Basically:

String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type";
System.out.println(new java.net.URI(url).getPath());

will give you:

https://mywebsite/docs/english/site/mybook.do?request_type
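For a URL that is not escaped wholesale, the URI-based approach from the quoted documentation might look like the sketch below. The multi-argument URI constructor escapes illegal characters, and the getters decode them again. The path "/docs/my book.do" and the query value are invented for the example, as are the class and method names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; path and query values are made up for illustration.
class UriRoundTripDemo {

    // Builds a URI from unescaped components; the constructor quotes illegal characters.
    static URI build(String path, String query) {
        try {
            // Arguments: scheme, authority, path, query, fragment
            return new URI("https", "mywebsite", path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("/docs/my book.do", "request_type=a b");
        System.out.println(uri);            // https://mywebsite/docs/my%20book.do?request_type=a%20b
        System.out.println(uri.getPath());  // /docs/my book.do
        System.out.println(uri.getQuery()); // request_type=a b
    }
}
```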
Savor answered 9/5, 2013 at 3:7 Comment(8)
In Java 1.7 the URLDecoder.decode(String, String) overload is not deprecated. You must be referring to the URLDecoder.decode(String) overload without the encoding. You might want to update your post for clarification.Cuneal
This answer is misleading; that block quote has nothing to do with the deprecation. The Javadoc of the deprecated method states, and I actually quote @deprecated The resulting string may vary depending on the platform's default encoding. Instead, use the decode(String,String) method to specify the encoding.Sacerdotalism
@Klever, not for me. I believe you're using URL instead of URI, but you haven't provided enough information to reproduce your results.Savor
getPath() for URIs only returns the path part of the URI, as noted above.Chancy
@Chancy - Perhaps you could provide some information so we can replicate your behaviour? I'm using SUN JDK 1.8.0_73 - and it still works today.Savor
Unless I'm mistaken, the "path" is known to be that part of a URI after the authority part (see: en.wikipedia.org/wiki/Uniform_Resource_Identifier for definition of path) - it seems to me the behaviour I am seeing is the standard/correct behaviour. I'm using java 1.8.0_101 (on Android Studio). I'd be curious to see what you get as "getAuthority()" is called. Even this article/example seems to indicate that path is only the /public/manual/appliances part of their URI:quepublishing.com/articles/article.aspx?p=26566&seqNum=3Chancy
@Chancy The code in the post actually does print the output that it shows (at least for me). I think the reason for this is that, because of the URL encoding, the URI constructor is actually treating the entire string, (https%3A%2F...), as just the path of a URI; there is no authority, or query, etc. This can be tested by calling the respective get methods on the URI object. If you pass the decoded text to the URI constructor: new URI("https://mywebsite/do....."), then calling getPath() and other methods will give correct results.Aged
@Chancy The escaped text isn't a valid URL, and so it has no "query" portion or any of the other properties of a URL. A better way to think about it would be like this: When you want to put a / character into a url's path, without it acting as a syntactical character, you have to escape it. Since everything in that string is escaped, it's all a part of the url's path.Aged
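The explanation in the last two comments can be checked by printing the components of both forms of the string. This is only a sketch; UriComponentsDemo and describe are illustrative names, and URI.create is used to avoid the checked exception.

```java
import java.net.URI;

// Hypothetical demo class that lists the parsed components of a URI string.
class UriComponentsDemo {

    // Returns the scheme, authority, decoded path and decoded query of the input.
    static String describe(String s) {
        URI uri = URI.create(s);
        return "scheme=" + uri.getScheme()
             + " authority=" + uri.getAuthority()
             + " path=" + uri.getPath()
             + " query=" + uri.getQuery();
    }

    public static void main(String[] args) {
        // Fully escaped: there is no literal ':' or '?', so everything is the path.
        System.out.println(describe(
            "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type"));
        // scheme=null authority=null path=https://mywebsite/docs/english/site/mybook.do?request_type query=null

        // Already decoded: the parser sees the usual components.
        System.out.println(describe(
            "https://mywebsite/docs/english/site/mybook.do?request_type"));
        // scheme=https authority=mywebsite path=/docs/english/site/mybook.do query=request_type
    }
}
```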
F
16

%3A and %2F are URL encoded characters. Use this java code to convert them back into : and /

String decoded = java.net.URLDecoder.decode(url, "UTF-8");
Fortress answered 26/5, 2011 at 12:3 Comment(2)
It does not convert %2C either, which is a comma (,)Trifid
this needs to be wrapped in a try/catch block.. read more about checked exceptions (this one) vs unchecked #6116396Trimming
S
6

I use apache commons

String decodedUrl = new URLCodec().decode(url);

The default charset is UTF-8

Selfdenial answered 10/8, 2014 at 12:31 Comment(0)
P
6
public String decodeString(String URL)
{
    String urlString = "";
    try {
        urlString = URLDecoder.decode(URL, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // TODO Auto-generated catch block
    }
    return urlString;
}
Perigordian answered 16/6, 2015 at 7:12 Comment(1)
Could you please elaborate on your answer by adding a little more description of the solution you provide?Taxpayer
A
5
try {
    String result = URLDecoder.decode(urlString, "UTF-8");
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Adp answered 14/11, 2014 at 17:44 Comment(0)
C
2
import java.io.UnsupportedEncodingException;
import java.net.URISyntaxException;

public class URLDecoding { 

    String decoded = "";

    public String decodeMethod(String url) throws UnsupportedEncodingException
    {
        decoded = java.net.URLDecoder.decode(url, "UTF-8"); 
        return  decoded;
//"You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data)."
    }

    public String getPathMethod(String url) throws URISyntaxException 
    {
        decoded = new java.net.URI(url).getPath();  
        return  decoded; 
    }

    public static void main(String[] args) throws UnsupportedEncodingException, URISyntaxException 
    {
        System.out.println(" Here is your Decoded url with decode method : "+ new URLDecoding().decodeMethod("https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type")); 
        System.out.println("Here is your Decoded url with getPath method : "+ new URLDecoding().getPathMethod("https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest")); 

    } 

}

You can select your method wisely :)

Calomel answered 5/6, 2014 at 6:35 Comment(0)
F
2

If the value is an integer, we also have to catch NumberFormatException.

try {
    Integer result = Integer.valueOf(URLDecoder.decode(urlNumber, "UTF-8"));
} catch (NumberFormatException | UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Fructiferous answered 30/5, 2021 at 12:42 Comment(0)
P
1

Using java.net.URI class:

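public String getDecodedURL(String encodedUrl) {
    try {
        URI uri = new URI(encodedUrl);
        // getScheme() returns null when the input has no literal "scheme:" prefix,
        // e.g. when the colon itself is escaped as %3A
        String scheme = uri.getScheme();
        String prefix = (scheme == null) ? "" : scheme + ":";
        // getSchemeSpecificPart() decodes escaped octets; a fragment is not included
        return prefix + uri.getSchemeSpecificPart();
    } catch (Exception e) {
        return "";
    }
}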

Please note that the exception handling could be better, but that is not very relevant for this example.

Polyhydric answered 14/4, 2020 at 13:20 Comment(0)
D
-1

I had this problem too and came here looking for an answer, but the code from the accepted answer didn't work for me. Decoding twice did work, so I'm sharing the following line in case it helps.

URLDecoder.decode(URLDecoder.decode(url, StandardCharsets.UTF_8), StandardCharsets.UTF_8);
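One case where a second pass is needed is a value that was encoded twice, so that %3A arrives as %253A. Whether that was the situation here is an assumption. The sketch below uses a made-up input and illustrative class and method names, and assumes Java 10+.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing the effect of one and two decoding passes.
class DoubleDecodeDemo {

    // A single decoding pass with UTF-8 (Charset overload, Java 10+).
    static String decodeOnce(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Two passes, for input that was URL-encoded twice.
    static String decodeTwice(String s) {
        return decodeOnce(decodeOnce(s));
    }

    public static void main(String[] args) {
        // Made-up example of a doubly encoded value: "%25" is the escape for '%'.
        String twiceEncoded = "https%253A%252F%252Fmywebsite";
        System.out.println(decodeOnce(twiceEncoded));  // https%3A%2F%2Fmywebsite
        System.out.println(decodeTwice(twiceEncoded)); // https://mywebsite
    }
}
```

Decoding twice is not safe in general: if the value was encoded only once and contains a literal percent sign (for example "100%25"), the second pass throws IllegalArgumentException or silently changes the data.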
Disney answered 30/9, 2022 at 11:42 Comment(0)
