Validating URL in Java
A

12

119

I wanted to know if there are any standard APIs in Java to validate a given URL. I want to check both whether the URL string is well formed, i.e. the given protocol is valid, and whether a connection can be established.

I tried using HttpURLConnection, providing the URL and connecting to it. The first part of my requirement seems to be fulfilled but when I try to perform HttpURLConnection.connect(), 'java.net.ConnectException: Connection refused' exception is thrown.

Can this be because of proxy settings? I tried setting the System properties for proxy but no success.

Let me know what I am doing wrong.

Asher answered 21/10, 2009 at 11:38 Comment(2)
There seem to be 2 questions here; URL validation and finding the cause of a ConnectExceptionAllodium
Since this is the first Google hit for "java url validator", there are indeed two questions here: how to validate the URL (from looking at the string) and how to check if the URL is reachable (via an HTTP connection, for example).Fickle
B
173

For the benefit of the community, since this thread is top on Google when searching for
"url validator java"


Catching exceptions is expensive, and should be avoided when possible. If you just want to verify your String is a valid URL, you can use the UrlValidator class from the Apache Commons Validator project.

For example:

import org.apache.commons.validator.routines.UrlValidator;

String[] schemes = {"http","https"}; // DEFAULT schemes = "http", "https", "ftp"
UrlValidator urlValidator = new UrlValidator(schemes);
// "ftp" is not among the allowed schemes, so this prints "URL is invalid"
if (urlValidator.isValid("ftp://foo.bar.com/")) {
   System.out.println("URL is valid");
} else {
   System.out.println("URL is invalid");
}
Barfly answered 22/2, 2011 at 13:37 Comment(11)
That URLValidator class is marked deprecated. The recommended URLValidator is in the routines package: commons.apache.org/validator/apidocs/org/apache/commons/…Multiangular
is there a similar library for validating email addresses as well?Burrstone
I fail to see how this is standard APIAstrolabe
UrlValidator has its own set of known issues. Is there an alternate library that is being maintained more actively?Dynamotor
@AlexAverbuch: can you please outline what the issues are with UrlValidator? It's not very helpful to just say they exist but not say what they are.Jim
try domains such as something.london something.anothercity that are out nowExponent
@AlexAverbuch: It seems that Commons Validator issues are getting fixed in a rather timely manner: issues.apache.org/jira/browse/… If you find any other issue, please report it, thanks!Botts
We use security scanning software to identify security vulnerabilities in third party libraries, and unfortunately commons-validator contains commons-beanutils which is identified red (security vulnerability). Is there another (slimmer) library / API ?Fickle
@Fickle the org.apache.commons.validator.routines.UrlValidator doesn't use beanutils (at least in the latest 1.5.1 version). Perhaps you can just exclude the beanutils dependency?Barfly
A question about the commons library: Why aren't these functions simple static functions? Why do I need to create a UrlValidator object to validate 1 URL? What utility do they get out having that "state"?Fatness
@ParthMehrotra I'm 4 years late, but the main reason for this is that you can mock the validation in tests, and also you can register the validator as a bean and configure it only once.Toxophilite
A
41

The java.net.URL class is in fact not at all a good way of validating URLs. MalformedURLException is not thrown on all malformed URLs during construction. Catching IOException on java.net.URL#openConnection().connect() does not validate the URL either; it only tells whether or not the connection can be established.

Consider this piece of code:

    try {
        new URL("http://.com");
        new URL("http://com.");
        new URL("http:// ");
        new URL("ftp://::::@example.com");
    } catch (MalformedURLException malformedURLException) {
        malformedURLException.printStackTrace();
    }

..which does not throw any exceptions.

I recommend using a validation API implemented with a context-free grammar or, for very simple validation, just using regular expressions. However, I need someone to suggest a superior or standard API for this; I only recently started searching for it myself.

Note: It has been suggested that URL#toURI(), in combination with handling of the exception java.net.URISyntaxException, can facilitate validation of URLs. However, this method only catches one of the very simple cases above.

The conclusion is that there is no standard java URL parser to validate URLs.
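One way to tighten a standard-library check, offered as a rough sketch under assumptions rather than a complete validator: parse with java.net.URI, then additionally require an allowed scheme and a non-null host. The class name UrlSyntaxCheck, the method looksLikeWebUrl and the http/https allow-list are hypothetical choices made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

final class UrlSyntaxCheck {
    // Hypothetical allow-list; adjust to the schemes your application accepts.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    // Returns true only for absolute http/https URIs that carry a parsable host.
    static boolean looksLikeWebUrl(String candidate) {
        if (candidate == null) {
            return false;
        }
        final URI uri;
        try {
            uri = new URI(candidate); // rejects illegal characters such as spaces
        } catch (URISyntaxException e) {
            return false;
        }
        String scheme = uri.getScheme();
        if (scheme == null || !ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // getHost() is null for opaque URIs ("http:test.com"), for URIs
        // without an authority ("http:/test.com"), and for authorities that
        // cannot be parsed as server-based.
        return uri.getHost() != null;
    }
}
```

This is purely syntactic and stays fairly permissive: it may still accept odd but grammatical host names such as "com.", so stricter host rules would have to be layered on top if the application needs them.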

Andvari answered 11/5, 2011 at 14:18 Comment(2)
Have you found a solution to this problem??Oftentimes
@bi0s.kidd0 There are several libraries that can be used, but we decided to roll our own. It's not complete, but can parse what we are interested in, including URLs containing either domains or IPs (both v4 and v6). github.com/jajja/arachneAndvari
P
33

You need to create both a URL object and a URLConnection object. The following code will test both the format of the URL and whether a connection can be established:

try {
    URL url = new URL("http://www.yoursite.com/");
    URLConnection conn = url.openConnection();
    conn.connect();
} catch (MalformedURLException e) {
    // the URL is not in a valid form
} catch (IOException e) {
    // the connection couldn't be established
}
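If a connection test is really what is wanted, a somewhat cheaper variant could look like the sketch below. It assumes an HTTP(S) target, sets explicit timeouts and sends a HEAD request instead of fetching the body. The names Reachability and isReachable are hypothetical, and treating any 2xx or 3xx status as "reachable" is an assumption; some servers do not answer HEAD requests properly.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class Reachability {
    // Returns true if an HTTP(S) HEAD request gets a 2xx/3xx answer in time.
    static boolean isReachable(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            // URI.create throws IllegalArgumentException on bad syntax;
            // toURL throws it for relative URIs.
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("HEAD"); // no response body is transferred
            int status = conn.getResponseCode(); // performs the actual request
            return status >= 200 && status < 400;
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Unreachable, malformed, or not an http/https URL.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A false result only says that the request did not succeed at that moment; it says nothing about whether the string is a syntactically valid URL.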
Plaything answered 21/10, 2009 at 11:47 Comment(9)
Note there are multiple ways of checking for malformed URLs / problems. For example, if you will be using your URL for a new HttpGet(url), then you can catch the IllegalArgumentException that HttpGet(...) throws if the URL is malformed. And HttpResponse will throw stuff at you too if there's a problem with getting the data.Nada
Connection validates only host availability. Has nothing to do with validness of URL.Symbolic
MalformedURLException is not a safe strategy to test the valid form of a URL. This answer is misleading.Andvari
@Martin: can you elaborate why it isn't safe?Photic
@JeroenVannevel I already have, in an answer to the OP question. The fact that the constructor throws MalformedURLException does not mean that the format is validated.Andvari
@Martin: sorry, missed that post!Photic
This is very, very expensive. openConnection/connect will actually try to connect to the http resource. This must be one of the most expensive ways I have ever seen to verify an URL.Illative
Moreover for any Android Developers coming along, the solution should be used on Background thread (or AsyncTask) otherwise you will get the exception android.os.NetworkOnMainThreadExceptionBattue
This needs an actual connection... the URL can still be VALID if there is no connection...Brawley
A
25

Using only standard API, pass the string to a URL object then convert it to a URI object. This will accurately determine the validity of the URL according to the RFC2396 standard.

Example:

public boolean isValidURL(String url) {

    try {
        new URL(url).toURI();
    } catch (MalformedURLException | URISyntaxException e) {
        return false;
    }

    return true;
}
Astrolabe answered 27/7, 2013 at 5:30 Comment(4)
Note that this string->url->uri validation scheme reports that these test cases are valid: "http://.com" "com." "ftp://::::@example.com" "http:/test.com" "http:test.com" "http:/:" So while this is standard API, the validation rules it applies may not be what one expects.Olnee
@Olnee Are you saying that the RFC2396 spec is faulty, or the JAVA URI implementation does not honor RFC2396 standard specs?Brawley
@Brawley no. I was suggesting that RFC2396 allows a range of valid formats that one might find surprising and more permissive than anticipated, so depending on one's requirements additional validation steps may be desirable.Olnee
@Olnee So, you are saying that the RFC2396 "standard" is permissive...Brawley
S
11

There is a way to perform URL validation in strict accordance with the standards in Java without resorting to third-party libraries:

boolean isValidURL(String url) {
  try {
    new URI(url).parseServerAuthority();
    return true;
  } catch (URISyntaxException e) {
    return false;
  }
}

The constructor of URI checks that url is a valid URI, and the call to parseServerAuthority ensures that it is a URL (absolute or relative) and not a URN.

Starchy answered 8/8, 2017 at 15:57 Comment(9)
The exception is thrown "If the authority component of this URI is defined but cannot be parsed as a server-based authority according to RFC 2396". While this is much better than most other proposals, it cannot validate a URL.Andvari
@Martin, You forgot about the validation in the constructor. As I wrote, the combination of the URI constructor call and the parseServerAuthority call validates the URL, not parseServerAuthority alone.Starchy
You can find examples on this page that are incorrectly validated by your suggestion. Refer to documentation, and if it's not designed for your intended use, please don't promote to exploit it.Andvari
@Martin, Can you be more specific? Which examples in your opinion are incorrectly validated by this method?Starchy
@Asu And this is a valid URL according to RFC 2396! https is both the scheme and the host there.Starchy
@Starchy With two of "://"?Hexa
@Hexa yes. The second :// comes after the host, : introduces the port number, which can be empty according to the syntax. // is a part of the path with an empty segment, which is also valid. If you enter this address in your browser it will try to open it (but most probably won't find the server named https ;)).Starchy
Sigh.. Good point. Completely counter-intuitive that they let the port be empty after the colon.Hexa
So, the URI Java implementation honors the RFC2396 standard spec?Brawley
B
8

Use android.webkit.URLUtil on Android:

URLUtil.isValidUrl(URL_STRING);

Note: It is just checking the initial scheme of URL, not that the entire URL is valid.

Boreal answered 18/12, 2015 at 15:41 Comment(2)
Only if you are working on an Android application, of course.Extrabold
It only checks if the URL starts with a correct prefix: http://, https://, about:, etc.Bettis
G
1

It is important to point out that the URL object handles both validation and connection. As a result, only protocols for which a handler has been provided in sun.net.www.protocol (file, ftp, gopher, http, https, jar, mailto, netdoc) are considered valid. For instance, try to create a new URL with the ldap protocol:

new URL("ldap://myhost:389")

You will get a java.net.MalformedURLException: unknown protocol: ldap.

You need to implement your own handler and register it through URL.setURLStreamHandlerFactory(). Quite overkill if you just want to validate the URL syntax, a regexp seems to be a simpler solution.
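As a lighter alternative to registering a handler, java.net.URI parses the syntax without needing any protocol handler, so a scheme such as ldap can be inspected directly. This is only a sketch; SchemeAgnosticParse and describe are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SchemeAgnosticParse {
    // Summarises the parsed components, or the parser's complaint.
    static String describe(String candidate) {
        try {
            URI uri = new URI(candidate); // no URLStreamHandler is involved
            return "scheme=" + uri.getScheme()
                    + " host=" + uri.getHost()
                    + " port=" + uri.getPort();
        } catch (URISyntaxException e) {
            return "invalid: " + e.getReason();
        }
    }
}
```

For example, describe("ldap://myhost:389") yields "scheme=ldap host=myhost port=389", whereas a string containing a space is reported as invalid.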

Globetrotter answered 4/2, 2011 at 10:50 Comment(0)
C
0

Are you sure you're using the correct proxy as system properties?

Also if you are using 1.5 or 1.6 you could pass a java.net.Proxy instance to the openConnection() method. This is more elegant imo:

//Proxy instance, proxy ip = 10.0.0.1 with port 8080
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("10.0.0.1", 8080));
conn = new URL(urlString).openConnection(proxy);
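
To see which proxy the JVM would actually pick for a given URL, for instance when the system properties seem to be ignored, the default ProxySelector can be queried. This is a minimal diagnostic sketch; ProxyLookup and proxiesFor are hypothetical names.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

final class ProxyLookup {
    // Lists the proxies the JVM-wide selector would try for this URL.
    static List<Proxy> proxiesFor(String url) {
        ProxySelector selector = ProxySelector.getDefault();
        if (selector == null) {
            // No selector installed: connections are made directly.
            return List.of(Proxy.NO_PROXY);
        }
        return selector.select(URI.create(url));
    }

    public static void main(String[] args) {
        System.out.println(proxiesFor("http://example.com/"));
        System.out.println(proxiesFor("https://example.com/"));
    }
}
```

Note that http.proxyHost and http.proxyPort apply to http URLs only; https URLs are governed by https.proxyHost and https.proxyPort, so a DIRECT result for an https URL may simply mean the https properties were not set.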
Checkbook answered 21/10, 2009 at 11:47 Comment(1)
Why would this be elegant or even correct? It uses expensive resources when it works, and it does not work when a correct URL is not available for connection at the time of testing.Andvari
A
0

For basic needs, you can use:

private boolean isValidUrl(String urlString){
        try {
            URL url = new URL(urlString);
            url.toURI();
            return Patterns.WEB_URL.matcher(urlString).matches();
        } catch (MalformedURLException | URISyntaxException e) {
            return false;
        }
    }

If you want to ensure the URL is reachable, you may need to make the network call on a background thread. I used RxJava:

new SingleFromCallable<>(new Callable<Boolean>() {
            @Override
            public Boolean call() {
                return URLIsReachable(urlString);
            }
        })
                .subscribeOn(Schedulers.io())
                .observeOn(AndroidSchedulers.mainThread())
                .subscribe(new SingleObserver<Boolean>() {
                    @Override
                    public void onSubscribe(@io.reactivex.rxjava3.annotations.NonNull Disposable d) {}

                    @Override
                    public void onSuccess(@io.reactivex.rxjava3.annotations.NonNull Boolean isValidUrl) {
                        loadingDialog.dismiss();
                        if (isValidUrl) { /* do your stuff */ }
                    }

                    @Override
                    public void onError(@io.reactivex.rxjava3.annotations.NonNull Throwable e) {
                        loadingDialog.dismiss();
                    }
                });

private boolean URLIsReachable(String urlString) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
            int responseCode = urlConnection.getResponseCode();
            urlConnection.disconnect();
            return responseCode == 200;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
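
Outside Android, the same "run the check off the calling thread" idea could be sketched with the JDK alone. This is an assumption-laden illustration, not part of the approach above: BackgroundCheck and checkAsync are hypothetical names, and the actual check is passed in as a Predicate so that any reachability test can be plugged in.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

final class BackgroundCheck {
    // Runs the given check on the executor; any exception counts as "false".
    static CompletableFuture<Boolean> checkAsync(
            String url, Predicate<String> check, ExecutorService executor) {
        return CompletableFuture
                .supplyAsync(() -> check.test(url), executor)
                .exceptionally(error -> false);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Placeholder check; a real one would perform the network call.
        checkAsync("http://example.com/", u -> u.startsWith("http"), pool)
                .thenAccept(ok -> System.out.println("reachable: " + ok))
                .join();
        pool.shutdown();
    }
}
```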
Aquiver answered 21/12, 2023 at 8:44 Comment(0)
A
-1

I think the best response is from the user @b1nary.atr0phy. However, I recommend combining the method from b1nary.atr0phy's response with a regex to cover all possible cases.

public static final URL validateURL(String url, Logger logger) {

        URL u = null;
        try {  
            Pattern regex = Pattern.compile("(?i)^(?:(?:https?|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?!(?:10|127)(?:\\.\\d{1,3}){3})(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))\\.?)(?::\\d{2,5})?(?:[/?#]\\S*)?$");
            Matcher matcher = regex.matcher(url);
            if(!matcher.find()) {
                throw new URISyntaxException(url, "The URL is not well formed.");
            }
            u = new URL(url);
            u.toURI();
        } catch (MalformedURLException e) {
            logger.error("The URL is not well formed.");
        } catch (URISyntaxException e) {
            logger.error("The URL is not well formed.");
        }  

        return u;  

    }
Asbestos answered 27/10, 2019 at 10:34 Comment(1)
There are a couple of problems with this regex: 1. URLs without the prefix are invalid, (e.g. "stackoverflow.com"), this also includes URLs with two suffixes if they're missing the prefix (e.g. "amazon.co.uk"). 2. IPs are always invalid (e.g. "127.0.0.1"), no matter if they use the prefix or not. I'd suggest using "((http|https|ftp)://)?((\\w)*|([0-9]*)|([-|_])*)+([\\.|/]((\\w)*|([0-9]*)|([-|_])*))+" (source). The only downside to this regex is that e.g. "127.0..0.1" and "127.0" are valid.Colter
F
-1

This is what I use to validate CDN urls (must start with https, but that's easy to customise). This will also not allow using IP addresses.

public static final boolean validateURL(String url) {  
    var regex = Pattern.compile("^https://(www\\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_+.~#?&/=]*)");
    var matcher = regex.matcher(url);
    return matcher.find();
}
Fraternity answered 19/1, 2022 at 18:31 Comment(0)
A
-3

Thanks. Opening the URL connection by passing the Proxy as suggested by NickDK works fine.

//Proxy instance, proxy ip = 10.0.0.1 with port 8080
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("10.0.0.1", 8080));
conn = new URL(urlString).openConnection(proxy);

Setting the system properties, however, doesn't work, as I had mentioned earlier.

Thanks again.

Regards, Keya

Asher answered 21/10, 2009 at 11:55 Comment(0)
