HTTPURLConnection Doesn't Follow Redirect from HTTP to HTTPS

5

115

I can't understand why Java's HttpURLConnection does not follow an HTTP redirect from an HTTP to an HTTPS URL. I use the following code to get the page at https://httpstat.us/:

import java.net.URL;
import java.net.HttpURLConnection;
import java.io.InputStream;

public class Tester {

    public static void main(String argv[]) throws Exception{
        InputStream is = null;

        try {
            String httpUrl = "http://httpstat.us/301";
            URL resourceUrl = new URL(httpUrl);
            HttpURLConnection conn = (HttpURLConnection)resourceUrl.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.connect();
            is = conn.getInputStream();
            System.out.println("Original URL: "+httpUrl);
            System.out.println("Connected to: "+conn.getURL());
            System.out.println("HTTP response code received: "+conn.getResponseCode());
            System.out.println("HTTP response message received: "+conn.getResponseMessage());
        } finally {
            if (is != null) is.close();
        }
    }
}

The output of this program is:

Original URL: http://httpstat.us/301
Connected to: http://httpstat.us/301
HTTP response code received: 301
HTTP response message received: Moved Permanently

A request to http://httpstat.us/301 returns the following (shortened) response (which seems absolutely right!):

HTTP/1.1 301 Moved Permanently
Cache-Control: private
Content-Length: 21
Content-Type: text/plain; charset=utf-8
Location: https://httpstat.us

Unfortunately, Java's HttpURLConnection does not follow the redirect!

Note that if you change the original URL to HTTPS (https://httpstat.us/301), Java will follow the redirect as expected!?

Geomorphic answered 10/12, 2009 at 21:38 Comment(1)
Hi, I edited your question for clarity and to point out that the redirect to HTTPS in particular is the problem. Also, I changed the bit.ly domain to a different one, as use of bit.ly is blacklisted in questions. Hope you don't mind, feel free to re-edit.Augustineaugustinian
135

Redirects are followed only if they use the same protocol. (See the followRedirect() method in the source.) There is no way to disable this check.

Even though we know it mirrors HTTP, from the HTTP protocol point of view, HTTPS is just some other, completely different, unknown protocol. It would be unsafe to follow the redirect without user approval.

For example, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.

Transonic answered 10/12, 2009 at 22:5 Comment(9)
Thanks. I've just found confirmation: bugs.sun.com/bugdatabase/view_bug.do?bug_id=4620571 . Namely: "After discussion among Java Networking engineers, it is felt that we shouldn't automatically follow redirect from one protocol to another, for instance, from http to https and vise versa, doing so may have serious security consequences. Thus the fix is to return the server responses for redirect. Check response code and Location header field value for redirect information. It's the application's responsibility to follow the redirect."Geomorphic
But does it follow redirects from http to http or from https to https? Even that would be wrong, wouldn't it?Atavistic
@Enigma You can configure that behavior globally or on a per-instance basis. By default, it does follow redirects if the scheme doesn't change.Transonic
@Transonic That only applies to redirecting across the same protocol, right?Slug
@JoshuaDavis Yes, it only applies to redirects to the same protocol. An HttpURLConnection won't automatically follow redirects to a different protocol, even if the redirect flag is set.Transonic
It seems this applies not only when the protocol changes, but also when the method changes. I just found out that a redirect after POST is not followed automatically (Java SE 7).Africah
Java Networking engineers could offer a setFollowTransProtocol(true) option because if we need it we will program it anyway. FYI web browsers, curl, wget and many more follow redirects from HTTP to HTTPS and vice versa.Prognostication
Nobody sets up auto-login on HTTPS and then expects HTTP to be "anonymous". That's nonsensical. It's perfectly safe and normal to follow redirects from HTTP to HTTPS (not the other way around). This is just a typically bad Java API.Breebreech
Edited to clarify that HTTPUrlConnection will not follow cross-protocol redirects - it's in the code :-).Augustineaugustinian
68

HttpURLConnection by design won't automatically redirect from HTTP to HTTPS (or vice versa). Following the redirect may have serious security consequences. SSL (hence HTTPS) creates a session that is unique to the user. This session can be reused for multiple requests. Thus, the server can track all of the requests made from a single person. This is a weak form of identity and is exploitable. Also, the SSL handshake can ask for the client's certificate. If sent to the server, then the client's identity is given to the server.

As erickson points out, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.

The programmer has to take extra steps to ensure that credentials, client certificates or SSL session id will not be sent before redirecting from HTTP to HTTPS. The default is to send these. If the redirection hurts the user, do not follow the redirection. This is why automatic redirect is not supported.

With that understood, here's the code which will follow the redirects.

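  import java.io.IOException;
  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.net.URLDecoder;
  import java.util.HashMap;
  import java.util.Map;

  URL resourceUrl, base, next;
  Map<String, Integer> visited;
  HttpURLConnection conn;
  String location;
  String url;   // URL to fetch; assigned before the loop starts
  int times;

  ...
  visited = new HashMap<>();

  while (true)
  {
     // Count visits per URL so a redirect cycle is detected instead of looping forever
     times = visited.compute(url, (key, count) -> count == null ? 1 : count + 1);

     if (times > 3)
        throw new IOException("Stuck in redirect loop");

     resourceUrl = new URL(url);
     conn        = (HttpURLConnection) resourceUrl.openConnection();

     conn.setConnectTimeout(15000);
     conn.setReadTimeout(15000);
     conn.setInstanceFollowRedirects(false);   // Make the logic below easier to detect redirections
     conn.setRequestProperty("User-Agent", "Mozilla/5.0...");

     switch (conn.getResponseCode())
     {
        case HttpURLConnection.HTTP_MOVED_PERM:
        case HttpURLConnection.HTTP_MOVED_TEMP:
           location = conn.getHeaderField("Location");
           // Commenters report that this decode breaks percent-encoded
           // locations (spaces, multi-byte characters); drop it if so.
           location = URLDecoder.decode(location, "UTF-8");
           base     = new URL(url);
           next     = new URL(base, location);  // Deal with relative URLs
           url      = next.toExternalForm();
           continue;
     }

     break;
  }

  is = conn.getInputStream();   // HttpURLConnection has no openStream(); that method belongs to URL
  ...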
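
The loop above follows every 301 or 302 it sees. If you would rather decide per redirect, as the paragraph on credentials suggests, the check can be factored into small helpers. The following is only a sketch: the class name RedirectPolicy and its three methods are made up for illustration, and the policy (same scheme or an http to https upgrade, never a downgrade) is one possible choice, not something HttpURLConnection provides. It also recognizes 303, 307 and 308, which the switch above does not handle.

```java
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class RedirectPolicy {

    // Hypothetical helper: true for the status codes that carry a Location header to follow.
    public static boolean isRedirect(int code) {
        return code == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || code == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || code == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || code == 307                                 // no constant in HttpURLConnection
            || code == 308;                                // no constant in HttpURLConnection
    }

    // Hypothetical helper: resolve a (possibly relative) Location value against the current URL.
    public static URL resolve(URL base, String location) throws MalformedURLException {
        return new URL(base, location);
    }

    // Hypothetical policy: allow same-scheme redirects and http -> https upgrades only.
    public static boolean isAllowed(URL from, URL to) {
        String a = from.getProtocol();
        String b = to.getProtocol();
        boolean sameScheme = a.equalsIgnoreCase(b);
        boolean upgrade = "http".equalsIgnoreCase(a) && "https".equalsIgnoreCase(b);
        return sameScheme || upgrade;
    }

    public static void main(String[] args) throws MalformedURLException {
        // URLs taken from the question; no network access happens here.
        URL current = new URL("http://httpstat.us/301");
        URL target  = resolve(current, "https://httpstat.us");

        System.out.println("301 is a redirect: " + isRedirect(301));
        System.out.println("http -> https allowed: " + isAllowed(current, target));
        System.out.println("https -> http allowed: " + isAllowed(target, current));
    }
}
```

Inside the loop, the idea would be to call isRedirect(conn.getResponseCode()), resolve the Location header, and throw or stop when isAllowed returns false instead of continuing blindly.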
Incontrovertible answered 25/9, 2014 at 18:59 Comment(6)
This is the only solution that works for more than one redirect. Thank you!Photoperiod
This works beautifully for multiple redirects (HTTPS API -> HTTP -> HTTP image)! Perfect simple solution.Spine
@Incontrovertible - thanks for the details, but I still don't buy it. For instance, it's under the control of the client whether any credentials or client certs are sent. If it hurts, don't do it (in this case, do not follow the redirect).Thoer
The only part I don't understand is location = URLDecoder.decode(location... . It decodes a working encoded relative part (with space=+ in my case) into a non-working one. After I removed it, it worked for me.Gd
@Gd I am not sure why you do not need it but I do.Incontrovertible
Niek is right: location = URLDecoder.decode(location, "UTF-8"); must be removed, because it causes an error if your URL contains multi-byte characters. In my case, the file name 「LR-001A-序.mp3」 is in my original download URL. After 「location = conn.getHeaderField("Location");」 it becomes 「LR-001B-%E5%BA%8F.mp3」, which is correct if you use that string as the URL for the next connection. After 「location = URLDecoder.decode(location, "UTF-8");」 it becomes 「LR-001B-?.mp3」, which is wrong, and you end up with a 404.Invertebrate
7

As some of you mentioned above, setFollowRedirects and setInstanceFollowRedirects only work automatically when the redirect stays on the same protocol, i.e. from http to http or from https to https.

setFollowRedirects is a static, class-level setting that sets the default for all instances of the URL connection, whereas setInstanceFollowRedirects applies only to a given instance. This way we can have different behavior for different instances.
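
To make the difference concrete, here is a minimal sketch. The class name RedirectFlags and the helper open are made up for illustration, and example.com is just a placeholder host. No request is sent, because openConnection() alone does not connect.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectFlags {

    // Hypothetical helper: creates the connection object without connecting.
    public static HttpURLConnection open(String spec) throws IOException {
        return (HttpURLConnection) new URL(spec).openConnection();
    }

    public static void main(String[] args) throws IOException {
        // Class-level default, picked up by connections created afterwards.
        HttpURLConnection.setFollowRedirects(true);

        HttpURLConnection a = open("http://example.com/");
        HttpURLConnection b = open("http://example.com/");

        // Per-instance override: only 'a' stops following redirects.
        a.setInstanceFollowRedirects(false);

        System.out.println("a follows redirects: " + a.getInstanceFollowRedirects());
        System.out.println("b follows redirects: " + b.getInstanceFollowRedirects());
    }
}
```

Here a reports false and b reports true. Neither flag changes the cross-protocol behavior discussed above: even with both set to true, an http to https redirect is returned to the caller rather than followed.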

I found a very good example here http://www.mkyong.com/java/java-httpurlconnection-follow-redirect-example/

Borderland answered 1/10, 2013 at 6:12 Comment(0)
6

Another option is to use the Apache HttpComponents Client, whose default client follows redirects for GET requests, including those from HTTP to HTTPS:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>

Sample code:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("https://media-hearth.cursecdn.com/avatars/330/498/212.png");
CloseableHttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream is = entity.getContent();
Ease answered 6/7, 2018 at 16:16 Comment(0)
-5

HTTPUrlConnection is not responsible for handling the response of the object. It is performing as expected: it grabs the content of the URL requested. It is up to you, the user of the functionality, to interpret the response. It cannot read the developer's intentions unless they are specified.

Dichasium answered 10/12, 2009 at 21:41 Comment(2)
Why does it have setInstanceFollowRedirects in this case? ))Geomorphic
My guess is that it was a suggested feature added in later, which makes sense. My comment was more about the fact that the class is designed to go and grab web content and bring it back; people may want to get non-HTTP 200 responses.Dichasium
