Java file download hangs

I have a web interface that is used to download a file. When a request comes in, my GlassFish server streams the file from a web service and then writes the content to the output stream. My code works fine except when the file is very large (more than about 200 MB): the browser then hangs at 0% downloaded and the file is never downloaded.

When I move the flush() call inside the while loop, it works fine for large files as well. I am not sure whether putting flush() in a loop is a problem, and I am not sure how this actually works. My code is as follows:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import javax.faces.context.FacesContext;
    import javax.servlet.ServletOutputStream;
    import javax.servlet.http.HttpServletResponse;

    HttpURLConnection conn = (HttpURLConnection) downloadUri.toURL().openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Content-Type", "application/pdf");
    if (conn.getResponseCode() == 200) {
        ServletOutputStream output;
        try (InputStream inputStream = conn.getInputStream()) {
            HttpServletResponse response = (HttpServletResponse) FacesContext.getCurrentInstance().getExternalContext().getResponse();
            response.setContentType("application/octet-stream");
            response.setHeader("Content-Length", conn.getHeaderField("Content-Length"));
            response.setHeader("Content-Disposition", "attachment; filename=\"" + abbr + ".pdf\"");
            output = response.getOutputStream();
            byte[] buffer = new byte[1024];
            int bytesRead;
            // flush() is only called after this loop; moving it inside the loop makes large downloads work
            while ((bytesRead = inputStream.read(buffer)) != -1) {
                output.write(buffer, 0, bytesRead);
            }
        }
        output.flush();
        output.close();
    }

Any thoughts? Thank you for looking into this.

Babbage answered 5/9, 2014 at 16:1 Comment(10)
The method output.flush() forces the stream to output whatever it has buffered. If you don't flush, the downloaded bytes will be stored in output until it flushes itself. Not sure when that happens. – Renatorenaud
It is Friday evening, but maybe a "Connection: Keep-Alive" header might help. 1024 bytes is a tiny buffer. Check that Content-Length is correct and does not come back as null, or as -1 for files over 2 GB; you could omit it for testing. Compression might also be worth a look. – Sunbow
Have you tested with different browsers? Does the download bar show a percentage with smaller files? – Renatorenaud
I would not specify the content length. It's possible that value is not what you want to return (e.g. depending on chunked encoding, etc.). Instead, let the servlet container handle the length for you. – Hyaluronidase
Maybe the server waits too long when flush() is outside the loop and closes the connection? The loop doesn't realize the connection is closed and keeps trying to read from the inputStream forever. – Renatorenaud
How long does it take to download one of these files? – Hyaluronidase
@Hyaluronidase: Around 2 to 3 minutes. I am on the same network that hosts all the servers, so I am not sure what the performance will be for external users. – Babbage
Did you try putting a proxy in the middle (like Charles Proxy) to see what happens? – Hyaluronidase
No, I did not. Also, I don't have privileges to do that. – Babbage
To do what? Charles Proxy runs as a user application. I assume you can control the proxy settings of the server since you are developing it. – Hyaluronidase
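
Two of the comments above suggest either checking that Content-Length is valid (not null, and not -1 for files over 2 GB) or not setting it at all. A minimal sketch of that check, assuming the raw header string is what gets forwarded: ContentLengthCheck and parseContentLength are hypothetical names. The idea is to forward the header only when it parses as a non-negative long, and otherwise leave it out so the container can pick the framing itself (typically chunked transfer encoding).

```java
/**
 * Hypothetical helper: turns a raw Content-Length header value into a long,
 * or -1 when the length is unknown and the header should simply be omitted.
 */
final class ContentLengthCheck {

    static long parseContentLength(String headerValue) {
        if (headerValue == null) {
            return -1L; // header missing: let the container choose (e.g. chunked encoding)
        }
        try {
            // parse as long: an int overflows for files larger than 2 GB
            long length = Long.parseLong(headerValue.trim());
            return length >= 0 ? length : -1L;
        } catch (NumberFormatException e) {
            return -1L; // malformed value: safer not to forward it
        }
    }

    public static void main(String[] args) {
        System.out.println(parseContentLength("3000000000")); // prints 3000000000
        System.out.println(parseContentLength(null));         // prints -1
    }
}
```

In the servlet, the same effect is available without string parsing: URLConnection.getContentLengthLong() (Java 7+) returns -1 when the length is unknown, and HttpServletResponse.setContentLengthLong(long) needs Servlet 3.1 (GlassFish 4 or later); on older containers, setHeader("Content-Length", Long.toString(len)) does the same job when len >= 0.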

The flush() method instructs the stream to actually send the output down the stream pipe.

Various stream implementations can, for performance reasons, cache the output instead of writing it to the underlying stream right away.

For example, this saves I/O operations on disk, which are expensive from a performance point of view.

There is no problem in flushing a stream apart from performance, and in this case flushing is exactly what you want: the stream seems to be stuck until you flush it, so you want it to actually send data to the client.
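
The caching described above can be observed with plain JDK streams. This is a sketch under an assumption: a BufferedOutputStream over a ByteArrayOutputStream stands in for the container's buffered ServletOutputStream, whose real buffer size and behavior in GlassFish are not modeled here. FlushDemo is a hypothetical name.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class FlushDemo {

    /**
     * Writes payloadSize bytes through a BufferedOutputStream and returns
     * {bytes visible in the underlying sink before flush(), bytes visible after flush()}.
     */
    static int[] bytesVisibleBeforeAndAfterFlush(int payloadSize, int bufferSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the client connection
        try (BufferedOutputStream buffered = new BufferedOutputStream(sink, bufferSize)) {
            buffered.write(new byte[payloadSize]);
            int before = sink.size(); // a write smaller than the buffer is only cached so far
            buffered.flush();         // pushes any cached bytes down to the sink
            int after = sink.size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] small = bytesVisibleBeforeAndAfterFlush(100, 8192);
        System.out.println("100 bytes: before=" + small[0] + ", after=" + small[1]);
        int[] large = bytesVisibleBeforeAndAfterFlush(10_000, 8192);
        System.out.println("10000 bytes: before=" + large[0] + ", after=" + large[1]);
    }
}
```

A 100-byte write stays in the buffer until flush(), while a single write at least as large as the buffer (10,000 bytes here) bypasses the buffer and goes straight to the sink.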

Maybe you can experiment with the size of your buffer, using something bigger than 1024 bytes, to see what fits best.

EDIT:

Whether you flush inside the loop or not is not that important in itself.

You can call flush() whenever you want; as said, it pushes the data down to the underlying OS stream, and whether that is a performance hit depends on the situation.

For example, you could consider the 200 MB of RAM in which the stream is buffering the file more important, also performance-wise, than the extra I/O operations.

Or, much more simply, you could value the user experience of seeing the file actually download more than any performance hit you might experience, if you can even measure it.

As said, the larger your buffer, the less the loop matters. As an extreme example, if your buffer is 100 MB, an 80 MB file will get only one flush, which it would get anyway at the end of the request.

A 1 KB buffer is probably too small; 4 KB is better and 16 KB is fine. It's a tradeoff between I/O calls and RAM consumption.
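
To put numbers on that tradeoff: with one write per loop iteration, a 200 MiB file (209,715,200 bytes) takes 204,800 iterations with a 1 KiB buffer and 12,800 with a 16 KiB buffer, and flushing inside the loop means that many flush() calls. These are lower bounds, since a network InputStream may return fewer bytes per read than the buffer holds. The sketch below copies with a configurable buffer size, flushes after every chunk as this answer suggests, and counts the flushes; ChunkedCopy and copyFlushingEachChunk are hypothetical names, and in-memory streams stand in for the web service and the servlet output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkedCopy {

    /**
     * Copies in to out using a buffer of bufferSize bytes and flushes after every
     * chunk, so the client receives data as it arrives instead of waiting for an
     * internal buffer to fill. Returns the number of flush() calls made.
     */
    static int copyFlushingEachChunk(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        int flushes = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
            out.flush(); // one flush per chunk: fewer, larger chunks mean fewer flushes
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[40_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int flushes = copyFlushingEachChunk(new ByteArrayInputStream(data), sink, 16 * 1024);
        System.out.println(sink.size() + " bytes copied with " + flushes + " flushes");
    }
}
```

In the question's code this would be called as copyFlushingEachChunk(inputStream, output, 16 * 1024) inside the try block, in place of the original while loop.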

The stream should do its job properly by itself. However, if you're seeing that a 200 MB file gets fully cached unless you call flush(), then the stream is probably optimizing for performance at the cost of a bad user experience, so you do need the flush() in the loop.

Kynan answered 5/9, 2014 at 16:8 Comment(10)
I want to know where exactly to put the flush() call. Is putting it in the loop fine? My file sizes can go up to 1 GB. I will try playing with the buffer size, but I was wondering if there is a standard implementation that I can use. – Babbage
Putting it in a loop is perfectly fine. – Kynan
@Babbage You could use a buffer of up to 4096 (4*1024) bytes (4 KiB) pretty safely. – Svoboda
Because you may be forcing the network layer to send data more frequently than it otherwise would. – Hyaluronidase
With so many different answers in this thread, I am not sure which one is correct. – Babbage
Both answers are right; I won't cry all night if you choose Mark's. – Kynan
Your answer was better in my opinion. I added some more info to mine a few minutes ago, as I had posted from my phone. The default buffer size seems to be 8 KB, from looking at the source code. – Jeffereyjefferies
@Hyaluronidase is really the king of this thread; they made really good points on both answers. Buffers and flushing exist for a reason, and it's good to know as much as you can about them if you plan to use them. – Jeffereyjefferies
@MarkLalor If I understand correctly, you agree that putting flush() in a loop is not a good idea? – Babbage
If you can avoid it, I'd avoid it. Are you saving the file to the same computer the script is running on? That should be fine. If you're writing the data to a file over a network each time, that might not be the best. I recently did exactly this for a hobby project, with no problems over hundreds of tests (my method public void writeLine() is called in a loop). – Jeffereyjefferies

As David said, flushing forces the download to go through, instead of holding all the unwritten data in memory.

Flushing in a loop is not a problem per se, but you probably don't want to flush every single time the loop runs. Just make sure you're not writing too much data (like 200 MB) before flushing.
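
One way to read "not every iteration, but not too much either" is to flush only once a threshold of bytes has been written since the last flush. This is a sketch, not the answer's own code: ThresholdFlushCopy and its flushThreshold parameter are hypothetical, the answer names no specific value, and the 256 KiB used below is only an illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ThresholdFlushCopy {

    /**
     * Copies in to out, calling flush() only once at least flushThreshold bytes
     * have been written since the previous flush, plus once at the end for any
     * remainder. Returns the number of flush() calls made.
     */
    static int copy(InputStream in, OutputStream out, int bufferSize, long flushThreshold)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long sinceLastFlush = 0;
        int flushes = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
            sinceLastFlush += bytesRead;
            if (sinceLastFlush >= flushThreshold) {
                out.flush(); // enough unsent data has accumulated: push it to the client
                flushes++;
                sinceLastFlush = 0;
            }
        }
        if (sinceLastFlush > 0) {
            out.flush(); // send whatever is left after the last full threshold
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int flushes = copy(new ByteArrayInputStream(new byte[1_000_000]), sink, 8192, 256 * 1024);
        System.out.println(sink.size() + " bytes, " + flushes + " flushes");
    }
}
```

With 1,000,000 bytes, an 8 KiB buffer, and a 256 KiB threshold, this flushes three times inside the loop (after every 32 chunks) and once more for the remaining 213,568 bytes.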

Edit: jtahlborn made a good point: BufferedWriter flushes automatically. The other answer here is better, but for those curious, this is what goes on when you write!

That's the source of public void write(char[], int, int) in BufferedWriter. You can see on line 182 that it automatically flushes its buffer to the underlying writer once the buffer size has been reached (default 8,192 chars).
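
The automatic behavior described above can be checked by putting a StringWriter behind a BufferedWriter. Note that when its buffer fills, BufferedWriter writes the buffered chars to the underlying writer but does not call flush() on that writer. BufferedWriterDemo is a hypothetical name, and the 8,192-char size is passed explicitly so the result does not depend on the JDK's default.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;

final class BufferedWriterDemo {

    static final int BUFFER_CHARS = 8192; // matches OpenJDK's default BufferedWriter size

    /**
     * Writes the given number of 100-char pieces through a BufferedWriter and
     * returns how many chars have reached the underlying StringWriter, without
     * ever calling flush() or close() ourselves (close() would flush).
     */
    static int charsPassedThrough(int pieces) throws IOException {
        StringWriter target = new StringWriter();
        BufferedWriter writer = new BufferedWriter(target, BUFFER_CHARS);
        char[] piece = new char[100];
        Arrays.fill(piece, 'x');
        for (int i = 0; i < pieces; i++) {
            writer.write(piece, 0, piece.length);
        }
        return target.getBuffer().length();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(charsPassedThrough(81)); // prints 0: all 8,100 chars are still buffered
        System.out.println(charsPassedThrough(82)); // prints 8192: the full buffer was passed on
    }
}
```

After 81 pieces (8,100 chars) nothing has reached the StringWriter; the 82nd piece fills the buffer, so 8,192 chars are passed on and 8 remain buffered.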

Jeffereyjefferies answered 5/9, 2014 at 16:10 Comment(5)
So the question is: when should I call flush()? Is calling it in the loop fine? – Babbage
I am not sure about the complete file being stored in the OutputStream. The reason we use a buffer is that we don't want the entire file stored in memory, which affects performance. Also, I think the output stream flushes automatically after it hits a limit. – Babbage
The stream handles flushing automatically; you shouldn't need to call it in the loop at all. It shouldn't matter how much data is written before flushing. – Hyaluronidase
@Hyaluronidase I'd flush after I finish writing the file, to send those final bytes down. – Svoboda
@ColeJohnson It doesn't really matter either way; close() will flush() as well. – Hyaluronidase
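
The point that close() also flushes can be checked the same way. FilterOutputStream.close(), which BufferedOutputStream inherits, is documented to call flush() before closing the wrapped stream, so a try-with-resources block needs no explicit final flush(). CloseFlushesDemo is a hypothetical name, and in-memory streams stand in for the servlet output stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class CloseFlushesDemo {

    /** Writes payloadSize bytes, never calls flush(), and reports what reached the sink. */
    static int bytesAfterClose(int payloadSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 8192)) {
            out.write(new byte[payloadSize]);
            // no explicit flush(): the implicit close() at the end of this block flushes
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesAfterClose(100)); // prints 100
    }
}
```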
