Seeking out the optimum size for BufferedInputStream in Java

I was profiling code that loads a binary file. Loading took about 15 seconds.

Most of that time was spent in the methods that read the binary data.

I had the following code to create my DataInputStream:

is = new DataInputStream(
     new GZIPInputStream(
     new FileInputStream("file.bin")));

And I changed it to this:

is = new DataInputStream(
     new BufferedInputStream(
     new GZIPInputStream(
     new FileInputStream("file.bin"))));

After this small change, the load time dropped from 15 seconds to 4.

But then I found that BufferedInputStream has two constructors. The other constructor lets you explicitly define the buffer size.
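For reference, here is the same chain with the buffer size passed explicitly. This is a minimal sketch: the class and method names (`StreamChain`, `open`) are hypothetical, and the file is assumed to be gzip-compressed as in the question.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

final class StreamChain {
    // new BufferedInputStream(in)       -> default buffer size (8192 bytes)
    // new BufferedInputStream(in, size) -> explicit size; size <= 0 throws IllegalArgumentException
    static DataInputStream open(String path, int bufferSize) throws IOException {
        FileInputStream file = new FileInputStream(path);
        try {
            return new DataInputStream(
                    new BufferedInputStream(new GZIPInputStream(file), bufferSize));
        } catch (IOException | RuntimeException e) {
            file.close(); // don't leak the file handle if the gzip header or the size is bad
            throw e;
        }
    }
}
```

Calling `StreamChain.open("file.bin", 64 * 1024)` would, for example, use a 64 KB buffer instead of the default.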

I've got two questions:

  1. What buffer size does BufferedInputStream choose by default, and is it ideal? If not, how can I find the optimum size? Should I write a quick bit of code that does a binary search?
  2. Is this the best way to use BufferedInputStream? I originally had it inside the GZIPInputStream, but the benefit was negligible. I assume that what the code does now is this: every time the buffer needs to be refilled, the GZIPInputStream decodes x bytes (where x is the buffer size). Would it be worth omitting the GZIPInputStream entirely? It's not strictly needed, but it reduces my file size dramatically.
Nail answered 14/12, 2010 at 10:22

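Both the GZIPInputStream and the BufferedInputStream use an internal buffer, which is why putting a BufferedInputStream inside the GZIPInputStream doesn't provide any benefit. The problem is that the GZIPInputStream doesn't buffer the output it generates: without an outer buffer, every small read that DataInputStream makes (readInt, readShort and so on) goes through the decompressor. Wrapping it in a BufferedInputStream lets most of those reads be served from memory, which is why your current version is much faster.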
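One way to see this effect is to count how many read calls actually reach the GZIPInputStream. The sketch below is an illustration under assumptions: the data is generated in memory rather than read from `file.bin`, and `ReadCountDemo` and `CountingStream` are hypothetical names. Without the outer buffer, every `readInt()` makes at least one call into the decompressor; with it, the decompressor is only called when the buffer needs refilling.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ReadCountDemo {
    // Counts every read call that passes through it to the wrapped stream.
    static final class CountingStream extends FilterInputStream {
        long calls;
        CountingStream(InputStream in) { super(in); }
        @Override public int read() throws IOException { calls++; return super.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Gzip-compresses n consecutive ints written with DataOutputStream.
    static byte[] gzipInts(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            for (int i = 0; i < n; i++) out.writeInt(i);
        }
        return bytes.toByteArray();
    }

    // Reads n ints back and returns how many calls reached the GZIPInputStream.
    static long countReads(byte[] gz, int n, boolean buffered) throws IOException {
        CountingStream counter = new CountingStream(new GZIPInputStream(new ByteArrayInputStream(gz)));
        InputStream top = buffered ? new BufferedInputStream(counter) : counter;
        try (DataInputStream in = new DataInputStream(top)) {
            for (int i = 0; i < n; i++) {
                if (in.readInt() != i) throw new IllegalStateException("corrupt data at " + i);
            }
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzipInts(100_000);
        System.out.println("unbuffered: " + countReads(gz, 100_000, false));
        System.out.println("buffered:   " + countReads(gz, 100_000, true));
    }
}
```

The exact unbuffered count depends on how the JDK version implements `readInt()`, but it is always at least one call per int.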

The default buffer size for BufferedInputStream is 8 KB (8192 bytes), so you can try increasing or decreasing it to see if that helps. I doubt the exact number matters much, so simply doubling or halving it should be enough.
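Trying sizes by doubling can be done with a small timing loop like the one below. It is only a rough sketch: `BufferSizeSweep` is a hypothetical name, reading one byte at a time stands in for the many small `DataInputStream` reads, and single wall-clock timings are noisy (JIT warm-up, OS file cache), so repeat the runs before trusting a difference.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

final class BufferSizeSweep {
    // Reads the whole decompressed stream one byte at a time and returns the byte count.
    static long drain(Path file, int bufferSize) throws IOException {
        long count = 0;
        try (InputStream raw = Files.newInputStream(file);
             DataInputStream in = new DataInputStream(
                     new BufferedInputStream(new GZIPInputStream(raw), bufferSize))) {
            while (in.read() != -1) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "file.bin");
        drain(file, 8192); // warm-up run so the first measurement isn't skewed by JIT/cache effects
        // Try sizes from 1 KB to 1 MB, doubling each time.
        for (int size = 1024; size <= 1 << 20; size *= 2) {
            long start = System.nanoTime();
            long bytes = drain(file, size);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%8d-byte buffer: %d bytes in %d ms%n", size, bytes, millis);
        }
    }
}
```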

If the file is small, you can also try buffering it completely; in theory this should give you the best performance. You could also try increasing the buffer size of the GZIPInputStream (512 bytes by default), since this might speed up reading from disk.
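Both suggestions can be sketched roughly as follows. `loadFully` decompresses the whole file into memory up front so that parsing does no further I/O or inflation; `openWithGzipBuffer` keeps streaming but uses the `GZIPInputStream(InputStream, int)` constructor. The class name `GzipLoading` is hypothetical, 64 KB is an arbitrary starting value rather than a tuned one, `readAllBytes()` needs Java 9+, and `loadFully` only makes sense if the decompressed data fits comfortably in the heap.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

final class GzipLoading {
    static final int GZIP_BUFFER = 64 * 1024; // assumed starting point, not a measured optimum

    // Option 1: decompress everything into memory, then parse from a byte array.
    static DataInputStream loadFully(Path file) throws IOException {
        byte[] data;
        try (InputStream raw = Files.newInputStream(file);
             InputStream gz = new GZIPInputStream(raw, GZIP_BUFFER)) {
            data = gz.readAllBytes(); // Java 9+
        }
        return new DataInputStream(new ByteArrayInputStream(data));
    }

    // Option 2: keep streaming, but give the GZIPInputStream a larger input buffer
    // than its 512-byte default, with the outer BufferedInputStream as before.
    static DataInputStream openWithGzipBuffer(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        try {
            return new DataInputStream(
                    new BufferedInputStream(new GZIPInputStream(raw, GZIP_BUFFER)));
        } catch (IOException | RuntimeException e) {
            raw.close(); // don't leak the file handle if the gzip header is bad
            throw e;
        }
    }
}
```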

Unfaithful answered 14/12, 2010 at 10:36
Comment by Strung: I suggest you try a 64 KB buffer for the GZIPInputStream when reading from disk. I use 1 MB, which is likely more than needed. ;)
  1. Don't bother coding a binary search. Just try a few values by hand and compare the timings (you can do a manual binary search if you like). You'll most likely find that a wide range of buffer sizes gives close-to-optimal performance, so pick the smallest one that does the job.

  2. What you have is the correct order:

    is = new DataInputStream(
         new BufferedInputStream(
         new GZIPInputStream(
         new FileInputStream("file.bin"))));
    

    There is little point in putting a BufferedInputStream inside the GZIPInputStream, since the latter already buffers its input (but not its output).

    Removing the GZIPInputStream might be a win, but it will most likely hurt performance if the data has to be read from disk rather than from the filesystem cache. Reading from disk is very slow and gzip decompression is very fast, so it is generally cheaper to read less data from disk and decompress it in memory than to read more data from disk.
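How much less data has to come from disk depends entirely on the data. A quick way to check is to compare the stored size with and without compression; the sketch below uses made-up sample data (consecutive ints), so its numbers say nothing about `file.bin`, and `CompressionCheck` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

final class CompressionCheck {
    // Serializes n ints with DataOutputStream, optionally through a GZIPOutputStream,
    // and returns the number of bytes that would be stored on disk.
    static int storedSize(int n, boolean gzip) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream target = gzip ? new GZIPOutputStream(sink) : sink;
        try (DataOutputStream out = new DataOutputStream(target)) {
            for (int i = 0; i < n; i++) out.writeInt(i);
        } // closing finishes the gzip stream, so sink holds the complete output
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        int n = 1_000_000;
        System.out.println("raw:  " + storedSize(n, false) + " bytes");
        System.out.println("gzip: " + storedSize(n, true) + " bytes");
    }
}
```

Substituting your real serialization code for the loop gives the ratio that actually matters for your file.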

Overweary answered 14/12, 2010 at 10:30
