At what point does wrapping a FileOutputStream with a BufferedOutputStream make sense, in terms of performance?

2

53

I have a module that is responsible for reading, processing, and writing bytes to disk. The bytes come in over UDP and, after the individual datagrams are assembled, the final byte array that gets processed and written to disk is typically between 200 bytes and 500,000 bytes. Occasionally, there will be byte arrays that, after assembly, are over 500,000 bytes, but these are relatively rare.

I'm currently using the FileOutputStream's write(byte[]) method. I'm also experimenting with wrapping the FileOutputStream in a BufferedOutputStream, including using the constructor that accepts a buffer size as a parameter.

It appears that using the BufferedOutputStream tends toward slightly better performance, but I've only just begun to experiment with different buffer sizes, and I only have a limited set of sample data to work with (two data sets from sample runs that I can pipe through my application). Given what I know about the data I'm writing, is there a general rule of thumb for choosing a buffer size that reduces the number of disk writes and maximizes write performance?

Genova answered 3/1, 2012 at 13:25 Comment(0)
36

BufferedOutputStream helps when the writes are smaller than the buffer size, e.g. 8 KB. For larger writes it doesn't help, nor does it make things much worse. If ALL your writes are larger than the buffer size, or you always flush() after every write, I would not use a buffer. However, if a good portion of your writes are smaller than the buffer size and you don't flush() every time, it's worth having.

You may find that increasing the buffer size to 32 KB or larger gives you a marginal improvement, or makes things worse. YMMV.
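To make the coalescing effect concrete, here is a minimal sketch (not part of the original answer) that counts how many write calls actually reach the underlying stream. `BufferingDemo` and `CountingOutputStream` are hypothetical names, and a counting sink stands in for the FileOutputStream so the result does not depend on the disk. It assumes ordinary platform threads.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferingDemo {

    /** Hypothetical sink: discards the data but counts the calls that reach it. */
    static final class CountingOutputStream extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /**
     * Writes `chunks` arrays of `chunkSize` bytes through a BufferedOutputStream
     * of the given buffer size and returns how many write calls reached the sink.
     */
    public static int underlyingCalls(int chunkSize, int chunks, int bufferSize,
                                      boolean flushEachWrite) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        byte[] chunk = new byte[chunkSize];
        try (BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < chunks; i++) {
                out.write(chunk);
                if (flushEachWrite) {
                    out.flush(); // forces the buffered bytes down immediately
                }
            }
        } // close() flushes whatever is still buffered
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        // Many small writes: the buffer merges them into a few larger ones.
        System.out.println("small writes, no flush:  " + underlyingCalls(300, 100, 8192, false));
        // Flushing after every write defeats the buffer: one call per write.
        System.out.println("small writes, flushed:   " + underlyingCalls(300, 100, 8192, true));
        // Writes at least as large as the buffer are passed straight through.
        System.out.println("large writes, no flush:  " + underlyingCalls(20_000, 10, 8192, false));
    }
}
```

With a real FileOutputStream each of those underlying calls is a native write, so the first case is where the wrapper can pay off; in the other two it only adds a layer.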


You might find the code for BufferedOutputStream.write useful:

/**
 * Writes <code>len</code> bytes from the specified byte array
 * starting at offset <code>off</code> to this buffered output stream.
 *
 * <p> Ordinarily this method stores bytes from the given array into this
 * stream's buffer, flushing the buffer to the underlying output stream as
 * needed.  If the requested length is at least as large as this stream's
 * buffer, however, then this method will flush the buffer and write the
 * bytes directly to the underlying output stream.  Thus redundant
 * <code>BufferedOutputStream</code>s will not copy data unnecessarily.
 *
 * @param      b     the data.
 * @param      off   the start offset in the data.
 * @param      len   the number of bytes to write.
 * @exception  IOException  if an I/O error occurs.
 */
public synchronized void write(byte b[], int off, int len) throws IOException {
    if (len >= buf.length) {
        /* If the request length exceeds the size of the output buffer,
           flush the output buffer and then write the data directly.
           In this way buffered streams will cascade harmlessly. */
        flushBuffer();
        out.write(b, off, len);
        return;
    }
    if (len > buf.length - count) {
        flushBuffer();
    }
    System.arraycopy(b, off, buf, count, len);
    count += len;
}
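The first branch of that method can be observed from the outside. The sketch below is an illustration, not JDK code: `BypassDemo` and `RecordingOutputStream` are hypothetical names. It records what the underlying stream receives, checks whether a write reaches it as the caller's own array (the uncopied, direct path) or as the stream's internal buffer, and shows that bytes already buffered are flushed before a large write goes through.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class BypassDemo {

    /** Hypothetical sink that records the length and source array of every call. */
    static final class RecordingOutputStream extends OutputStream {
        final List<Integer> lengths = new ArrayList<>();
        final List<byte[]> arrays = new ArrayList<>();

        @Override
        public void write(int b) {
            lengths.add(1);
            arrays.add(null);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            lengths.add(len);
            arrays.add(b);
        }
    }

    /** True if a single write of `len` bytes reaches the sink as the caller's own array. */
    public static boolean passedThroughUncopied(int len, int bufferSize) throws IOException {
        RecordingOutputStream sink = new RecordingOutputStream();
        byte[] data = new byte[len];
        try (BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            out.write(data, 0, len);
        }
        return sink.arrays.size() == 1 && sink.arrays.get(0) == data;
    }

    /** Lengths seen by the sink when a small write is followed by a large one. */
    public static List<Integer> smallThenLarge(int small, int large, int bufferSize) throws IOException {
        RecordingOutputStream sink = new RecordingOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            out.write(new byte[small]); // stays in the buffer for now
            out.write(new byte[large]); // pending bytes are flushed first, then this goes direct
        }
        return sink.lengths;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("500,000 B write uncopied? " + passedThroughUncopied(500_000, 8192));
        System.out.println("300 B write uncopied?     " + passedThroughUncopied(300, 8192));
        System.out.println("300 B then 500,000 B:     " + smallThenLarge(300, 500_000, 8192));
    }
}
```

So a write at least as large as the buffer is not chopped into buffer-sized pieces; the wrapper hands it to the underlying stream in one call.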
Brian answered 3/1, 2012 at 13:30 Comment(4)
Something I haven't found yet - what is the default buffer size of the BufferedOutputStream in Java 6? You mention 8 KB - is that the default in Java? The Javadocs for 1.4.2 say the buffer is 512 bytes, meaning most of what I write tends to fall between 200 and 400 bytes per array. However, this information is removed from the Java 6 documentation. - Genova
@Thomas - looking at the source code, the default size is 8192. I'd assume they removed the default size specification to be able to change it when a new "most sensible default" appears. If having a specific buffer size is important, you'll probably want to specify it explicitly. - Disconnect
@Disconnect Thanks. I always forget that I can look at the Java source code. - Genova
My other question is whether a write larger than the buffer size performs worse than a non-buffered write. I can't think of a reason why it would be significantly worse, although the further it exceeds the buffer size, the more times the buffer gets filled, written, and filled again. So I might need to experiment with that as well. - Genova
0

I have lately been exploring I/O performance. From what I have observed, writing directly to a FileOutputStream has led to better results, which I attribute to FileOutputStream's native call for write(byte[], int, int). I have also observed that when BufferedOutputStream's latency begins to converge toward that of a direct FileOutputStream, it fluctuates a lot more, i.e. it can abruptly double (I haven't yet been able to find out why).

P.S. I am using Java 8 and cannot say right now whether my observations hold for earlier Java versions.

Here's the code I tested, where my input was a ~10 KB file:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

// Assumption: the logger is Log4j 2; the original post did not show its imports.
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class WriteCombinationsOutputStreamComparison {
    private static final Logger LOG = LogManager.getLogger(WriteCombinationsOutputStreamComparison.class);

    public static void main(String[] args) throws IOException {

        // Read the whole input file into memory so that only the writes are timed.
        final BufferedInputStream input = new BufferedInputStream(new FileInputStream("src/main/resources/inputStream1.txt"), 4*1024);
        final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        int data = input.read();
        while (data != -1) {
            byteArrayOutputStream.write(data); // everything comes in memory
            data = input.read();
        }
        final byte[] bytesRead = byteArrayOutputStream.toByteArray();
        input.close();

        /*
         * 1. WRITE USING A STREAM DIRECTLY with entire byte array --> FileOutputStream directly uses a native call and writes
         */
        try (OutputStream outputStream = new FileOutputStream("src/main/resources/outputStream1.txt")) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, Charset.forName("UTF-8")));
            }
        }

        /*
         * 2. WRITE USING A BUFFERED STREAM, write entire array
         */

        // changed the buffer size to different combinations --> write latency fluctuates a lot for same buffer size over multiple runs
        try (BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream("src/main/resources/outputStream1.txt"), 16*1024)) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for buffered file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, Charset.forName("UTF-8")));
            }
        }
    }
}

OUTPUT:

2017-01-30 23:38:59.064 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for file write, writing entire array [nanos=100990], [bytesWritten=11059]

2017-01-30 23:38:59.086 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for buffered file write, writing entire array [nanos=142454], [bytesWritten=11059]
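A single timed run like the two above also includes one-off costs such as class loading and JIT warm-up, which may account for some of the fluctuation. As a rough sketch only (not the code from this answer, and no substitute for a proper benchmark harness), the measurement can be repeated and summarized by its median. `RepeatedWriteTiming`, its method names, and the warm-up and run counts are assumptions chosen for illustration; it writes to a temporary file rather than the paths used above.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class RepeatedWriteTiming {

    /**
     * Times one write-and-flush of `data` to `target`.
     * A bufferSize of 0 or less means "write to the FileOutputStream directly".
     */
    static long timeOnce(Path target, byte[] data, int bufferSize) throws IOException {
        try (OutputStream raw = new FileOutputStream(target.toFile());
             OutputStream out = bufferSize > 0 ? new BufferedOutputStream(raw, bufferSize) : raw) {
            long begin = System.nanoTime();
            out.write(data);
            out.flush();
            return System.nanoTime() - begin;
        }
    }

    /** Median of `runs` timings, taken after `warmups` discarded runs. */
    public static long medianNanos(byte[] data, int bufferSize, int warmups, int runs) throws IOException {
        Path target = Files.createTempFile("write-timing", ".bin");
        try {
            for (int i = 0; i < warmups; i++) {
                timeOnce(target, data, bufferSize); // result intentionally ignored
            }
            long[] samples = new long[runs];
            for (int i = 0; i < runs; i++) {
                samples[i] = timeOnce(target, data, bufferSize);
            }
            Arrays.sort(samples);
            return samples[runs / 2];
        } finally {
            Files.deleteIfExists(target);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[11_059]; // same size as the run reported above
        System.out.println("direct   median nanos: " + medianNanos(data, 0, 200, 1000));
        System.out.println("buffered median nanos: " + medianNanos(data, 16 * 1024, 200, 1000));
    }
}
```

Because the array here is smaller than the 16 KB buffer, the buffered variant copies it into the buffer and then writes it on flush(), so an extra copy per write is expected in that configuration.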
Incommensurable answered 30/1, 2017 at 15:53 Comment(3)
I ran similar tests and I can confirm that using a BufferedOutputStream makes writing files not faster but slower, most likely because the data being written is already cached at multiple levels on its way from the JVM through the OS to the physical medium. - Elvinelvina
@GOTO Thanks for confirming. Are there any resources you might be aware of that can help me dig deeper into how I/O and internal caches work? - Incommensurable
Not really. If it helps googling, the file caching components are called Cache Manager in Windows and Page Cache in Linux. Hard disks and other storage devices also come with different sorts of I/O caches (though the basics are probably the same). - Elvinelvina
