Should I buffer the InputStream or the InputStreamReader?

What are the differences (if any) between the following two buffering approaches?

Reader r1 = new BufferedReader(new InputStreamReader(in, "UTF-8"), bufferSize);
Reader r2 = new InputStreamReader(new BufferedInputStream(in, bufferSize), "UTF-8");
Arrowhead answered 11/8, 2010 at 14:4 Comment(0)

r1 is more efficient. The InputStreamReader itself doesn't maintain a large buffer, whereas the BufferedReader can be configured with an arbitrarily large one. The InputStreamReader in r2 would therefore act as a bottleneck.

In a nutshell: you should read the data through a funnel, not through a bottle.


Update: here's a little benchmark program, just copy'n'paste'n'run it. You don't need to prepare files.

package com.stackoverflow.q3459127;

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class Test {

    public static void main(String... args) throws Exception {

        // Init.
        int bufferSize = 10240; // 10KB.
        int fileSize = 100 * 1024 * 1024; // 100MB.
        File file = new File("/temp.txt");

        // Create file (it's also a good JVM warmup).
        System.out.print("Creating file .. ");
        BufferedWriter writer = null;
        try {
            writer = new BufferedWriter(new FileWriter(file));
            for (int i = 0; i < fileSize; i++) {
                writer.write("0");
            }
            System.out.printf("finished, file size: %d MB.%n", file.length() / 1024 / 1024);
        } finally {
            if (writer != null) try { writer.close(); } catch (IOException ignore) {}
        }

        // Read through funnel.
        System.out.print("Reading through funnel .. ");
        Reader r1 = null;        
        try {
            r1 = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"), bufferSize);
            long st = System.nanoTime();
            for (int data; (data = r1.read()) > -1;);
            long et = System.nanoTime();
            System.out.printf("finished in %d ms.%n", (et - st) / 1000000);
        } finally {
            if (r1 != null) try { r1.close(); } catch (IOException ignore) {}
        }

        // Read through bottle.
        System.out.print("Reading through bottle .. ");
        Reader r2 = null;        
        try {
            r2 = new InputStreamReader(new BufferedInputStream(new FileInputStream(file), bufferSize), "UTF-8");
            long st = System.nanoTime();
            for (int data; (data = r2.read()) > -1;);
            long et = System.nanoTime();
            System.out.printf("finished in %d ms.%n", (et - st) / 1000000);
        } finally {
            if (r2 != null) try { r2.close(); } catch (IOException ignore) {}
        }

        // Cleanup.
        if (!file.delete()) System.err.printf("Oops, failed to delete %s. Cleanup yourself.%n", file.getAbsolutePath());
    }

}

Results on my Latitude E5500 with a Seagate Momentus 7200.3 hard disk:

Creating file .. finished, file size: 99 MB.
Reading through funnel .. finished in 1593 ms.
Reading through bottle .. finished in 7760 ms.
Poulterer answered 11/8, 2010 at 14:9 Comment(4)
If the underlying InputStream were a FileInputStream, would the two Readers perform differing amounts of disk reads throughout an entire read process? – Arrowhead
I checked it using perfmon; I don't see noticeable differences. I'll update the answer soon to include a benchmark code snippet. – Poulterer
Big like for the package name :) – Asben
Why not buffer the disk reads as well? Without doing this, doesn't the inputStream have to make read calls to the source for each byte? I don't see how BDKosher's concern about disk reads didn't prove true; it seems like there should be fewer disk reads with a buffered InputStream: BufferedReader reader = new BufferedReader(new InputStreamReader(new BufferedInputStream(inputStream), "UTF-8")); – Rm

r1 is also more convenient when you read a line-based stream, since BufferedReader supports the readLine method. You don't have to read the content into a char array buffer or read chars one by one. However, you have to cast r1 to BufferedReader or declare the variable with that type explicitly.

I often use this code snippet:

BufferedReader br = ...
String line;
while ((line = br.readLine()) != null) {
    // process line
}
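
For completeness, a minimal self-contained sketch of the same pattern using try-with-resources; the stream source, the UTF-8 charset, and the processLines helper are assumptions for illustration, not part of the snippet above:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class LineReaderSketch {

    // Hypothetical helper: reads every line from the given stream and prints it.
    static void processLines(InputStream in) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line); // process line
            }
        }
    }
}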
Owing answered 11/8, 2010 at 14:47 Comment(0)

In response to Ross Studtman's question in the comment above (but also relevant to the OP):

BufferedReader reader = new BufferedReader(new InputStreamReader(new BufferedInputStream(inputStream), "UTF-8"));

The BufferedInputStream is superfluous (and likely harms performance due to extraneous copying). This is because the BufferedReader requests characters from the InputStreamReader in large chunks by calling InputStreamReader.read(char[], int, int), which in turn (through StreamDecoder) calls InputStream.read(byte[], int, int) to read a large block of bytes from the underlying InputStream.

You can convince yourself that this is so by running the following code:

new BufferedReader(new InputStreamReader(new ByteArrayInputStream("Hello world!".getBytes("UTF-8")) {

    @Override
    public synchronized int read() {
        System.err.println("ByteArrayInputStream.read()");
        return super.read();
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) {
        System.err.println("ByteArrayInputStream.read(..., " + off + ", " + len + ')');
        return super.read(b, off, len);
    }

}, "UTF-8") {

    @Override
    public int read() throws IOException {
        System.err.println("InputStreamReader.read()");
        return super.read();
    }

    @Override
    public int read(char[] cbuf, int offset, int length) throws IOException {
        System.err.println("InputStreamReader.read(..., " + offset + ", " + length + ')');
        return super.read(cbuf, offset, length);
    }

}).read(); // read one character from the BufferedReader

You will see the following output:

InputStreamReader.read(..., 0, 8192)
ByteArrayInputStream.read(..., 0, 8192)

This demonstrates that the BufferedReader requests a large chunk of characters from the InputStreamReader, which in turn requests a large chunk of bytes from the underlying InputStream.

Yogi answered 7/12, 2014 at 20:10 Comment(2)
And if you use the BufferedInputStream it requests data from the InputStream in large chunks, and supplies the smaller requests of the Readers out of its buffer. It is not 'superfluous'. – Thirtyeight
@EJP: The BufferedInputStream in my example snippet (first code block in my answer) is superfluous because the BufferedReader requests large blocks from the InputStreamReader, which in turn requests large blocks from the underlying InputStream. The insertion of a BufferedInputStream between the InputStreamReader and the underlying InputStream merely adds overhead without buying any performance gain. – Yogi

FWIW, if you're opening a file in Java 8, you can use Files.newBufferedReader(Path). I don't know how its performance compares to the other solutions described here, but at least it pushes the decision of which construct to buffer into the JDK.
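
A minimal sketch of that approach; the file name and explicit charset below are placeholders, not taken from the answer:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NewBufferedReaderSketch {

    public static void main(String[] args) throws IOException {
        // "temp.txt" is only a placeholder file name for illustration.
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("temp.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}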

Impenetrability answered 2/10, 2015 at 19:16 Comment(0)
