How do an InputStream, InputStreamReader and BufferedReader work together in Java?

I am studying Android development (I'm a beginner in programming in general) and learning about HTTP networking and saw this code in the lesson:

private String readFromStream(InputStream inputStream) throws IOException {
  StringBuilder output = new StringBuilder();
  if (inputStream != null) {
    InputStreamReader inputStreamReader = new InputStreamReader(inputStream, Charset.forName("UTF-8"));
    BufferedReader reader = new BufferedReader(inputStreamReader);
    String line = reader.readLine();
    while (line != null) {
      output.append(line);
      line = reader.readLine();
    }
  }
  return output.toString();
}

I don't understand exactly what InputStream, InputStreamReader and BufferedReader do. All of them have a read() method, and BufferedReader also has readLine(). Why can't I use only the InputStream, or only add the InputStreamReader? Why do I need to add the BufferedReader? I know it has to do with efficiency but I don't understand how.

I've been researching and the documentation for the BufferedReader tries to explain this but I still don't get who is doing what:

In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,

BufferedReader in = new BufferedReader(new FileReader("foo.in"));

will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.

So, I understand that the InputStream can only read one byte, the InputStreamReader a single character, and the BufferedReader a whole line and that it also does something about efficiency which is what I don't get. I would like to have a better understanding of who is doing what, so as to understand why I need all three of them and what the difference would be without one of them.

I've researched a lot here and elsewhere on the web and don't seem to find any explanation about this that I can understand, almost all tutorials just repeat the documentation info. Here are some related questions that maybe begin to explain this but don't go deeper and solve my confusion: Q1, Q2, Q3, Q4. I think it may have to do with this last question's explanation about system calls and returning. But I would like to understand what is meant by all this.

Could it be that the BufferedReader's readLine() calls the InputStreamReader's read() method which in turn calls the InputStream's read() method? And the InputStream returns bytes converted to int, returning a single byte at a time, the InputStreamReader reads enough of these to make a single character and converts it to int and returns a single character at a time, and the BufferedReader reads enough of these characters represented as integers to make up a whole line? And returns the whole line as a String, returning only once instead of several times? I don't know, I'm just trying to get how things work.

Lots of thanks in advance!

Friedman answered 31/3, 2017 at 18:7 Comment(2)
#32175721 ?Tewell
Thanks for the suggestion @RC. I had already seen that question and mentioned it in my own question. I am looking for something a bit more specific as to what is happening between the three of them though.Friedman
P
33

The article "Streams in Java concepts and usage" gives a very nice explanation:

Streams, Readers, Writers, BufferedReader, BufferedWriter: these are the terms you will deal with in Java. They are the classes Java provides for working with input and output. It is really worth knowing how they are related and how they are used. This post will explore the Streams in Java and other related classes in detail. So let us start:

Let us define each of these at a high level, then dig deeper.

Streams
Used to deal with byte level data

Reader/Writer
Used to deal with character-level data. They also support various character encodings.

BufferedReader/BufferedWriter
To increase performance. Data to be read will be buffered in to memory for quick access.

While these are for taking input, corresponding classes exist for output as well. For example, if there is an InputStream that is meant to read a stream of bytes, an OutputStream will help in writing a stream of bytes.

InputStreams
Java provides many types of InputStream. Each connects to a distinct data source, such as a byte array, a file, etc.

For example, FileInputStream connects to a file data source and can be used to read bytes from a file, while ByteArrayInputStream can be used to treat a byte array as an input stream.

OutputStream
This helps in writing bytes to a data source. For almost every InputStream there is a corresponding OutputStream, wherever it makes sense.
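
To make the byte-level versus character-level distinction concrete, here is a minimal sketch. The class and method names (BytesVersusChars, countBytes, countChars) are made up for illustration; only the java.io classes are real. It drains the same data once as raw bytes and once through an InputStreamReader that decodes UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BytesVersusChars {

    // Counts how many single-byte reads it takes to drain a raw InputStream.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) { // each call returns one byte as an int in 0..255
                count++;
            }
        }
        return count;
    }

    // Counts how many single-char reads it takes once the bytes are decoded as UTF-8.
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (reader.read() != -1) { // each call returns one char as an int
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "\u00e9" (e with acute accent) takes two bytes in UTF-8 but is one char.
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + countBytes(data)); // bytes: 6
        System.out.println("chars: " + countChars(data)); // chars: 5
    }
}
```

The stream sees six bytes, while the reader hands back five characters, because decoding is the reader's job and not the stream's.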


UPDATE

What is Buffered Stream?

Here I'm quoting from Buffered Streams, Java documentation (With a technical explanation):

Buffered Streams

Most of the examples we've seen so far use unbuffered I/O. This means each read or write request is handled directly by the underlying OS. This can make a program much less efficient, since each such request often triggers disk access, network activity, or some other operation that is relatively expensive.

To reduce this kind of overhead, the Java platform implements buffered I/O streams. Buffered input streams read data from a memory area known as a buffer; the native input API is called only when the buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called only when the buffer is full.

Reading technical documentation sometimes makes me lose my hair. So here I quote a more humane explanation from https://yfain.github.io/Java4Kids/:

In general, disk access is much slower than the processing performed in memory; that’s why it’s not a good idea to access the disk a thousand times to read a file of 1,000 bytes. To minimize the number of times the disk is accessed, Java provides buffers, which serve as reservoirs of data.

When a file is read through a FileInputStream wrapped in a BufferedInputStream, the BufferedInputStream works as a middleman between the program and the FileInputStream. It reads a big chunk of bytes from the file into memory (a buffer) in one shot, and the program then reads single bytes from there, which are fast memory-to-memory operations. BufferedOutputStream works similarly with the class FileOutputStream.

The main idea here is to minimize disk access. Buffered streams are not changing the type of the original streams — they just make reading more efficient. A program performs stream chaining (or stream piping) to connect streams, just as pipes are connected in plumbing.
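
The effect of the buffer can be observed with a small experiment. The sketch below is only an illustration: CountingInputStream is a hypothetical helper, not a JDK class, and an in-memory ByteArrayInputStream stands in for the file, so the counter shows how many read requests reach the underlying stream rather than real disk accesses.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferingDemo {

    // Hypothetical helper: counts every read request that reaches the wrapped stream.
    static final class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Drains the data one byte at a time and reports how many requests hit the source.
    static int underlyingCalls(byte[] data, boolean buffered) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // the program always asks for a single byte
            }
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        System.out.println("unbuffered: " + underlyingCalls(data, false));
        System.out.println("buffered:   " + underlyingCalls(data, true));
    }
}
```

Without the buffer, every one of the program's single-byte reads is forwarded to the source. With the buffer, the source is asked for whole chunks, so it should see only a handful of requests even though the program's loop is unchanged.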

Primine answered 31/3, 2017 at 18:17 Comment(4)
This is a great answer. I just wanted to strengthen the point that Java could simply have provided a really simple method like String text = new File("data.txt").getText() to read all the text. Java 8 does in fact provide something like this, but in general Java just gives you the pieces and lets you put them together in any way you like. (This is closer to Python's methodology of providing a single way to solve a problem than to Ruby's "make the common thing easy and the rare thing possible", which often leads to multiple ways to solve problems.)Delicatessen
Thanks, @ישו אוהב אותך. I understand the first two parts of the diagram: the InputStreamReader will read the bytes read by the InputStream and decode them into characters. But I don't get the buffering part. How does the BufferedReader help? What does this line mean: "To increase performance. Data to be read will be buffered in to memory for quick access." What does it do in less technical words?Friedman
Great update, @ישו אוהב אותך! Thanks, I would only add one more thing, regarding how these 3 classes specifically work together. Upon further reading and skimming through OpenJDK source code, on a high level, this is what happens: BufferedReader is in charge of maintaining and filling a char array; its read() or readLine() methods get chars from there. The array is filled by calling InputStreamReader's read(char[], int, int) method. InputStreamReader is associated with a StreamDecoder; when its read(char[], int, int) is called, it calls the StreamDecoder's read(char[], int, int). (1/2)Friedman
(2/2) This will eventually call the underlying InputStream's read(byte[], int, int) method. This actually calls its read() method repeatedly. So the benefit of using the BufferedReader is that we make fewer OS calls by automating the process of getting multiple bytes at once from the underlying InputStream, instead of making individual read() calls which would make an OS call each time. This makes our program more efficient. Main sources: tutorials.jenkov.com/java-io/bufferedreader.html, quepublishing.com/articles/article.aspx?p=26067, hg.openjdk.java.netFriedman
C
2
  • InputStream, OutputStream, byte[], ByteBuffer are for binary data.
  • Reader, Writer, String, char are for text, internally Unicode, so that all scripts in the world may be combined (say Greek and Arabic).

  • InputStreamReader and OutputStreamWriter form a bridge between both. If you have some InputStream and know that its bytes are actually text in some encoding (a Charset), then you can wrap the InputStream:

    try (InputStreamReader reader =
            new InputStreamReader(stream, StandardCharsets.UTF_8)) {
        // ... read text ...
    }

There is a constructor without Charset, but that is not portable, as it uses the default platform encoding.

On older Android versions StandardCharsets may not exist; in that case use "UTF-8".
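
Why the charset matters can be shown with a short sketch. The class name CharsetMatters and its decode method are hypothetical names chosen for this example. The same bytes are decoded once with the charset they were written in and once with a different one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMatters {

    // Decodes all bytes with the given charset by draining an InputStreamReader.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'h' is one byte in UTF-8, "\u00e9" (e with acute accent) is two bytes.
        byte[] data = "h\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoded with the right charset, the text round-trips: 2 chars.
        System.out.println(decode(data, StandardCharsets.UTF_8).length());      // 2
        // Decoded as ISO-8859-1, every byte becomes its own char: 3 chars.
        System.out.println(decode(data, StandardCharsets.ISO_8859_1).length()); // 3
    }
}
```

With the no-argument constructor, which of these two results you get would depend on the platform default, which is why naming the charset explicitly is the safer habit.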

The derived classes FileInputStream and BufferedReader add something to their parents, InputStream and Reader respectively.

A FileInputStream is for input from a File, and BufferedReader uses a memory buffer, so the actual physical reading does not happen character by character (which would be inefficient). With new BufferedReader(otherReader) you add buffering to your original reader.

All this understood, there is the utility class Files with methods like newBufferedReader(Path, Charset) which add additional brevity.
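
As a rough sketch of that shortcut: Files.newBufferedReader(Path, Charset) returns an already buffered reader, so the stream, the decoding reader and the buffer do not have to be stacked by hand. The class name FilesReaderDemo, the readAll helper and the temporary file are assumptions made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class FilesReaderDemo {

    // Reads the whole file line by line, re-adding the line breaks readLine() strips.
    static String readAll(Path path) throws IOException {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        try {
            Files.write(tmp, Arrays.asList("first", "second"), StandardCharsets.UTF_8);
            System.out.print(readAll(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that readLine() drops the line terminator, so a loop that only appends the returned line joins all lines together; the sketch adds '\n' back explicitly.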

Compossible answered 31/3, 2017 at 18:18 Comment(7)
Thanks, @Joop, I still didn't get the last part. How does the BufferedReader help? I don't understand this line from the documentation: the BufferedReader "will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient." The code for this is in the quote in my question. What is the difference now when using a BufferedReader?Friedman
I have added a bit of your comment to the answer for others. To explain: say the BufferedReader ultimately reads from a file. An unbuffered Reader would repeatedly ask the operating system for a single char/byte and check for end-of-file, in a loop. Reading through a buffer speeds this up, because internally int read(byte[], int, int) is used for the operating system call instead of int read(). So it is often faster, though how much depends on the use case. There should be no difference in functionality.Compossible
@JoopEggen The OP commented the following on the answer given by ישו אוהב אותך: "This will eventually call the underlying InputStream's read(byte[], int, int) method. This actually calls its read() method repeatedly." (1/2)Befit
(2/2) But I'm guessing that doesn't always happen, right? Because it would seem like that defeats the purpose of not calling read() multiple times. It looks like the int read(byte[], int, int) provided in the InputStream abstract class is just a convenience method in case the real implementation doesn't provide anything better, correct? But in the case of FileInputStream, it seems to provide a native implementation of int read(byte[], int, int) that uses the OS instead of calling read() multiple times. Am I correct here?Befit
@Befit Correct. The usage is wrong: one reads into a buffer repeatedly, with read always returning the number of bytes read, or -1 at the end. Some classes have a readFully, but then one must reserve an array with the size of the file. An array of 100 bytes, a file of 200 bytes and a disk system block size of 128 bytes could give reads of 100, 28, 72, -1 (doing physical reads at 100 and 72).Compossible
@JoopEggen I'm not sure what you mean by the usage is wrong. Are you confirming that the InputStream implementation of int read(byte[], int, int) is just a default naive implementation that's normally not meant to be used?Befit
@Befit no, it is in fact standard low level I/O like in C, with a prevention of some blocking on input. Its usage simply is more circumstantial (as low level) than for instance DataInputStream.readFully.Compossible
C
1

I have read lots of articles on this very topic. I hope this helps you in some way.

Basically, the BufferedReader maintains an internal buffer.

During a read operation, it reads data from the file in bulk and stores it in that internal buffer.

Each subsequent read operation is then served to the program from the internal buffer.

This reduces the number of interactions between the program and the file or disk, and hence is more efficient.
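
The same idea can be sketched for the three classes from the question stacked together. This is only an illustration under stated assumptions: CountingReader is a hypothetical helper (not part of the JDK) placed between the BufferedReader and the InputStreamReader to count how many read requests the BufferedReader passes down, and the input is an in-memory byte array rather than a network or file stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ThreeLayers {

    // Hypothetical helper: counts read requests that reach the wrapped Reader.
    static final class CountingReader extends FilterReader {
        int calls = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            calls++;
            return super.read(buf, off, len);
        }
    }

    // InputStream (bytes) -> InputStreamReader (chars) -> counter.
    static CountingReader open(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new CountingReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }

    // Adds the BufferedReader layer on top and collects whole lines.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Requests reaching the InputStreamReader when a BufferedReader sits in front of it.
    static int callsWithBuffer(String text) throws IOException {
        CountingReader counter = open(text);
        readLines(counter);
        return counter.calls;
    }

    // Requests reaching the InputStreamReader when chars are read one at a time.
    static int callsWithoutBuffer(String text) throws IOException {
        try (CountingReader counter = open(text)) {
            while (counter.read() != -1) {
                // one request per character
            }
            return counter.calls;
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "a\nb\nc\n";
        System.out.println(readLines(open(text)));
        System.out.println("with buffer:    " + callsWithBuffer(text));
        System.out.println("without buffer: " + callsWithoutBuffer(text));
    }
}
```

The BufferedReader asks the layer below for a whole array of characters at a time and then answers readLine() from its own memory, so the count with the buffer should stay small however many lines the text has.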

Compartmentalize answered 20/12, 2019 at 4:36 Comment(0)
