Why is the performance of BufferedReader so much worse than BufferedInputStream?

I understand that using a BufferedReader (wrapping a FileReader) is going to be significantly slower than using a BufferedInputStream (wrapping a FileInputStream), because the raw bytes have to be converted to characters. But I don't understand why it is so much slower! Here are the two code samples that I'm using:

BufferedInputStream inputStream = new BufferedInputStream(new FileInputStream(filename));
try {
  byte[] byteBuffer = new byte[bufferSize];
  int numberOfBytes;
  do {
    numberOfBytes = inputStream.read(byteBuffer, 0, bufferSize);
  } while (numberOfBytes >= 0);
}
finally {
  inputStream.close();
}

and:

BufferedReader reader = new BufferedReader(new FileReader(filename), bufferSize);
try {
  char[] charBuffer = new char[bufferSize];
  int numberOfChars;
  do {
    numberOfChars = reader.read(charBuffer, 0, bufferSize);
  } while (numberOfChars >= 0);
}
finally {
  reader.close();
}

I've tried tests using various buffer sizes, all with a 150 megabyte file. Here are the results (buffer size is in bytes; times are in milliseconds):

Buffer   Input
  Size  Stream  Reader
 4,096    145     497
 8,192    125     465
16,384     95     515
32,768     74     506
65,536     64     531

As can be seen, the fastest time for the BufferedInputStream (64 ms) is seven times faster than the fastest time for the BufferedReader (465 ms). As I stated above, I don't have an issue with a significant difference; but this much difference just seems unreasonable.

My question is: does anyone have a suggestion for how to improve the performance of the BufferedReader, or an alternative mechanism?
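
For illustration, a stripped-down version of the timing harness looks roughly like this (a sketch, not my actual test, which is part of a larger program; the class name, repetition count, and argument handling are placeholders):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadTimer {
  public static void main(String[] args) throws IOException {
    String filename = args[0];
    int bufferSize = 65536;
    for (int run = 0; run < 5; run++) { // repeat so warmup and cache effects average out
      long start = System.nanoTime();
      BufferedInputStream inputStream =
          new BufferedInputStream(new FileInputStream(filename));
      try {
        byte[] byteBuffer = new byte[bufferSize];
        while (inputStream.read(byteBuffer, 0, bufferSize) >= 0) {
          // data is discarded; only read throughput is measured
        }
      } finally {
        inputStream.close();
      }
      System.out.printf("run %d: %,d ms%n",
          run, (System.nanoTime() - start) / 1000000);
    }
  }
}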

Armorial answered 13/1, 2013 at 6:14 Comment(14)
I think the most likely explanation is that your benchmark is flawed; e.g. you are not taking proper account of JVM warmup effects. Please post the complete thing.Louise
@StephenC or maybe disk cache?Isleen
You're comparing apples and oranges--the second test involves converting bytes to char, which the first doesn't do. If you need char data, use a Reader; if you need bytes, use an InputStream. I think you'll find that the fastest of all will be a BufferedReader wrapping an InputStreamReader wrapping a BufferedInputStream wrapping a FileInputStream. Also take a look at this thread on how to write a benchmark.Inbound
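For reference, the chained construction suggested in that comment looks like this (a sketch; the explicit UTF-8 charset is an assumption here, whereas FileReader uses the platform default encoding):

BufferedReader reader = new BufferedReader(
        new InputStreamReader(
                new BufferedInputStream(
                        new FileInputStream(filename)),
                StandardCharsets.UTF_8),
        bufferSize);

The idea is that the inner BufferedInputStream feeds the decoder large blocks of bytes, so the InputStreamReader never has to make small reads against the file.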
The result may also depend on the character encoding that is used.Righthander
@StephenC I am not suggesting that my "benchmark" is very scientific, but I don't think the difference is a result of JVM startup, GC execution, or anything of that sort ... I ran the code in loops, and took the average time over a much larger sample; also, both tests were run in the same JVM (it happens that the BufferedInputStream is executed first, but that doesn't seem to be important). Please explain why you think the timings are flawed.Armorial
Without seeing your actual code, I can't give you a full explanation. But my main reasons for thinking this are 1) the times you are reporting seem implausible to me, and 2) you haven't responded to the JVM warmup theory ... which suggests that you don't understand its significance. Just post the code ... so that we can see what you are actually doing, and try to reproduce it.Louise
@Jan Dvorak Even if there is disk caching involved, I don't think this has any relevance ... as I have stated in a previous comment, the code for the BufferedInputStream runs in the same execution run as the code for the BufferedReader. I don't actually think that the 150MB file is being cached, but perhaps it is ... yet how does this explain the difference in time between the character and byte processing?Armorial
@TedHopp Yes, as I tried to explain in my question, I understand that there is a significant difference between processing raw bytes and characters. It just seems that a seven-fold difference in performance is more than I would expect. And I have a feeling that your suggestion to wrap the FileInputStream in three layers is not serious ... if it is, just let me know and I'll try it!Armorial
The suggestion was perfectly serious. I did some experiments some time ago and was surprised at the results. It's a second-order improvement, but seemed to be definitely there.Inbound
Is the buffer size the same in both cases? In bytes rather than in absolute value? Are you running both tests in the same JVM? And if so, in which order? Have you tried different size arguments when constructing the BufferedInputStream/Reader?Collectivity
@EJP The element count of the buffers was the same, but consequently not the physical size ... the char array uses the size of a char (two bytes in Java) and the byte array uses one byte. This may explain the differences in the relative speeds when the buffer size changes. Both tests are executed in the same method in the same execution of the program (and my test harness runs them more than just once).Armorial
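In code, equalizing the physical buffer sizes would look something like this (a sketch):

// A Java char is two bytes, so halving the char buffer makes the
// two tests use the same number of underlying bytes:
byte[] byteBuffer = new byte[bufferSize];      // bufferSize bytes
char[] charBuffer = new char[bufferSize / 2];  // also bufferSize bytes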
@StephenC Thank you for your comments ... why do you think that the times are implausible? And why do you think this may be affected by JVM warmup? The tests are running in the same JVM, in the same execution of the program, and the faster test is executed first (I would expect that JVM warmup would cause the earlier test to be slower). When I reverse the order of the tests I see no difference in the times. The only reason that I hesitate to post the code is that it is a small part of a larger program ... I could isolate it and post that, I suppose. You could try the code that I've posted.Armorial
@TedHopp I tried the BufferedReader+InputStreamReader+BufferedInputStream+FileInputStream approach, and the results were within a few milliseconds of the simple BufferedReader+FileReader test (for both small and large buffers).Armorial
Hm. I guess I need to revisit my testing.Inbound

The BufferedReader has to convert the bytes into chars. Parsing byte by byte and copying into a wider type is expensive relative to a straight copy of blocks of data.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// 150 MB of newline bytes, decoded repeatedly to time the
// byte-to-char conversion on its own (no file I/O involved).
byte[] bytes = new byte[150 * 1024 * 1024];
Arrays.fill(bytes, (byte) '\n');

for (int i = 0; i < 10; i++) {
    long start = System.nanoTime();
    StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
    long time = System.nanoTime() - start;
    System.out.printf("Time to decode %,d MB was %,d ms%n",
            bytes.length / 1024 / 1024, time / 1000000);
}

prints

Time to decode 150 MB was 226 ms
Time to decode 150 MB was 167 ms

NOTE: Having to do this intermixed with system calls can slow down both operations (system calls can disturb the cache).
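One way to act on that note, assuming the whole file fits comfortably in memory, is to separate the two phases (a sketch; filename is the variable from the question):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read all the bytes with plain I/O first, then decode in a single
// pass, so decoding is not interleaved with system calls.
byte[] bytes = Files.readAllBytes(Paths.get(filename));
CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));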

Torchier answered 13/1, 2013 at 9:56 Comment(0)

In the BufferedReader implementation there is a fixed constant, defaultExpectedLineLength = 80, which the readLine method uses when allocating its StringBuffer. If you have a big file with lots of lines longer than 80 characters, this fragment might be something that can be improved:

if (s == null) 
    s = new StringBuffer(defaultExpectedLineLength);
s.append(cb, startChar, i - startChar);
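
If your lines are mostly longer than that, one workaround is to bypass readLine and scan a char buffer yourself (a sketch; reader is a BufferedReader as in the question, and process is a hypothetical consumer for each completed line):

char[] buf = new char[8192];
StringBuilder line = new StringBuilder(4096); // pre-sized for long lines
int n;
while ((n = reader.read(buf, 0, buf.length)) >= 0) {
    for (int i = 0; i < n; i++) {
        if (buf[i] == '\n') {
            process(line.toString()); // hypothetical per-line consumer
            line.setLength(0);
        } else if (buf[i] != '\r') {
            line.append(buf[i]);
        }
    }
}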
Warfold answered 21/10, 2014 at 8:23 Comment(0)
