Java - Using DataInputStream with Sockets, buffered or not?

Asked 5/11, 2010 at 22:16 Answered 5/11, 2010 at 22:51

Solved java sockets tcp bufferedinputstream

I'm writing a simple client/server application, and I found that using DataInputStream to read data is very convenient because it lets you choose what to read (without having to convert it from bytes yourself). But I'm wondering whether it would be best to wrap it in a BufferedInputStream too, or whether that would just add unnecessary overhead.

The reason I'm asking is that I don't know how expensive it is to read directly from the socket stream (with a BufferedInputStream, it will read once from the socket stream and then multiple times from the BufferedInputStream via the DataInputStream).

The data received is usually pretty small, around 20-25 Bytes.

Thanks in advance for any answer! :D

Obed answered 5/11, 2010 at 22:16 Comment(0)

A DataInputStream is not buffered, so each read operation on a DataInputStream object is going to result in one or more reads on the underlying socket stream, and that could result in multiple system calls (or the equivalent).

A system call is typically 2 to 3 orders of magnitude more expensive than a regular method call. Buffered streams work by reducing the number of system calls (ideally to 1), at the cost of adding an extra layer of regular method calls. Typically using a buffered stream replaces N syscalls with 1 syscall and N extra method calls. If N (i.e. the ratio of stream method calls to syscalls) is greater than 1, you win.
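To make the "N reads become 1" effect concrete, here is a minimal sketch that counts how many read calls reach the underlying stream. It assumes an in-memory ByteArrayInputStream standing in for the socket stream, so it counts method calls rather than real syscalls; ReadCountDemo, CountingInputStream and countUnderlyingReads are hypothetical names made up for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCountDemo {

    /** Hypothetical helper: counts the read calls that reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads one 24-byte message as six ints; returns the number of underlying reads. */
    public static int countUnderlyingReads(boolean buffered) throws IOException {
        // The byte array is an assumption: it stands in for socket.getInputStream().
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[24]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        DataInputStream data = new DataInputStream(in);
        for (int i = 0; i < 6; i++) {
            data.readInt();
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + countUnderlyingReads(false));
        System.out.println("buffered:   " + countUnderlyingReads(true));
    }
}
```

The unbuffered count depends on the JDK version (readInt() may issue one underlying read per int or one per byte), but it is at least one per readInt() call. With the buffer in place, the whole message is fetched in a single underlying read here, because the in-memory source hands over all 24 bytes at once; a real socket may deliver them in more than one chunk.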

It follows that the only cases where putting a BufferedInputStream between the socket stream and the DataInputStream is not a "win" are:

when the application only makes one read...() call and that can be satisfied by a single syscall,
when the application only does large read(byte[] ...) calls, or
when the application doesn't read anything.

It sounds like these don't apply in your case.

Besides, even if they do apply, the overhead of using a BufferedInputStream when you don't need to is relatively small. By contrast, the overhead of not using a BufferedInputStream when you do need to can be huge.

One final point: the actual amount of data read (i.e. the size of the messages) is pretty much irrelevant to the buffered versus unbuffered conundrum. What really matters is the way that data is read; i.e. the sequence of read...() calls that your application will make.
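As an illustration of such a sequence of read...() calls, here is a rough sketch of reading small fixed-layout messages through a DataInputStream that sits on top of a BufferedInputStream. The message layout (int, byte, int), the class BufferedMessageDemo and its methods are all made up for this example, and in-memory streams stand in for the socket streams. The writing side uses a BufferedOutputStream with an explicit flush() purely so the example can round-trip; that part is an addition, not something the question asked about.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedMessageDemo {

    /** Hypothetical 9-byte message: int id, byte kind, int value. */
    public static final class Message {
        public final int id;
        public final byte kind;
        public final int value;

        public Message(int id, byte kind, int value) {
            this.id = id;
            this.kind = kind;
            this.value = value;
        }
    }

    /** Wrap once per connection; with a socket, raw would be socket.getInputStream(). */
    public static DataInputStream wrap(InputStream raw) {
        return new DataInputStream(new BufferedInputStream(raw));
    }

    /** Wrap once per connection; with a socket, raw would be socket.getOutputStream(). */
    public static DataOutputStream wrap(OutputStream raw) {
        return new DataOutputStream(new BufferedOutputStream(raw));
    }

    public static void writeMessage(DataOutputStream out, Message m) throws IOException {
        out.writeInt(m.id);
        out.writeByte(m.kind);
        out.writeInt(m.value);
        out.flush(); // push the buffered bytes to the underlying stream
    }

    /** Three small reads per message, all served from the buffer once it is filled. */
    public static Message readMessage(DataInputStream in) throws IOException {
        int id = in.readInt();
        byte kind = in.readByte();
        int value = in.readInt();
        return new Message(id, kind, value);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        DataOutputStream out = wrap(wire);
        writeMessage(out, new Message(1, (byte) 7, 100));
        writeMessage(out, new Message(2, (byte) 8, 200));

        // Reuse the same wrapped stream for every message on the connection.
        DataInputStream in = wrap(new ByteArrayInputStream(wire.toByteArray()));
        Message first = readMessage(in);
        Message second = readMessage(in);
        System.out.println(first.id + " " + first.kind + " " + first.value);
        System.out.println(second.id + " " + second.kind + " " + second.value);
    }
}
```

Note that the wrappers are created once and reused. Creating a new BufferedInputStream per message would be a mistake, because the buffer may already have read ahead into the next message, and those bytes would be lost along with the discarded wrapper.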

Colb answered 5/11, 2010 at 22:49 Comment(0)

The general wisdom is that individual reads on the underlying stream are very slow, so buffering is almost always faster. However, for such small messages (20-25 bytes) it might be that the cost of allocating the buffer is similar to the cost of making those individual reads (once you consider memory allocation and garbage collection). Unfortunately, the only way to find out is to test it and see.

You say that the data received is usually small: how often do you expect larger messages? That will be a significant bottleneck if you receive occasional large messages on an unbuffered stream.

I'd suggest that you run some timing tests and see if buffering makes a difference in your case. Or, don't bother with timing tests and just use a buffer. If the message size changes in the future then this will reduce maintenance later on.

Adaminah answered 5/11, 2010 at 22:51 Comment(6)

That doesn't make sense. The buffer is allocated once for the life of the socket. There can be any number of reads. The smaller the reads the more sense buffering makes. – Generalization 5/11, 2010 at 23:16

If clients are connecting, sending a small message, then disconnecting (as something Ajax-y might) then buffering may be inefficient. The number of reads would have to be pretty small, but maybe the 20-25 range is low enough. I don't know: that's why I suggested profiling. – Adaminah 5/11, 2010 at 23:20

Now that I think about it, AJAX won't get you 20-25 byte messages...XML is too verbose. Nevertheless, I have assumed that clients don't maintain their connections. If that assumption is wrong then always use a buffer. – Adaminah 5/11, 2010 at 23:22

I don't see anything in his post that justifies your assumptions. He specifically mentions reading multiple times, so either he reads one byte at a time, which alone would justify the BufferedInputStream, or he allocates a buffer of his own, which is exactly the overhead you are referring to in another form. – Generalization 6/11, 2010 at 11:47

Just to clear things up, I am keeping the connection open, and I am constantly sending new data to it containing 20-25 bytes each, and then reading it with multiple readInt(), readByte(), etc. (and now I'm also buffering it) – Obed 6/11, 2010 at 12:51

Right. In that case use a buffer. @EJP: there is nothing in the post that says either way. I made an assumption and it turned out to be wrong. That's OK! – Adaminah 6/11, 2010 at 16:37
