Java InputStream blocking read
U

7

60

According to the Java API documentation, the read() method declared in InputStream is described as:

If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.

I have a while(true) loop doing a read() and I always get -1 when nothing's sent over the stream. That's expected.

My question is when would read() ever block? Since if it doesn't get any data it returns -1. I would expect a blocking read() to wait until data is received. If you've reached the end of the input stream, shouldn't read() simply wait for data instead of returning -1?

Or does read() only block if there's another thread accessing the stream and your read() cannot access the stream?


Which leads me to my next question. I used to have an event listener (provided by my library) that would notify me when data was available. When I was notified, I would start a while((aByte = read()) > -1) loop to store each byte. I was puzzled when I got TWO events in a very short time and not all of my data was displayed. It seemed that only the tail end of the second event's data was displayed and the rest was missing.

I eventually changed my code so that when I get an event I run if(inputStream.available() > 0) while((aByte = read()) > -1) to store each byte. Now it works properly and all my data is displayed.

Can someone explain this behavior? The available() method of InputStream is said to return the number of bytes you can read before blocking the next caller (of the stream?). Even if I don't use available() I would expect the read of the first event to just block the read of the second event, but not erase or consume too much stream data. Why would doing this cause not all of my data to be displayed?

Upraise answered 4/3, 2009 at 18:6 Comment(1)
You are confusing end of stream with no data presently available. When no data is presently available, it blocks. End of stream, it returns -1.Cestode
V
51

The underlying data source for some implementations of InputStream can signal that the end of the stream has been reached, and no more data will be sent. Until this signal is received, read operations on such a stream can block.

For example, the InputStream from a Socket will block, rather than returning EOF, until a TCP packet with the FIN flag set is received. When EOF is received from such a stream, you can be assured that all data sent on that socket has been reliably received, and you won't be able to read any more data. (If a blocking read results in an exception, on the other hand, some data may have been lost.)
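
To make the socket case concrete, here is a minimal sketch over the loopback interface. The class and method names (SocketEofDemo, readUntilPeerCloses) and the 200 ms pauses are made up for illustration. The first read() blocks until the peer writes a byte, and the second blocks until the peer closes its socket, at which point it returns -1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketEofDemo {
    /** Returns {firstRead, secondRead}: a data byte, then the end-of-stream marker. */
    static int[] readUntilPeerCloses() throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket peer = new Socket(server.getInetAddress(), server.getLocalPort());
                     OutputStream out = peer.getOutputStream()) {
                    Thread.sleep(200);   // the reader is blocked in read() meanwhile
                    out.write(42);
                    out.flush();
                    Thread.sleep(200);   // still connected, so the next read() blocks too
                } catch (IOException | InterruptedException e) {
                    throw new IllegalStateException(e);
                }
                // Leaving the try block closes the socket, which ends the stream.
            });
            sender.start();
            try (Socket accepted = server.accept();
                 InputStream in = accepted.getInputStream()) {
                int first = in.read();   // blocks until the byte arrives
                int second = in.read();  // blocks until the peer closes
                sender.join();
                return new int[] {first, second};
            }
        }
    }
}
```

While the peer is connected but silent, read() simply waits; -1 only shows up once the other side has closed.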

Other streams, like those from a raw file or serial port, may lack a similar format or protocol to indicate that no more data will be available. Such streams can immediately return EOF (-1) rather than blocking when no data are currently available. In the absence of such a format or protocol, however, you can't be sure when the other side is done sending data.


With regard to your second question, it sounds like you may have had a race condition. Without seeing the code in question, I'm guessing that the problem actually lay in your method of "display". Perhaps the attempt to display by the second notification was somehow clobbering the work done during the first notification.

Vary answered 4/3, 2009 at 18:11 Comment(19)
I am using the inputstream for a COM port. Again, if I do a while loop of a read, it will always return -1 until the device (the other end of the COM Port) outputs data. I never have to reopen the stream. So your explanation doesn't make sense to me. Is the device reopening the stream every time?Upraise
rxtx ... a java library to connect to serial and parallel portsUpraise
Hmm, I've only used javax.comm, which implements InputStream as intended. It's possible that the rxtx implementation violates the contract of InputStream.read().Vary
maybe it's the device. The device is using a COM/Serial port wrapped with a USB interface. It's a strange interface.Upraise
and no, I don't think the display was the problem. I am simply outputting the stream's data over the System.out stream.Upraise
So your "while" loop was "while ((abyte = in.read()) != -1) System.out.write(abyte);"? No intermediate buffers, or references to member variables in the enclosing object?Vary
there is an intermediate bufferUpraise
That is true, that many serial communication libraries let you open the stream in a non-blocking mode, meaning that read() returns immediately, either with an octet or -1 in case of... well... EOF, or "no data" condition. It is also true, that InputStream documentation writes about returning -1 only on EOF. What is missing in InputStream docs is the definition of EOF condition. Most people assume that once EOF is reached, only -1 will be returned by read(). But not all assume so, and it leads to unspecified behaviours when working with non-blocking libs.Onesided
Just as a note, in Perl there are also some special cases when eof function can return -1 and later on more valid data. See perldoc.perl.org/functions/eof.html for details.Onesided
@Onesided - that's true, but in this particular case, the InputStream class establishes a specific contract that should be upheld by every implementation. This allows callers to treat all InputStream implementations alike, without special treatment for one implementation that violates the contract. It's perfectly valid for a non-blocking implementation of an input stream to use a -1 value to signal "no data", but it should declare a new interface, not try to masquerade as an InputStream.Vary
@erickson, I totally agree with you about keeping the contract; but what is unsaid in the API is the definition of EOF -- this ambiguity leads to this InputStream contract "violation". Clear API: yes, silent assumptions: no!Onesided
I don't understand. The API defines end-of-stream as -1 and specifies that the call will block until data is available. What is ambiguous?Vary
I agree with erickson on this point. RXTX clearly violates InputStream's specification.Miserere
@erickson, as I said earlier, the semantic of EOF is not specified. It should be stated in docs that after EOF is reached (whatever it means...), -1 will be returned from every call to read. It means, no "non-EOF after EOF" is possible.Onesided
However, I agree with you that RXTX makes it worse by breaking the de facto standard interpretation of EOF.Onesided
pwes, just because there are people that don't know what EOF means doesn't mean that anything is unclear. The end of a tcp stream is globally well-defined by the TCP specification itself, so there's no need for javadocs to do any hand-holding. Some people misinterpret EOF to mean "when I've reached the 'end' of the CURRENTLY AVAILABLE data", and they're just plain 100% wrong and should feel bad if their wrongness has caused them to violate the inputstream contract. It's not "the de facto standard", it's the standard as part of the internet protocol suite.Caesarean
"the stream is closed and there will never be any data so stop asking." This statement is completely and utterly wrong. -1 means exactly what the documentation says: the end of the stream is reached and there's no more data, at the moment. For a FileInputStream for example EOF simply means that the end of the file is currently reached. When you append to that file, the same InputStream will again provide data.Giorgia
@Giorgia You're right. The documentation definitely doesn't say "at the moment," but it doesn't say "there will never be any" either. I just tested it out, and it works just as you say. I will update my answer.Vary
Your use of the word "EOF" is a bit confusing since there is also an EOFException which is thrown by the DataInputStream's convenience methods. But with sockets you'll never get it. If you want some kind of exception, you can just use setSoTimeout on a socket. Then you will get a SocketTimeoutException after x milliseconds.Stevenson
M
19

It returns -1 at the end of the stream. If the stream is still open (e.g. a socket connection) but no data has reached the reading side (the server is slow, the network is slow, ...), read() blocks.

You don't need to call available(). I have a hard time understanding your notification design, but you don't need any calls except read() itself. The available() method is there for convenience only.
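
As a rough sketch of what "only read()" looks like, the loop below copies a stream until read() reports the end. The ReadLoop class and readFully method are hypothetical names, and the sketch assumes the stream honors the InputStream contract (blocks when idle, returns -1 only at the end).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadLoop {
    /** Collects every byte until read() returns -1; no call to available() is needed. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {   // blocks while the stream is open but idle
            collected.write(b);
        }
        return collected.toByteArray();
    }
}
```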

Menagerie answered 4/3, 2009 at 18:9 Comment(2)
it is strange that I require available() to be there or else I don't get all my data. Can you explain why I need available()?Upraise
(available isn't even guaranteed to return anything but 0.)Subsequent
D
16

OK, this is a bit of a mess, so first let's clear this up: InputStream.read() blocking has nothing to do with multi-threading. If you have multiple threads reading from the same input stream and you trigger two events very close to each other, with each thread trying to consume one event, then you'd get corruption: the first thread to read will get some bytes (possibly all of them), and when the second thread gets scheduled it will read the rest. If you plan to use a single IO stream from more than one thread, always synchronize on some shared external lock.
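
One way to do that, as a sketch only: wrap the stream in a small helper that holds a lock for the duration of one whole message, so two threads cannot interleave their reads. SharedStreamReader and readMessage are hypothetical names, and the fixed message length is an assumption made to keep the example short.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class SharedStreamReader {
    private final InputStream in;
    private final Object lock = new Object();   // the shared external lock

    SharedStreamReader(InputStream in) {
        this.in = in;
    }

    /** Reads exactly length bytes while holding the lock, so messages never interleave. */
    byte[] readMessage(int length) throws IOException {
        synchronized (lock) {
            byte[] message = new byte[length];
            int offset = 0;
            while (offset < length) {
                int n = in.read(message, offset, length - offset);
                if (n == -1) {
                    throw new EOFException("stream ended after " + offset + " bytes");
                }
                offset += n;
            }
            return message;
        }
    }
}
```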

Second, if you can read from your InputStream until you get -1 and then wait and can read again later, then the InputStream implementation you are using is broken! The contract for InputStream clearly states that an InputStream.read() should only return -1 when there is no more data to read because the end of the stream has been reached and no more data will EVER be available - like when you read from a file and you reach the end(1).

The behavior for "no more data is available now, please wait and you'll get more" is for read() to block and not return until there is some data available (or an exception is thrown).

  1. As noted deep in the discussion on erickson's (currently top) answer, a FileInputStream implementation can actually read past the "end of file" and provide more data after a read() has returned -1, if data is added to the file later. This is an edge case and is basically the only such case in common InputStream implementations (or at worst, very rare). You should take that into account if you know you use FileInputStream and expect the file you read from to have additional data added (a common example is tailing a log file), but otherwise it is just a deficiency of the InputStream API. In any case, you'd be better off if you can stop using the java.io style blocking IO APIs and use the java.nio non-blocking IO APIs.
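
The FileInputStream edge case from the footnote can be reproduced with a small sketch like the one below. ReadPastEof and readAfterAppend are illustrative names, and the temp file is created just for the demo. It reads to the end of a one-byte file, appends a second byte, and reads again from the same stream.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadPastEof {
    /** Returns {read at end of file, read after another byte was appended}. */
    static int[] readAfterAppend() throws IOException {
        Path file = Files.createTempFile("tail-demo", ".log");
        try {
            Files.write(file, "a".getBytes(StandardCharsets.US_ASCII));
            try (FileInputStream in = new FileInputStream(file.toFile())) {
                in.read();                    // consumes 'a'
                int atEof = in.read();        // nothing left in the file right now
                Files.write(file, "b".getBytes(StandardCharsets.US_ASCII),
                        StandardOpenOption.APPEND);
                int afterAppend = in.read();  // same stream, after the append
                return new int[] {atEof, afterAppend};
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Consistent with what the comments on the top answer report, the first value should be -1 and the second should be the appended byte, which is how a log tailer can keep using one stream.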
Debrief answered 4/3, 2009 at 18:17 Comment(4)
Good explanation. I am starting to believe the inputstream implementation I am using IS broken.Upraise
Did you read comments by @Giorgia to @Vary 's answer? I mean comment about file: -1 means exactly what the documentation says: the end of the stream is reached and there's no more data, at the moment. For a FileInputStream for example EOF simply means that the end of the file is currently reached. When you append to that file, the same InputStream will again provide data?Hadley
@Pavel_K, I have not read that comment - mostly as it was written 6 years after this discussion and on another answer (so no notification). That being said - he is right that unlike my assertion above, -1 does not mean "no more data will EVER be available", just that the stream is currently at an end - if the stream source can later find more data, a read() may well return more data again - which does make sense for files but little else. For example the InputStream you get from URLConnection will return -1 when the connection is closed, and it cannot be reopened. Same for Process.Debrief
So in summary, I think this discussion is worth having (and I'll update my answer with a note), but reading after EOF makes no sense except for the limited use of actual file system files, and if you write an API that uses filesystem semantics with an InputStream API, most people will be surprised. If you want to offer a non-blocking IO API that returns if temporarily no data is available, Java has better APIs for that in the java.nio package.Debrief
A
7

By default the behavior of the provided RXTX InputStream is not compliant.

You have to set the receive threshold to 1 and disable the receive timeout:

serialPort.enableReceiveThreshold(1);
serialPort.disableReceiveTimeout();

Source: RXTX serial connection - issue with blocking read()

Aegaeon answered 23/4, 2012 at 15:16 Comment(0)
B
4

Aye! Don't give up on your stream yet Jbu. We are talking Serial communication here. For serial stuff, it is absolutely expected that a -1 can/will be returned on reads, yet still expect data at a later time. The problem is that most people are used to dealing with TCP/IP which should always return a 0 unless the TCP/IP disconnected... then yea, -1 makes sense. However, with Serial there is no data flow for extended periods of time, and no "HTTP Keep Alive", or TCP/IP heartbeat, or (in most cases) no hardware flow control. But the link is physical, and still connected by "copper" and still perfectly live.

Now, if what they are saying is correct, i.e. Serial should be closed on a -1, then why do we have to watch for stuff like OnCTS, onCarrierDetect, onDSR, onRingIndicator, etc.? Heck, if 0 means it's there, and -1 means it's not, then screw all those detection functions! :-)

The problem you are facing may lie elsewhere.

Now, onto specifics:

Q: "It seemed like only the tail end of the second event's data would be displayed and the rest was missing."

A: I'm going to guess that you were in a loop, re-using the same byte[] buffer. The 1st message comes in but is not yet displayed on the screen/log/std out (because you are in the loop); then you read the 2nd message, replacing the 1st message's data in the buffer. Again, I'm guessing that you didn't store how much you had read and then offset your store buffer by the previous read amount.
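
A sketch of that "remember how much you already stored" idea, using hypothetical names (MessageStore, append, contents) and an arbitrary 1024-byte store: each event appends after the previously stored bytes instead of starting again at index 0.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MessageStore {
    private final byte[] store = new byte[1024];
    private int used = 0;   // bytes already stored by earlier events

    /** Called once per "data available" event; appends instead of overwriting. */
    int append(InputStream in) throws IOException {
        if (used == store.length) {
            return 0;                       // store is full, nothing is read
        }
        int n = in.read(store, used, store.length - used);
        if (n > 0) {
            used += n;                      // the next event starts after this data
        }
        return n;
    }

    /** Copy of everything stored so far, in arrival order. */
    byte[] contents() {
        return Arrays.copyOf(store, used);
    }
}
```

Calling append for a first event and then a second leaves both messages in contents(), one after the other.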

Q:"I eventually changed my code so that when I get an event I'd called if(inputStream.available() > 0) while((aByte = read()) > -1) store the byte."

A: Bravo... that's the good stuff there. Now that your data buffer is inside an IF statement, your 2nd message will not clobber your 1st... well, actually, it was probably just one big(ger) message in the 1st place. But now you will read it all in one shot, keeping the data intact.

C: "... race condition ..."

A: Ahhh, the good ol' catch-all scapegoat! The race condition... :-) Yes, this may have been a race condition, in fact it may well have been. But it could also just be the way RXTX clears the flag. The clearing of the 'data available flag' may not happen as quickly as one expects. For example, anyone know the difference between read VS readLine in relation to clearing the buffer the data was previously stored in and re-setting the event flag? Neither do I. :-) Nor can I find the answer yet... but... let me ramble on for a few sentences more. Event driven programming still has some flaws. Let me give you a real world example I had to deal with recently.

  • I got some TCP/IP data, lets say, 20 bytes.
  • So I receive the OnEvent for Received Data.
  • I start my 'read' event on the 20 bytes.
  • Before I finish reading my 20 bytes... I get another 10 bytes.
  • The TCP/IP however, looks to notify me, oh, sees that the flag is still SET, and will not notify me again.
  • However, I finish reading my 20 bytes (available() said there were 20)...
  • ... and the last 10 bytes remain in the TCP/IP Q... because I was not notified of them.

See, the notification was missed because the flag was still set... even though I had begun reading the bytes. Had I finished the bytes, then the flag would have been cleared, and I would have received notification for the next 10 bytes.

The exact opposite of what is happening for you now.

So yeah, go with an IF available() ... do a read of the returned length of data. Then, if you are paranoid, set a timer and call available() again; if there is still data there, do a read on the new data. If available() returns 0 (or -1), then relax... sit back... and wait for the next OnEvent notification.
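
Roughly, that "check available(), read that many, check again" routine could look like the sketch below. AvailableDrain and drainAvailable are made-up names. Keep in mind the caveat raised elsewhere in this thread: available() is only an estimate and some streams always report 0, so this is a sketch for streams where it is meaningful.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableDrain {
    /** Reads whatever the stream reports as available, re-checking until it reports 0. */
    static byte[] drainAvailable(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int pending;
        while ((pending = in.available()) > 0) {
            byte[] chunk = new byte[pending];
            int n = in.read(chunk, 0, pending);
            if (n == -1) {
                break;                       // end of stream reached
            }
            collected.write(chunk, 0, n);    // keep only the bytes actually read
        }
        return collected.toByteArray();
    }
}
```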

Bendy answered 1/9, 2009 at 2:28 Comment(1)
Wrong. You're violating the InputStream contract. By definition, reaching the end of the stream means there is no more data to come. You can't return more data once you've reached the end. That's just plain English.Miserere
S
2

InputStream is just an abstract class, unfortunately the implementation decides what happens.

What happens if nothing is found:

  • Sockets (i.e. SocketInputStream) will block until data is received (by default). But it's possible to set a timeout (see: setSoTimeout), then the read will block for x ms. If still nothing is received then a SocketTimeoutException will be thrown.

    But with or without a timeout, reading from a SocketInputStream can sometimes result in a -1. (E.g. when multiple clients simultaneously connect to the same host:port, then even though the devices seem connected, a read could immediately result in a -1, never returning data.)

  • Serialio communication will always return -1; You can also set a timeout (use setTimeoutRx), the read will first block for x ms, but the result will still be -1 if nothing's found. (Remark: but there are multiple serial io classes available, behaviour could be vendor dependent.)

  • Files (readers or streams) will return -1 at the end of the file; an EOFException comes only from DataInputStream-style methods such as readByte (see below).
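
For the socket timeout mentioned in the first bullet, here is a minimal loopback sketch. ReadTimeoutDemo and timesOut are hypothetical names. The peer is connected but never sends anything, so with setSoTimeout in effect the read gives up with a SocketTimeoutException instead of blocking indefinitely.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    /** Returns true if read() timed out while the peer stayed connected but silent. */
    static boolean timesOut(int millis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(millis);             // applies to subsequent reads
            try {
                accepted.getInputStream().read();      // nothing is ever sent
                return false;
            } catch (SocketTimeoutException expected) {
                return true;                           // the socket itself is still usable
            }
        }
    }
}
```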

Work to a Generic Solution:

  • If you wrap any of the above streams in a DataInputStream, then you can use methods like readByte, readChar, etc. All -1 values are converted to EOFException. (PS: If you perform a lot of small reads, then it's a good idea to wrap it in a BufferedInputStream first.)
  • Both SocketTimeoutException and EOFException extend IOException, and there are several other possible IOExceptions. It is convenient to just catch IOException to detect communication issues.
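
A small sketch of the DataInputStream wrapping described above, with hypothetical names (DataStreamEof, countBytes). readByte returns the raw byte value, so a data byte of 0xFF comes back as -1 and is still counted; the end of the stream is signalled only by the EOFException.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class DataStreamEof {
    /** Counts bytes until DataInputStream reports the end with an EOFException. */
    static int countBytes(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        int count = 0;
        try {
            while (true) {
                in.readByte();    // a returned value of -1 here is ordinary data (0xFF)
                count++;
            }
        } catch (EOFException endOfStream) {
            return count;         // the underlying read() returned -1
        }
    }
}
```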

Another sensitive topic is flushing. flush in terms of sockets means "send it now", but in terms of Serialio it means "discard the buffer".

Stevenson answered 5/5, 2015 at 9:8 Comment(8)
I found SocketInputStream.read(byte[] b) will not block when I set up a simple client-server chatting program... I have to use DataInputStream to decorate SocketInputStream, and then use DataInputStream.readUTF() method... This method will block if no message has received.Cater
@JingHe , I noticed similar issues with the read(byte[]) method in the past. Maybe it isn't decently implemented. One of the quirks of the huge InputStream interface, is that not all streams implement all of the methods the way they should. - Anyway, I often end up reading byte per byte with the int read() method, which is still performant, if you wrap the stream in a BufferedInputStream. In the applications where I have used it, I often had to evaluate every byte anyway, because I need to look for a start and stop byte.Stevenson
@JingHe - if you would end up reading byte per byte - like I usually do -, you can write those individual bytes either to a fixed size buffer (e.g. byte[] buffer = new byte[1024]), but I personally like the following construction better: buffer = new ByteArrayOutputStream(); buffer.write(byte);. A ByteArrayOutputStream acts like a flexible buffer (it auto resizes).Stevenson
Thanks a lot! I will try ByteArrayOutputStream.Cater
Reading from a socket cannot result in -1 unless the peer has closed the connection. Multiple clients have nothing to do with it.Cestode
@user207421, For starters, if the peer closes the connection you get a "connection reset by peer" kind of exception. And indeed, in the perfect world of computers which run full-blown operating systems, all devices accept multiple simultaneous connections, worst case: "Connection Refused" exception. However, in industrial automation most devices don't even have an operating system as we know it. PLC programmers just "have to make things work", and sometimes have to write their network layer firmware from scratch. You get a simplified version of TCP/IP where -1 is possible even if connected.Stevenson
@Cestode I am talking about this kind of device: nl.rs-online.com/web/p/plc-cpus/7816865 The worst kind is the kind that has built-in ethernet connectors. Our last customer that used one like this couldn't communicate with them because we first had to disable network compression on their operating system. But having said that, all production companies use these kinds of devices, because they are fast and reliable. And unfortunately, they are programmed in low-level crap languages that look like assembler. ;; But I guess you work in the banking sector, right?Stevenson
@Stevenson For starters if the peer closes the connection normally, read() returns -1. There are ways to cause a connection reset but they mostly involve application protocol violations.Cestode
D
-7

I think you can receive the entire data stream if you use Thread.sleep()

Dorise answered 23/11, 2010 at 15:56 Comment(1)
This just sleeps the thread that called Thread.sleep(). No data is read / received / handled by calling Thread.sleep(). The only thing that happens is that the call sleeps in hopes that more data comes.Pyroclastic
