Java NIO: How to know when SocketChannel read() is complete with non-blocking I/O

I am currently using a non-blocking SocketChannel (Java 1.6) to act as a client to a Redis server. Redis accepts plain-text commands directly over a socket, terminated by CRLF, and responds in kind. A quick example:

SEND: 'PING\r\n'

RECV: '+PONG\r\n'

Redis can also return huge replies (depending on what you are asking for) with many sections of \r\n-terminated data all as part of a single response.

I am using a standard while(socket.read() > 0) {//append bytes} loop to read bytes from the socket and reassemble them client-side into a reply.
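For illustration, here is a minimal sketch of that accumulate loop (class and method names are mine, and a java.nio.channels.Pipe stands in for the Redis socket so the snippet runs standalone):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;

public class DrainSketch {
    // Reads whatever is currently available from a non-blocking channel.
    // Note: read() == 0 only means "no bytes available right now",
    // not "the server is done sending".
    static byte[] drainAvailable(ReadableByteChannel ch) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buf = ByteBuffer.allocate(256);
        while (ch.read(buf) > 0) {
            buf.flip();
            while (buf.hasRemaining()) {
                out.write(buf.get());
            }
            buf.clear();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset ascii = Charset.forName("US-ASCII");
        Pipe pipe = Pipe.open();                 // stands in for the Redis socket
        pipe.source().configureBlocking(false);
        pipe.sink().write(ByteBuffer.wrap("+PONG\r\n".getBytes(ascii)));
        byte[] reply = drainAvailable(pipe.source());
        System.out.println(new String(reply, ascii));
    }
}
```

Note that this loop exits on the first read() that returns 0, whether or not the full reply has arrived — which is exactly the problem I'm asking about.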

NOTE: I am not using a Selector, just multiple, client-side SocketChannels connected to the server, waiting to service send/receive commands.

What I'm confused about is the contract of the SocketChannel.read() method in non-blocking mode, specifically, how to know when the server is done sending and I have the entire message.

I have a few methods to protect against returning too fast and giving the server a chance to reply, but the one thing I'm stuck on is:

  1. Is it ever possible for read() to return bytes, then on a subsequent call return no bytes, but on another subsequent call again return some bytes?

Basically, can I trust that the server is done responding once I have received at least 1 byte and a subsequent read() returns 0, or is it possible the server was just busy and might sputter back some more bytes if I wait and keep trying?

If it can keep sending bytes even after a read() has returned 0 bytes (after previous successful reads), then I have no idea how to tell when the server is done talking to me, and in fact I'm confused how java.io.*-style communications would even know when the server is "done" either.

As you know, read() never returns -1 unless the connection is dead, and these are standard long-lived DB connections, so I won't be closing and opening them on each request.

I know a popular response (at least for these NIO questions) has been to look at Grizzly, MINA or Netty; if possible, I'd really like to learn how this all works in its raw state before adopting any 3rd-party dependencies.

Thank you.

Bonus Question:

I originally thought a blocking SocketChannel would be the way to go with this as I don't really want a caller to do anything until I process their command and give them back a reply anyway.

If that ends up being the better way to go, I was a bit confused to see that SocketChannel.read() blocks as long as there aren't bytes sufficient to fill the given buffer. Short of reading everything byte-by-byte, I can't figure out how this default behavior is actually meant to be used: I never know the exact size of the reply coming back from the server, so my calls to SocketChannel.read() always block until a timeout (at which point I finally see that the content was sitting in the buffer).

I'm not really clear on the right way to use the blocking method, since it always hangs up on a read.

Bobodioulasso answered 7/2, 2011 at 20:54 Comment(4)
This isn't related to NIO, but a protocol should either use fixed-length messages, send the length of the message first, or end each message with a unique delimiter.Knobkerrie
So in general you are saying that unless I have delimiters or lengths to work with, I am SOL? The Redis protocol actually does specify these things, but I would have to convert from bytes to chars, piecemeal, on the fly to check those values as I read, making my network code a hell of a lot more complicated. I was hoping I could avoid that.Bobodioulasso
I'm afraid that's the only sure way, but it's not all that bad. As far as I can gather, the Redis protocol ends each command/reply item with a CRLF, so you can read and accumulate bytes while checking for CRLFs and only convert your buffer to chars at the end of each line.Knobkerrie
biziclop - great suggestion, I hadn't actually thought to group on \r\n boundaries but that would avoid the one byte-char conversion case I was worried about (trying to convert unicode char before all bytes are available). Thank you!Bobodioulasso

If it can keep sending bytes even after a read() has returned 0 bytes (after previous successful reads), then I have no idea how to tell when the server is done talking to me, and in fact I'm confused how java.io.*-style communications would even know when the server is "done" either.

Read and follow the protocol:

http://redis.io/topics/protocol

The spec describes the possible types of replies and how to recognize them. Some are line terminated, while multi-line responses include a prefix count.

Replies

Redis will reply to commands with different kinds of replies. It is possible to check the kind of reply from the first byte sent by the server:

  • With a single line reply the first byte of the reply will be "+"
  • With an error message the first byte of the reply will be "-"
  • With an integer number the first byte of the reply will be ":"
  • With bulk reply the first byte of the reply will be "$"
  • With multi-bulk reply the first byte of the reply will be "*"

Single line reply

A single line reply is in the form of a single line string starting with "+" terminated by "\r\n". ...

...

Multi-bulk replies

Commands like LRANGE need to return multiple values (every element of the list is a value, and LRANGE needs to return more than a single element). This is accomplished using multiple bulk writes, prefixed by an initial line indicating how many bulk writes will follow.


Is it ever possible for read() to return bytes, then on a subsequent call return no bytes, but on another subsequent call again return some bytes? Basically, can I trust that the server is done responding to me if I have received at least 1 byte and eventually read() returns 0 then I know I'm done, or is it possible the server was just busy and might sputter back some more bytes if I wait and keep trying?

Yes, that's possible. It's not just the server being busy: network congestion and downed routes can also cause data to "pause". The data is a stream that can pause anywhere, without any relation to the application protocol.

Keep reading the stream into a buffer. Peek at the first character to determine what type of response to expect. Examine the buffer after each successful read until the buffer contains the full message according to the specification.
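As a sketch of that peek-and-check idea (the class and helper names are mine, not from any Redis client library):

```java
import java.nio.ByteBuffer;

public class ReplyPeekSketch {
    // Classify a reply by its first byte, per the protocol quoted above.
    static String replyType(ByteBuffer buf) {
        switch ((char) buf.get(0)) {
            case '+': return "single line";
            case '-': return "error";
            case ':': return "integer";
            case '$': return "bulk";
            case '*': return "multi-bulk";
            default:  return "unknown";
        }
    }

    // For the line-oriented types (+, -, :), the reply is complete
    // once the accumulated bytes end in CRLF.
    static boolean lineComplete(ByteBuffer buf) {
        int n = buf.position();
        return n >= 2 && buf.get(n - 2) == '\r' && buf.get(n - 1) == '\n';
    }
}
```

A bulk or multi-bulk completeness check would additionally parse the length prefix, but the shape is the same: accumulate, then test the buffer against the spec.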


I originally thought a blocking SocketChannel would be the way to go with this as I don't really want a caller to do anything until I process their command and give them back a reply anyway.

I think you're right. Based on my quick look at the spec, blocking reads wouldn't work for this protocol. Since it looks line-based, BufferedReader may help, but you still need to know how to recognize when the response is complete.

Tyrosine answered 7/2, 2011 at 21:30 Comment(1)
Bert, exactly right; for some reason I got it stuck in my head that I would "read until done", then convert the whole mess to chars, then try to determine the reply type based on the protocol (I have it printed here in front of me, all marked up). It wasn't until Erick's reply above that I realized my code had to be cognisant of the bytes it was reading and their bounding values, which leads to exactly what you said: peek at the bytes and follow the spec. Thank you.Bobodioulasso

Look to your Redis specifications for this answer.

It's not against the rules for .read() to return 0 bytes on one call and 1 or more bytes on a subsequent call; this is perfectly legal. Anything that delays delivery, whether network lag or slowness in the Redis server, can cause this to happen.

The answer you seek is the same answer to the question: "If I connected manually to the Redis server and sent a command, how could I know when it was done sending the response to me so that I can send another command?"

The answer must be found in the Redis specification. If there's not a global token that the server sends when it is done executing your command, then this may be implemented on a command-by-command basis. If the Redis specifications do not allow for this, then this is a fault in the Redis specifications. They should tell you how to tell when they have sent all their data. This is why shells have command prompts. Redis should have an equivalent.

In the case that Redis does not have this in their specifications, then I would suggest putting in some sort of timer functionality. Code your thread handling the socket to signal that a command is completed after no data has been received for a designated period of time, like five seconds. Choose a period of time that is significantly longer than the longest command takes to execute on the server.
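Since the comments note that Redis does delimit its line replies with CRLF, scanning the accumulated bytes for that delimiter is a sketch of the robust alternative to a timer (the class and helper names here are hypothetical):

```java
public class CrlfScanSketch {
    // Returns the index just past the first CRLF in buf[0..len),
    // or -1 if no complete line has arrived yet.
    static int endOfLine(byte[] buf, int len) {
        for (int i = 0; i + 1 < len; i++) {
            if (buf[i] == '\r' && buf[i + 1] == '\n') {
                return i + 2;
            }
        }
        return -1;
    }
}
```

A result of -1 simply means "keep reading"; a positive result marks a complete protocol line that can safely be converted to chars and interpreted.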

Starbuck answered 7/2, 2011 at 21:6 Comment(2)
Since the Redis protocol actually does specify delimiters or message lengths, you should really use those. This is the best way to create a robust solution. If you're looking for a quick hack, you may be able to get away with the timer functionality I mentioned. I wouldn't recommend it, however, since the answer is provided in the Redis protocol.Starbuck
Erick, you have saved what little hair I have left. It sounds like I was misunderstanding responsibilities in this code; attributing behaviors to SocketChannel that didn't exist. I'll take a hack at looking for the delimiting information in the reply and causing the connection to respond appropriately. Thank you for the help.Bobodioulasso

I am using a standard while(socket.read() > 0) {//append bytes} loop

That is not a standard technique in NIO. You must store the result of the read in a variable, and test it for:

  1. -1, indicating EOS, meaning you should close the channel
  2. zero, meaning there was no data to read, meaning you should return to the select() loop, and
  3. a positive value, meaning you have read that many bytes, which you should then extract and remove from the ByteBuffer (get()/compact()) before continuing.
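A sketch of one read pass handling those three cases (names are mine):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadOnceSketch {
    // One read pass implementing the three cases above. Returns the bytes
    // extracted on this pass, an empty array if nothing was available,
    // or null on end-of-stream (after closing the channel).
    static byte[] readOnce(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        int n = ch.read(buf);
        if (n == -1) {              // case 1: EOS, close the channel
            ch.close();
            return null;
        }
        if (n == 0) {               // case 2: nothing to read right now
            return new byte[0];
        }
        buf.flip();                 // case 3: extract and remove the bytes read
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        buf.compact();
        return data;
    }
}
```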
Chintzy answered 15/4, 2011 at 6:42 Comment(0)

It's been a long time, but . . .

I am currently using a non-blocking SocketChannel

Just to be clear, SocketChannels are blocking by default; to make them non-blocking, one must explicitly invoke SocketChannel#configureBlocking(false).

I'll assume you did that.

I am not using a Selector

Whoa, that's the problem: if you are going to use non-blocking Channels, then you should always use a Selector (at least for reads); otherwise you run into the confusion you described, viz. read(ByteBuffer) == 0 doesn't mean anything (well, it means there are no bytes in the TCP receive buffer at this moment).

It's analogous to checking your mailbox and finding it empty: does that mean the letter will never arrive? That it was never sent?

What I'm confused about is the contract of the SocketChannel.read() method in non-blocking mode, specifically, how to know when the server is done sending and I have the entire message.

There is a contract: if a Selector has selected a Channel for a read operation, then the next invocation of SocketChannel#read(ByteBuffer) is guaranteed to return > 0 (assuming there's room in the ByteBuffer argument).

Which is why you use a Selector: in one select() call it can "select" thousands of SocketChannels that have bytes ready to read.

Now there's nothing wrong with using SocketChannels in their default blocking mode; given your description (a client or two), there's probably no reason not to, as it's simpler. But if you want to use non-blocking Channels, use a Selector.
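A sketch of that select-then-read pattern (names are mine; a Pipe's source channel is also selectable, which is handy for trying this without a server):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class SelectorSketch {
    // Blocks (up to timeoutMs) until the Selector reports a readable channel,
    // then performs one read, which should now return bytes (or -1 on EOS).
    static int selectAndRead(Selector sel, ByteBuffer buf, long timeoutMs) throws IOException {
        if (sel.select(timeoutMs) == 0) {
            return 0;                       // timed out; nothing readable yet
        }
        int n = 0;
        Iterator<SelectionKey> it = sel.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();                    // handled keys must be removed
            if (key.isReadable()) {
                n = ((ReadableByteChannel) key.channel()).read(buf);
            }
        }
        return n;
    }
}
```

The channel must be in non-blocking mode before register() is called, and the select() is where the waiting happens, not the read.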

Lamond answered 7/9, 2015 at 0:16 Comment(1)
Logged in just to upvote this - it has been a long time, but this was a great answer. Back before I really understood the NIO APIs, I was still in the 'stream IO' mindset and hadn't seen the "gotta use a Selector with non-blocking reads" point spelled out before - that crystallized everything very quickly - hopefully it will help others.Bobodioulasso

© 2022 - 2024 — McMap. All rights reserved.