How does the python socket.recv() method know that the end of the message has been reached?

Let's say I'm using 1024 as buffer size for my client socket:

recv(1024)

Let's assume the message the server wants to send to me consists of 2024 bytes. Only 1024 bytes can be received by my socket. What's happening to the other 1000 bytes?

  1. Will the recv-method wait for a certain amount of time (say 2 seconds) for more data to come and stop working after this time span? (I.e., if the rest of the data arrives after 3 seconds, the data will not be received by the socket any more?)

or

  2. Will the recv-method stop working immediately after having received 1024 bytes of data? (I.e. will the other 1000 bytes be discarded?)

In case 1.) is correct ... is there a way for me to determine the amount of time recv should wait before returning, or is it determined by the system? (I.e. could I tell the socket to wait for 5 seconds before it gives up waiting for more data?)

UPDATE: Assume I have the following code:

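    import socket
    import sys

    port = int(sys.argv[2])  # assumption: port passed as the second argument
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((sys.argv[1], port))
    s.sendall(b'Hello, world')  # Python 3 sockets send bytes, not str
    data = s.recv(1024)  # read up to 1024 bytes
    print("received: {}".format(data))
    s.close()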

Assume that the server sends data of size > 1024 bytes. Can I be sure that the variable "data" will contain all the data (including those beyond the 1024th byte)? If I can't be sure about that, how would I have to change the code so that I can always be sure that the variable "data" will contain all the data sent (in one or many steps) from the server?

Healthful answered 29/12, 2016 at 14:58 Comment(21)
You tell the computer to receive 1024 bytes of data, and so it does, exactly. It doesn't care about whether there is more data to read in the first place. - Shuma
So, you are saying that case 2) is correct? (The other data are discarded?) What do I have to do to make my socket receive all data of the message? Shall I call recv(1024) in a loop? But what would be the break condition for this loop? - Healthful
recv normally returns the number of bytes read, so if it's zero, there's nothing to read anymore. - Shuma
Is this TCP or UDP? - Solitude
@Shuma - the recv can return anything from 1 to 1024 bytes (or 0, meaning the recv pipe has been shut down by the other side). If you ask for 1024 but, say, 44 are immediately available, you'll get just the 44. - Splendent
@tdelaney, exactly, so the stopping condition would be read returning zero. - Shuma
@Shuma - you said it receives 1024 bytes exactly but it doesn't. I was trying to clear that up. - Splendent
@tdelaney, I said "recv normally returns the number of bytes read" (i.e. successfully read), not "the number of bytes to read" - Shuma
@Shuma You said "You tell the computer to receive 1024 bytes of data, and so it does, exactly". - Solitude
@Shuma - Quoting: "You tell the computer to receive 1024 bytes of data, and so it does, exactly." That is not true. It may receive fewer than 1024 bytes. And fewer is not "exactly". - Splendent
@melpomene, now I see. Well, "...and so it tries to do, exactly" - Shuma
So, if the data is <= 1024 bytes, the socket received all the data and there is no problem. If the data is > 1024 bytes, the bytes beyond the 1024th byte are discarded. The important thing for me to know is this: how can I make sure that the bytes beyond the 1024th byte are read as well? - Healthful
Assuming this is a TCP socket, no data is discarded. You receive something up to 1024 bytes and the rest of it either hasn't been sent yet or is buffered in the kernel waiting for you to ask for it. - Splendent
If the data is <=1024, all you know is that the socket received some of the data, you can't assume its all of the data. - Splendent
@tdelaney: it's TCP ... "the rest of it either hasn't been sent yet or is buffered in the kernel waiting for you to ask for it" ... let's assume that recv() has read <= 1024 bytes and now there's nothing more coming ... there are now 2 possibilities: 1. the message has been sent in its entirety (i.e. no more data will come no matter how long recv() waits) or 2. only a part of the data has been sent ... the rest of the data will be sent in one or more additional messages ... how does recv() determine whether it's case 1 or case 2? - Healthful
Recv doesn't know. The protocol itself has to have a way of figuring that out. Take HTTP for example. It starts with a \r\n delimited header and then has a count of the remaining bytes the client should expect to receive. The client knows how to read the header because of the \r\n then knows exactly how many bytes are coming next. - Splendent
@Splendent but I'm not at the HTTP level here but at the TCP level ... my question is: how is it figured out at this level? I have updated my question ... My question is simply this: What (if anything) do I have to do to make sure that recv() does not(!) stop reading data before all the data sent from the server have arrived? - Healthful
If you're using TCP, then there are no messages. There's only a stream of bytes. - Solitude
As far as I understand, TCP has a flag for this: PSH. - Wimberly
@BłażejMichalik PSH has nothing to do with messages, which don't exist in TCP. - Abidjan
@Abidjan I think you misread my comment. I was referring to what I thought the OP intended to ask (cf. packetlife.net/blog/2011/mar/2/tcp-flags-psh-and-urg), not what other commentators discussed - see comment dates. I don't care whether it's the correct nomenclature or not, I was not commenting on that.Wimberly

It depends on the protocol. Some protocols, like UDP, send messages, and exactly one message is returned per recv. Assuming you are talking about TCP specifically, there are several factors involved. TCP is stream-oriented, and because of things like the amount of currently outstanding send/recv data, lost or reordered packets on the wire, delayed acknowledgement of data, and the Nagle algorithm (which can hold back small sends until earlier data has been acknowledged, sometimes for a few hundred milliseconds when combined with delayed ACKs), its behavior can change subtly as a conversation between client and server progresses.
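A minimal sketch of the message-oriented case, assuming two UDP sockets on the loopback address 127.0.0.1 with an OS-chosen port (demo assumptions, not part of the question's setup). Two sendto() calls produce two datagrams, and each recv(1024) returns exactly one of them, even though the buffer could hold both.

```python
import socket

# Two UDP sockets on the loopback interface; port 0 lets the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Two separate sends produce two separate datagrams.
sender.sendto(b"hello", receiver.getsockname())
sender.sendto(b"world", receiver.getsockname())

# Each recv() returns exactly one datagram, although 1024 bytes would fit both.
receiver.settimeout(2.0)  # avoid hanging forever if a datagram were lost
first = receiver.recv(1024)
second = receiver.recv(1024)
print(first, second)  # b'hello' b'world'

sender.close()
receiver.close()
```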

All the receiver knows is that it is getting a stream of bytes. It could get anything from 1 to the fully requested buffer size on any recv. There is no one-to-one correlation between the send call on one side and the recv call on the other.
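The lack of a one-to-one send/recv correspondence can be sketched with socket.socketpair(), used here as a local stand-in for a TCP connection (an assumption of the demo). The writer makes three sends of different sizes; the reader asks for an arbitrary 4 bytes at a time and only ever sees a stream of bytes.

```python
import socket

# A connected pair of stream sockets (AF_UNIX on POSIX, loopback TCP on Windows).
a, b = socket.socketpair()

# Three separate sends on the writing side...
for piece in (b"hel", b"lo wor", b"ld!"):
    a.sendall(piece)
a.close()  # closing signals end-of-stream to the reader

# ...are read back in chunks that need not match those sends.
chunks = []
while True:
    chunk = b.recv(4)  # ask for at most 4 bytes; fewer may come back
    if not chunk:  # b'' means the peer closed its end
        break
    chunks.append(chunk)
b.close()

print(chunks)  # chunk boundaries are unrelated to the three sends
print(b"".join(chunks))  # b'hello world!'
```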

If you need to figure out message boundaries, it's up to the higher-level protocols to figure that out. Take HTTP for example. It starts with a \r\n-delimited header that typically includes a count of the remaining bytes the client should expect to receive (Content-Length). The client knows how to read the header because of the \r\n, and then knows exactly how many bytes are coming next. Part of the charm of RESTful protocols is that they are HTTP-based and somebody else already figured this stuff out!
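A rough sketch of that header-then-count idea, heavily simplified from real HTTP: it assumes a Content-Length header and ignores chunked transfer encoding, keep-alive and everything else. read_until, read_exactly and read_http_like_response are hypothetical helpers, and a socket.socketpair() stands in for the network connection.

```python
import socket


def read_until(sock, delimiter, buf=b""):
    """Call recv() until delimiter appears; return (data up to delimiter, leftover)."""
    while delimiter not in buf:
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("peer closed before the delimiter was seen")
        buf += chunk
    end = buf.index(delimiter) + len(delimiter)
    return buf[:end], buf[end:]


def read_exactly(sock, n, buf=b""):
    """Call recv() until n bytes are available; buf holds bytes already received."""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full body arrived")
        buf += chunk
    return buf[:n], buf[n:]


def read_http_like_response(sock):
    """Header block ends with a blank line; body length comes from Content-Length."""
    header_block, rest = read_until(sock, b"\r\n\r\n")
    length = 0
    for line in header_block.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    body, _ = read_exactly(sock, length, rest)
    return header_block, body


server, client = socket.socketpair()
server.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
headers, body = read_http_like_response(client)
print(body)  # b'hello'
server.close()
client.close()
```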

Some protocols use NUL to delimit messages. Others may have a fixed-length binary header that includes a count of any variable data to come. I like zeromq, which has a robust messaging system on top of TCP.
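A minimal sketch of the fixed-length-header approach, using a hypothetical framing of a 4-byte big-endian length followed by the payload (send_msg, recv_msg and recv_exactly are illustrative names, not a standard API). The key part is recv_exactly, which keeps calling recv() until the requested number of bytes has arrived, so a 2024-byte message is reassembled no matter how its bytes are split across recv() calls.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # hypothetical framing: 4-byte big-endian length


def recv_exactly(sock, n):
    """Call recv() repeatedly until exactly n bytes have been read."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return bytes(data)


def send_msg(sock, payload):
    # Prefix the payload with its length so the reader knows where it ends.
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock):
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)


a, b = socket.socketpair()
send_msg(a, b"first message")
send_msg(a, b"x" * 2024)  # larger than a single recv(1024) buffer
first = recv_msg(b)
second = recv_msg(b)
print(first, len(second))  # b'first message' 2024
a.close()
b.close()
```

With a length prefix the reader knows exactly when a message is complete, rather than inferring it from timing or from how much a single recv() happened to return.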

More details on what happens with receive...

When you do recv(1024), there are 6 possibilities:

  1. There is no receive data. recv will block until some arrives. You can change that by setting a timeout with settimeout(); if nothing arrives in time, recv raises socket.timeout.

  2. There is partial receive data. You'll get that part right away. The rest is either buffered or hasn't been sent yet and you just do another recv to get more (and the same rules apply).

  3. There are more than 1024 bytes available. You'll get 1024 bytes of that data and the rest is buffered in the kernel waiting for another receive.

  4. The other side has shut down the socket. You'll get 0 bytes of data (in Python, an empty bytes object, b''). 0 means you will never get more data on that socket, and if you keep asking for data, you'll keep getting 0 bytes.

  5. The other side has reset the socket. You'll get an exception.

  6. Some other strange thing has gone on and you'll get an exception for that.
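Cases 1 to 4 can be seen in a short sketch that uses socket.socketpair() as a stand-in for a TCP connection (an assumption of the demo). Case 1 uses a 0.5-second timeout to keep the demo quick; settimeout(5) would give the 5-second wait the question asks about. Cases 2 to 4 are handled by a loop that keeps calling recv() until it returns b''. recv_until_closed is a hypothetical helper name.

```python
import socket


def recv_until_closed(sock, bufsize=1024):
    """Keep calling recv() until the peer shuts down its side (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # case 4: b'' means no more data will ever arrive
            break
        chunks.append(chunk)  # cases 2 and 3: keep what arrived and loop
    return b"".join(chunks)


a, b = socket.socketpair()

# Case 1: nothing to read yet. With a timeout, recv() raises instead of blocking forever.
b.settimeout(0.5)
try:
    b.recv(1024)
    timed_out = False
except socket.timeout:
    timed_out = True

# Send 2024 bytes, then shut down the write side so the reader sees end-of-stream.
a.sendall(b"y" * 2024)
a.shutdown(socket.SHUT_WR)
data = recv_until_closed(b)
print(timed_out, len(data))  # True 2024
a.close()
b.close()
```

Reading until b'' only marks the end of a message when the sender closes or shuts down its side after the last byte; on a connection that stays open, the end of a message has to come from the higher-level protocol, as described above.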

Splendent answered 29/12, 2016 at 16:17 Comment(3)
My understanding of your answer is this: at the TCP level it's not possible for me to determine the behaviour of recv() (i.e. whether it will return after receiving one bunch of data or whether it will wait for more data but stop waiting after x seconds). That is, the policy determining when to stop waiting/reading can only be configured at a higher level. ... Is this understanding correct? - Healthful
Yes, that's basically it. Higher-level protocols on top of TCP are usually needed to know how to handle the data. - Splendent
Maybe worth knowing: if you use a UDP (or Unix datagram) socket, any data longer than the buffer used when you call recv() will be discarded. A TCP (or stream) socket will, as described, keep the extra for the next recv() call. - Perspective
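The datagram truncation described in the last comment can be checked with a minimal sketch, assuming Linux or macOS (on Windows, a too-small recv() buffer for a datagram reports an error instead). Two UDP sockets on 127.0.0.1 stand in for client and server, and a single 2024-byte datagram is read with recv(1024).

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# One 2024-byte datagram, read with a 1024-byte buffer.
sender.sendto(b"z" * 2024, receiver.getsockname())
receiver.settimeout(2.0)
data = receiver.recv(1024)  # Linux/macOS: first 1024 bytes, rest of datagram dropped

# The other 1000 bytes are not waiting for the next recv(): nothing is left to read.
receiver.setblocking(False)
try:
    receiver.recv(1024)
    leftover = True
except BlockingIOError:
    leftover = False

print(len(data), leftover)  # 1024 False
sender.close()
receiver.close()
```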
