Python socket receive - incoming packets always have a different size
Asked Answered
A

7

60

I'm using the SocketServer module for a TCP server. I'm experiencing some issue here with the recv() function, because the incoming packets always have a different size, so if I specify recv(1024) (I tried with a bigger value, and smaller), it gets stuck after 2 or 3 requests because the packet length will be smaller (I think), and then the server gets stuck until a timeout.

class Test(SocketServer.BaseRequestHandler):

def handle(self):

   print "From:", self.client_address

   while True:    

     data = self.request.recv(1024)
     if not data: break

     if data[4] == "\x20":              
       self.request.sendall("hello")
     if data[4] == "\x21":
       self.request.sendall("bye")
     else:
       print "unknow packet"
   self.request.close()
   print "Disconnected", self.client_address

launch = SocketServer.ThreadingTCPServer(('', int(sys.argv[1])),Test)

launch.allow_reuse_address= True;

launch.serve_forever()

If the client sends multiples requests over the same source port, but the server gets stuck, any help would be very appreciated, thank !

Afflict answered 10/11, 2009 at 15:34 Comment(0)
F
37

Note: As people have pointed out in the comments, calling recv() with no parameters is not allowed in Python, and so this answer should be disregarded.

Original answer:


The network is always unpredictable. TCP makes a lot of this random behavior go away for you. One wonderful thing TCP does: it guarantees that the bytes will arrive in the same order. But! It does not guarantee that they will arrive chopped up in the same way. You simply cannot assume that every send() from one end of the connection will result in exactly one recv() on the far end with exactly the same number of bytes.

When you say socket.recv(x), you're saying 'don't return until you've read x bytes from the socket'. This is called "blocking I/O": you will block (wait) until your request has been filled. If every message in your protocol was exactly 1024 bytes, calling socket.recv(1024) would work great. But it sounds like that's not true. If your messages are a fixed number of bytes, just pass that number in to socket.recv() and you're done.

But what if your messages can be of different lengths? The first thing you need to do: stop calling socket.recv() with an explicit number. Changing this:

data = self.request.recv(1024)

to this:

data = self.request.recv()

means recv() will always return whenever it gets new data.

But now you have a new problem: how do you know when the sender has sent you a complete message? The answer is: you don't. You're going to have to make the length of the message an explicit part of your protocol. Here's the best way: prefix every message with a length, either as a fixed-size integer (converted to network byte order using socket.ntohs() or socket.ntohl() please!) or as a string followed by some delimiter (like '123:'). This second approach often less efficient, but it's easier in Python.

Once you've added that to your protocol, you need to change your code to handle recv() returning arbitrary amounts of data at any time. Here's an example of how to do this. I tried writing it as pseudo-code, or with comments to tell you what to do, but it wasn't very clear. So I've written it explicitly using the length prefix as a string of digits terminated by a colon. Here you go:

length = None
buffer = ""
while True:
  data += self.request.recv()
  if not data:
    break
  buffer += data
  while True:
    if length is None:
      if ':' not in buffer:
        break
      # remove the length bytes from the front of buffer
      # leave any remaining bytes in the buffer!
      length_str, ignored, buffer = buffer.partition(':')
      length = int(length_str)

    if len(buffer) < length:
      break
    # split off the full message from the remaining bytes
    # leave any remaining bytes in the buffer!
    message = buffer[:length]
    buffer = buffer[length:]
    length = None
    # PROCESS MESSAGE HERE
Fiore answered 11/11, 2009 at 16:2 Comment(7)
Hans L is correct in below comment that in python request.recv() is not a valid call as bufsize if a mandatory param. Ideally this answer should be removed or edited. docs.python.org/library/socket.htmlRoydd
"This is called "blocking I/O": " I like the break down of this in layman terms....Exemplification
If every message in your protocol was exactly 1024 bytes, calling socket.recv(1024) would work great... is also not true.Ibex
you cannot call socket.recv() without any parameters. TypeError: recv() takes at least 1 argument (0 given) is returned if you try.Britteny
TypeError: recv() takes at least 1 argument (0 given)Krysta
It throw a error recv() takes at least 1 argument (0 given)Pentup
Amazing that the 51 people who upvoted this aren't aware this does not work, even worse that the OP marked this answer as correct...Biodegradable
S
174

The answer by Larry Hastings has some great general advice about sockets, but there are a couple of mistakes as it pertains to how the recv(bufsize) method works in the Python socket module.

So, to clarify, since this may be confusing to others looking to this for help:

  1. The bufsize param for the recv(bufsize) method is not optional. You'll get an error if you call recv() (without the param).
  2. The bufferlen in recv(bufsize) is a maximum size. The recv will happily return fewer bytes if there are fewer available.

See the documentation for details.

Now, if you're receiving data from a client and want to know when you've received all of the data, you're probably going to have to add it to your protocol -- as Larry suggests. See this recipe for strategies for determining end of message.

As that recipe points out, for some protocols, the client will simply disconnect when it's done sending data. In those cases, your while True loop should work fine. If the client does not disconnect, you'll need to figure out some way to signal your content length, delimit your messages, or implement a timeout.

I'd be happy to try to help further if you could post your exact client code and a description of your test protocol.

Signboard answered 27/11, 2009 at 5:43 Comment(2)
The best method I have found is to figure out the number of bytes in the message/file/data, then send the length of the message/file/data before the message, as a header, with a delimiter like :. recv until you get the length of the message by detecting the :, then recv what you need explicitly based on the header. If it's a file then make a loop to recv chunks of the file at a time while making sure to keep the recv size divisible by 2 until the last byte(if the total bytes % 2 != 0). I use that method to transfer big files(GB worth) and it lends well to progress bars.Prodigious
i tested recv(bufsize) and it sent fewer data as well.but my question is how python understand this is the end?! as tcp is a stream the server can detect the end of data in the stream?Klemperer
F
37

Note: As people have pointed out in the comments, calling recv() with no parameters is not allowed in Python, and so this answer should be disregarded.

Original answer:


The network is always unpredictable. TCP makes a lot of this random behavior go away for you. One wonderful thing TCP does: it guarantees that the bytes will arrive in the same order. But! It does not guarantee that they will arrive chopped up in the same way. You simply cannot assume that every send() from one end of the connection will result in exactly one recv() on the far end with exactly the same number of bytes.

When you say socket.recv(x), you're saying 'don't return until you've read x bytes from the socket'. This is called "blocking I/O": you will block (wait) until your request has been filled. If every message in your protocol was exactly 1024 bytes, calling socket.recv(1024) would work great. But it sounds like that's not true. If your messages are a fixed number of bytes, just pass that number in to socket.recv() and you're done.

But what if your messages can be of different lengths? The first thing you need to do: stop calling socket.recv() with an explicit number. Changing this:

data = self.request.recv(1024)

to this:

data = self.request.recv()

means recv() will always return whenever it gets new data.

But now you have a new problem: how do you know when the sender has sent you a complete message? The answer is: you don't. You're going to have to make the length of the message an explicit part of your protocol. Here's the best way: prefix every message with a length, either as a fixed-size integer (converted to network byte order using socket.ntohs() or socket.ntohl() please!) or as a string followed by some delimiter (like '123:'). This second approach often less efficient, but it's easier in Python.

Once you've added that to your protocol, you need to change your code to handle recv() returning arbitrary amounts of data at any time. Here's an example of how to do this. I tried writing it as pseudo-code, or with comments to tell you what to do, but it wasn't very clear. So I've written it explicitly using the length prefix as a string of digits terminated by a colon. Here you go:

length = None
buffer = ""
while True:
  data += self.request.recv()
  if not data:
    break
  buffer += data
  while True:
    if length is None:
      if ':' not in buffer:
        break
      # remove the length bytes from the front of buffer
      # leave any remaining bytes in the buffer!
      length_str, ignored, buffer = buffer.partition(':')
      length = int(length_str)

    if len(buffer) < length:
      break
    # split off the full message from the remaining bytes
    # leave any remaining bytes in the buffer!
    message = buffer[:length]
    buffer = buffer[length:]
    length = None
    # PROCESS MESSAGE HERE
Fiore answered 11/11, 2009 at 16:2 Comment(7)
Hans L is correct in below comment that in python request.recv() is not a valid call as bufsize if a mandatory param. Ideally this answer should be removed or edited. docs.python.org/library/socket.htmlRoydd
"This is called "blocking I/O": " I like the break down of this in layman terms....Exemplification
If every message in your protocol was exactly 1024 bytes, calling socket.recv(1024) would work great... is also not true.Ibex
you cannot call socket.recv() without any parameters. TypeError: recv() takes at least 1 argument (0 given) is returned if you try.Britteny
TypeError: recv() takes at least 1 argument (0 given)Krysta
It throw a error recv() takes at least 1 argument (0 given)Pentup
Amazing that the 51 people who upvoted this aren't aware this does not work, even worse that the OP marked this answer as correct...Biodegradable
L
22

You can alternatively use recv(x_bytes, socket.MSG_WAITALL), which seems to work only on Unix, and will return exactly x_bytes.

Lach answered 2/12, 2009 at 4:40 Comment(0)
Y
4

Note that exact reason why your code is frozen is not because you set too high request.recv() buffer size. Here is explained What means buffer size in socket.recv(buffer_size)

This code will work until it'll receive an empty TCP message (if you'd print this empty message, it'd show b''):

while True:    
  data = self.request.recv(1024)
  if not data: break

And note, that there is no way to send an empty TCP message. socket.send(b'') simply won't work.

Why? Because an empty message is sent only when you type socket.close(), so your script will loop as long as you won't close your connection. As Hans L pointed out here are some good methods to end message.

Edit:

Problem

So your real problem is that you don't have any proper way to end your network message. So your program will wait then until the client ends the connection or until a timeout occurs.

Keyword solution

One solution is to look for a special keyword in received data and when you find the special keyword you don't wait for connection to close, but you break the loop and continue with your program. An even more advanced way is to enclose your message in a special tag e.g. <message>hello world</message>.

Header solution

Another way is to first send a header message which is always the same (fixed) length. In this message you send information how long rest (body) of your message will be, so your program will know what exactly should it put into self.request.recv and when to break the loop.

These problems are the reason why we're using e.g. HTTP. It's already well-designed protocol which solves all these low-level problems for us.

Yama answered 15/12, 2017 at 11:36 Comment(0)
C
3

That's the nature of TCP: the protocol fills up packets (lower layer being IP packets) and sends them. You can have some degree of control over the MTU (Maximum Transfer Unit).

In other words: you must devise a protocol that rides on top of TCP where your "payload delineation" is defined. By "payload delineation" I mean the way you extract the unit of message your protocol supports. This can be as simple as "every NULL terminated strings".

Cotinga answered 10/11, 2009 at 15:38 Comment(0)
A
2

You could try always sending the first 4 bytes of your data as data size and then read complete data in one shot. Use the below functions on both client and server-side to send and receive data.

def send_data(conn, data):
    serialized_data = pickle.dumps(data)
    conn.sendall(struct.pack('>I', len(serialized_data)))
    conn.sendall(serialized_data)


def receive_data(conn):
    data_size = struct.unpack('>I', conn.recv(4))[0]
    received_payload = b""
    reamining_payload_size = data_size
    while reamining_payload_size != 0:
        received_payload += conn.recv(reamining_payload_size)
        reamining_payload_size = data_size - len(received_payload)
    data = pickle.loads(received_payload)

    return data

you could find sample program at https://github.com/vijendra1125/Python-Socket-Programming.git

Addlepated answered 22/8, 2020 at 5:49 Comment(0)
H
1

I know this is old, but I hope this helps someone.

Using regular python sockets I found that you can send and receive information in packets using sendto and recvfrom

# tcp_echo_server.py
import socket

ADDRESS = ''
PORT = 54321

connections = []
host = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host.setblocking(0)
host.bind((ADDRESS, PORT))
host.listen(10)  # 10 is how many clients it accepts

def close_socket(connection):
    try:
        connection.shutdown(socket.SHUT_RDWR)
    except:
        pass
    try:
        connection.close()
    except:
        pass

def read():
    for i in reversed(range(len(connections))):
        try:
            data, sender = connections[i][0].recvfrom(1500)
            return data
        except (BlockingIOError, socket.timeout, OSError):
            pass
        except (ConnectionResetError, ConnectionAbortedError):
            close_socket(connections[i][0])
            connections.pop(i)
    return b''  # return empty if no data found

def write(data):
    for i in reversed(range(len(connections))):
        try:
            connections[i][0].sendto(data, connections[i][1])
        except (BlockingIOError, socket.timeout, OSError):
            pass
        except (ConnectionResetError, ConnectionAbortedError):
            close_socket(connections[i][0])
            connections.pop(i)

# Run the main loop
while True:
    try:
        con, addr = host.accept()
        connections.append((con, addr))
    except BlockingIOError:
        pass

    data = read()
    if data != b'':
        print(data)
        write(b'ECHO: ' + data)
        if data == b"exit":
            break

# Close the sockets
for i in reversed(range(len(connections))):
    close_socket(connections[i][0])
    connections.pop(i)
close_socket(host)

The client is similar

# tcp_client.py
import socket

ADDRESS = "localhost"
PORT = 54321

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ADDRESS, PORT))
s.setblocking(0)

def close_socket(connection):
    try:
        connection.shutdown(socket.SHUT_RDWR)
    except:
        pass
    try:
        connection.close()
    except:
        pass

def read():
    """Read data and return the read bytes."""
    try:
        data, sender = s.recvfrom(1500)
        return data
    except (BlockingIOError, socket.timeout, AttributeError, OSError):
        return b''
    except (ConnectionResetError, ConnectionAbortedError, AttributeError):
        close_socket(s)
        return b''

def write(data):
    try:
        s.sendto(data, (ADDRESS, PORT))
    except (ConnectionResetError, ConnectionAbortedError):
        close_socket(s)

while True:
    msg = input("Enter a message: ")
    write(msg.encode('utf-8'))

    data = read()
    if data != b"":
        print("Message Received:", data)

    if msg == "exit":
        break

close_socket(s)
Halloran answered 26/4, 2017 at 18:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.