Handle multiple socket connections
Asked Answered
I'm writing a client-server app in Python. The idea is to have a main server and thousands of clients connecting to it. The server will randomly send small files to the clients to be processed, and each client must do the work and report its status to the server every minute. My problem is that for the moment I only have a small, old home server, so I think it can't handle so many connections. Maybe you could help me with this:

  • How can I increase the number of connections my server can handle?
  • How can I balance the load from the client side?
  • How could I improve the communication? I mean, I need to keep a list of clients on the server with their status (maybe in a DB?), and these updates will be received from time to time, so I don't need a permanent connection. Is it a good idea to use UDP to send the updates? If not, do I have to create a new thread every time I receive an update?

EDIT: I updated the question to explain the problem a little better, mainly to make it clear enough for people with the same problem. There is actually a good solution in @TimMcNamara's answer.

Twoup answered 13/10, 2012 at 22:36 Comment(0)

Setting yourself up for success: access patterns matter

What are some of the design decisions that could affect how you implement a networking solution? You might immediately list a few:

  • programmability
  • available memory
  • available processors
  • available bandwidth

This looks like a great list. We want something that is easy enough to program and fairly high spec. But this list falls short: we've only looked at the server. That might be all we can control in a web application, but what about distributed systems that we have full control over, like sensor networks?

Let's say we have 10,000 devices that want to update you with their latest sensor readings, which they take each minute. Now, we could use a high-end server that holds concurrent connections with all of the devices.

However, even if you had an extremely high-end server, you could still find yourself with performance trouble. If the devices all use the same clock and all attempt to send data at the top of the minute, the server would be doing lots of CPU work for 1-2 seconds of each minute and nothing for the rest. Extremely inefficient.

As we have control over the sensors, we could ask them to load balance themselves. One approach would be to give each device an ID, and then use the modulus operator to only send data at the right time per minute:

import time

def main(device_id):
    # read_sensors() and send() are placeholders for the device's
    # own data-collection and transmission code
    data = None
    second_to_send = device_id % 60   # each device gets its own one-second slot
    while True:
        time_now = time.localtime().tm_sec
        if time_now == 0:
            data = read_sensors()
        if time_now == second_to_send and data:
            send(data)
        time.sleep(1)
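As a quick sanity check of this scheme (a standalone illustration, not part of the original answer), here is how 10,000 hypothetical device IDs spread across the 60 one-second slots:

```python
from collections import Counter

# Count how many devices land in each of the 60 send slots
slots = Counter(device_id % 60 for device_id in range(10_000))

# Instead of 10,000 simultaneous senders, each second of the
# minute now sees only ~167 of them
print(min(slots.values()), max(slots.values()))  # → 166 167
```

Any roughly uniform mapping from device ID to slot would do; the modulus is just the simplest one that needs no coordination between devices.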

One consequence of this type of load balancing is that we no longer need such a high-powered server. The memory and CPU we thought we needed to maintain connections with everyone are not required.

What I'm trying to say here is that you should make sure that your particular solution focuses on the whole problem. With the brief description you have provided, it doesn't seem like we need to maintain huge numbers of connections the whole time. However, let's say we do need to have 100% connectivity. What options do we have?

Non-blocking networking

With non-blocking I/O, functions that ask a file descriptor for data return immediately when there is none. For networking, this can be awkward, as a function attempting to read from a socket may return no data to the caller. Therefore, it can sometimes be a lot simpler to spawn a thread and then call read. That way, blocking inside the thread will not affect the rest of the program.
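A minimal sketch of that thread-per-connection approach (an illustration added here, not from the original answer; the echo handler stands in for real work):

```python
import socket
import threading

def handle(conn):
    # recv() blocks, but only inside this thread; the rest of
    # the program carries on while we wait for data
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)  # echo back, standing in for real processing

def serve(host="127.0.0.1", port=9000):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        # one thread per connection: simple, but costly at scale
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

This is easy to write, but each thread carries its own stack and scheduling cost, which is exactly the problem described next.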

The problems with threads include memory overhead, the latency of thread creation, and the cost of context switching.

To take advantage of non-blocking I/O, you could potentially poll every relevant file descriptor in a tight loop. That would be great, except that the CPU would run at 100%.

To avoid this, event-based libraries were created. They leave the CPU at 0% when there is no work to do, waking only when data is ready to be read or sent. Within the Python world, Twisted, Tornado and gevent are big players. However, there are many options. In particular, diesel looks attractive.

Here's the relevant extract from the Tornado web page:

Because it is non-blocking and uses epoll or kqueue, it can handle thousands of simultaneous standing connections, which means it is ideal for real-time web services.

Each of those options takes a slightly different approach. Twisted and Tornado are fairly similar, relying on non-blocking operations. Tornado is focused on web applications, whereas the Twisted community is interested in networking more broadly, so Twisted has more tooling for non-HTTP communications.

gevent is different. The library modifies the socket calls, so that each connection runs in an extremely lightweight thread-like context, although in effect this is hidden from you as a programmer. Whenever there is a blocking call, such as a database query or other I/O, gevent will switch contexts very quickly.

The upshot of each of these options is that you are able to serve many clients within a single OS thread.
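To make that concrete without pulling in any of those libraries, here is a sketch of the same epoll/kqueue pattern using Python's standard-library selectors module (which post-dates this answer; the echo behaviour is a stand-in for handling a status update):

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS

def accept(srv):
    conn, _ = srv.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # echo; stand-in for processing a status update
    else:
        sel.unregister(conn)
        conn.close()

def serve(host="127.0.0.1", port=9000):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, accept)
    while True:
        # blocks at ~0% CPU until a socket is ready, then dispatches
        for key, _ in sel.select():
            key.data(key.fileobj)
```

One loop, one thread, any number of registered sockets: the kernel tells us which ones are ready instead of us polling them.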

Tweaking the server

Your operating system imposes limits on the number of connections that it will allow. You may hit these limits if you reach the numbers you're talking about. In particular, Linux maintains limits for each user in /etc/security/limits.conf. You can access your user's limits by calling ulimit in the shell:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63357
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63357
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The most relevant line here is open files. Open external connections count as open files. Once that limit of 1024 is hit, the process will not be able to open another file, and no more clients will be able to connect to your server. Let's say your web server runs as the user httpd. This should give you an idea of the modifications you could make to raise that limit:

httpd soft nofile 20480
httpd hard nofile 20480

For extremely high volumes, you may hit system-wide limits. You can view them through cat /proc/sys/fs/file-max:

$ cat /proc/sys/fs/file-max
801108

To modify this limit, use sudo sysctl -w fs.file-max=n, where n is the number of open files you wish to permit. Modify /etc/sysctl.conf to have this survive reboots.
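For example (the value 2097152 is just an illustrative figure, not a recommendation):

```shell
# raise the system-wide limit immediately
sudo sysctl -w fs.file-max=2097152

# persist the setting across reboots
echo 'fs.file-max = 2097152' | sudo tee -a /etc/sysctl.conf
```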

Immotile answered 13/10, 2012 at 23:32 Comment(5)
First of all, thank you for this fantastic answer @TimMcNamara. This info will be really helpful in my little project. But let me go beyond it. Imagine that I'm handling even more connections (let's say 100k or more). What can I do to improve the communication between the clients and the server? As I said, I only need to send data from client to server to update the client's status. This would happen, for example, each minute, and it doesn't matter if I lose some updates. The server will send data (probably a small file) to each client only once, and only if requested.Photojournalism
I rewrote the question to explain it a little better and to ask one last thing. For sure, your answer is the best I have received from a StackOverflow user @TimMcNamara (and that is really difficult) and I'll mark it as accepted right now, but could you help me with my last question? It is about the communication itself (the sockets) and it is above, in the third point. Thanks again for all your help.Photojournalism
I will add some notes when I get time. You should think about efficient serialization and compression: use Google Protocol Buffers and gzip or snappy.Immotile
Thank you very much. I'm reading about Tornado and Twisted right now and Google Protocol Buffers is the next on my list.Photojournalism
Just wanted to let you know that your suggestion to use Tornado or Twisted was very useful. I've been reading a lot about them and I've made some tests. Now I can have more connections using less resources. I only have to research about how to optimize the communication between clients and server when they send updates about their status. Thanks again for your help!Photojournalism

Generally speaking, there is no problem with having even tens of thousands of sockets open at once on even a very modest home server.

Just make sure you do not create a new thread or process for each connection.

Chatelain answered 13/10, 2012 at 22:38 Comment(1)
I was thinking of doing exactly that ^^ Any suggestion?Photojournalism

© 2022 - 2024 — McMap. All rights reserved.