Mysterious exceptions when making many concurrent requests from urllib.request to HTTPServer

I am trying to do this Matasano crypto challenge that involves doing a timing attack against a server with an artificially slowed-down string comparison function. It says to use "the web framework of your choosing", but I didn't feel like installing a web framework, so I decided to use the HTTPServer class built into the http.server module.
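The server-side comparison being attacked looks something like this (a simplified sketch, not my exact code; the exact per-byte delay doesn't matter, only that it's measurable):

from time import sleep

def insecure_compare(a, b):
    # Leaks timing: returns at the first mismatching byte, and the
    # artificial per-byte sleep makes the leak measurable over HTTP.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
        sleep(0.05)
    return True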

I came up with something that worked, but it was very slow, so I tried to speed it up using the (poorly documented) thread pool built into multiprocessing.dummy. It was much faster, but I noticed something strange: if I make 8 or fewer requests concurrently, it works fine. If I make more than that, it works for a while and then gives me errors at seemingly random times. The errors are inconsistent, but they usually contain Connection refused, OSError: [Errno 22] Invalid argument, urllib.error.URLError: <urlopen error [Errno 22] Invalid argument>, BrokenPipeError: [Errno 32] Broken pipe, or urllib.error.URLError: <urlopen error [Errno 61] Connection refused>.

Is there some limit to the number of connections the server can handle? I don't think the number of threads per se is the problem, because I wrote a simple function that did the slowed-down string comparison without running the web server, and called it with 500 simultaneous threads, and it worked fine. I don't think that simply making requests from that many threads is the problem, because I have made crawlers that used over 100 threads (all making simultaneous requests to the same website) and they worked fine. It looks like maybe the HTTPServer is not meant to reliably host production websites that get large amounts of traffic, but I am surprised that it is this easy to make it crash.

I tried gradually removing stuff from my code that looked unrelated to the problem, as I usually do when I diagnose mysterious bugs like this, but that wasn't very helpful in this case. It seemed like, as I removed seemingly unrelated code, the number of connections the server could handle gradually increased, but there was no clear cause of the crashes.

Does anyone know how to increase the number of requests I can make at once, or at least why this is happening?

My code is complicated, but I came up with this simple program that demonstrates the problem:

#!/usr/bin/env python3

import random

from http.server import BaseHTTPRequestHandler, HTTPServer
from multiprocessing.dummy import Pool as ThreadPool
from socketserver import ForkingMixIn, ThreadingMixIn
from threading import Thread
from time import sleep
from urllib.error import HTTPError
from urllib.request import urlopen


class FancyHTTPServer(ThreadingMixIn, HTTPServer):
    pass


class MyRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        sleep(random.uniform(0, 2))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"foo")

    def log_request(self, code=None, size=None):
        pass

def request_is_ok(number):
    try:
        urlopen("http://localhost:31415/test" + str(number))
    except HTTPError:
        return False
    else:
        return True


server = FancyHTTPServer(("localhost", 31415), MyRequestHandler)
try:
    Thread(target=server.serve_forever).start()
    with ThreadPool(200) as pool:
        for i in range(10):
            numbers = [random.randint(0, 99999) for j in range(20000)]
            for j, result in enumerate(pool.imap(request_is_ok, numbers)):
                if j % 20 == 0:
                    print(i, j)
finally:
    server.shutdown()
    server.server_close()
    print("done testing server")

For some reason, the program above works fine unless it has over 100 threads or so, but my real code for the challenge can only handle 8 threads. If I run it with 9, I usually get connection errors, and with 10, I always get connection errors. I tried using concurrent.futures.ThreadPoolExecutor, concurrent.futures.ProcessPoolExecutor, and multiprocessing.Pool instead of multiprocessing.dummy.Pool, and none of those seemed to help. I tried using a plain HTTPServer object (without the ThreadingMixIn), and that just made things run very slowly without fixing the problem. I tried using ForkingMixIn, and that didn't fix it either.

What am I supposed to do about this? I am running Python 3.5.1 on a late-2013 MacBook Pro running OS X 10.11.3.

EDIT: I tried a few more things, including running the server in a process instead of a thread, as a simple HTTPServer, with the ForkingMixIn, and with the ThreadingMixIn. None of those helped.

EDIT: This problem is stranger than I thought. I tried making one script with the server and another with lots of threads making requests, and running them in different tabs in my terminal. The process with the server ran fine, but the one making requests crashed. The exceptions were a mix of ConnectionResetError: [Errno 54] Connection reset by peer, urllib.error.URLError: <urlopen error [Errno 54] Connection reset by peer>, OSError: [Errno 41] Protocol wrong type for socket, urllib.error.URLError: <urlopen error [Errno 41] Protocol wrong type for socket>, and urllib.error.URLError: <urlopen error [Errno 22] Invalid argument>.

I tried it with a dummy server like the one above, and if I limited the number of concurrent requests to 5 or fewer, it worked fine, but with 6 requests, the client process crashed. There were some errors from the server, but it kept going. The client crashed regardless of whether I was using threads or processes to make the requests. I then tried putting the slowed-down function in the server and it was able to handle 60 concurrent requests, but it crashed with 70. This seems like it may contradict the evidence that the problem is with the server.

EDIT: I tried most of the things I described using requests instead of urllib.request and ran into similar problems.

EDIT: I am now running OS X 10.11.4 and running into the same problems.

Roil answered 18/3, 2016 at 3:36 Comment(3)
Are you ensuring you are closing your unused client connections?Bukovina
@Cory Shay, I tried doing x = urlopen(whatever) then x.close(), and that didn't seem to help.Roil
I have to concede that the reason I stated is not necessarily why this problem is happening. There could be others. But a few questions that might help investigate this: "what happens if you issue ulimit -r $(( 32 * 1024 ))?" and "what's the output from netstat -anp|grep SERVERPROCESSNAME?"Profligate

You're using the default listen() backlog value, which is probably the cause of many of those errors. The backlog is not the number of simultaneous clients with a connection already established, but the number of clients waiting in the listen queue before the connection is established. Change your server class to:

class FancyHTTPServer(ThreadingMixIn, HTTPServer):
    def server_activate(self):
        # Allow up to 128 pending connections in the kernel's listen
        # queue (the socketserver default is only 5).
        self.socket.listen(128)

128 is a reasonable limit. If you want to raise it further, check socket.SOMAXCONN or your OS's somaxconn setting. If you still get random errors under heavy load, check your ulimit settings and increase them if needed.
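For example, you can inspect both values like this (kern.ipc.somaxconn is the OS X/BSD sysctl name; on Linux it's net.core.somaxconn):

import socket
import subprocess

# Compile-time constant exposed by the socket module.
print(socket.SOMAXCONN)

# The kernel's current cap on the backlog (OS X/BSD sysctl name).
print(subprocess.check_output(["sysctl", "kern.ipc.somaxconn"]).decode().strip())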

I did that with your example and I got over 1000 threads running fine, so I think that should solve your problem.


Update

If it improved but it's still crashing with 200 simultaneous clients, then I'm pretty sure your main problem was the backlog size. Be aware that your problem is not the number of concurrent clients, but the number of concurrent connection attempts. Here is a brief explanation of what that means, without going too deep into TCP internals:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(BACKLOG)
while running:
    conn, addr = s.accept()  # blocks until a queued connection is taken
    do_something(conn, addr)

In this example, the socket is now accepting connections on the given port, and the s.accept() call will block until a client connects. You can have many clients trying to connect simultaneously, and depending on your application you might not be able to call s.accept() and dispatch the client connection as fast as the clients are trying to connect. Pending clients are queued, and the max size of that queue is determined by the BACKLOG value. If the queue is full, clients will fail with a Connection Refused error.
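In your case the backlog doesn't come from an explicit listen() call: socketserver.TCPServer defines request_queue_size = 5 and passes it to listen() in server_activate(). You can check the value directly (assuming CPython):

import socketserver

print(socketserver.TCPServer.request_queue_size)  # 5 in CPython 3.5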

Threading doesn't help, because all the ThreadingMixIn class does is execute the do_something(conn, addr) call in a separate thread, so the server can return to the main loop and the s.accept() call.
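For reference, this is roughly all the mixin does (simplified from the socketserver source, not the exact code):

import threading

class ThreadingMixInSketch:
    daemon_threads = False

    def process_request(self, request, client_address):
        # Hand the accepted connection to a new thread and return
        # immediately, so serve_forever() can call accept() again.
        t = threading.Thread(target=self.process_request_thread,
                             args=(request, client_address))
        t.daemon = self.daemon_threads
        t.start()

    def process_request_thread(self, request, client_address):
        # Runs in the worker thread: this is where the handler executes.
        self.finish_request(request, client_address)
        self.shutdown_request(request)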

You can try increasing the backlog further, but there will be a point where that won't help because if the queue grows too large some clients will timeout before the server performs the s.accept() call.

So, as I said above, your problem is the number of simultaneous connection attempts, not the number of simultaneous clients. Maybe 128 is enough for your real application, but you're getting an error on your test because you're trying to connect with all 200 threads at once and flooding the queue.

Don't worry about ulimit unless you get a Too many open files error, but if you want to increase the backlog beyond 128, do some research on socket.SOMAXCONN. This is a good start: https://utcc.utoronto.ca/~cks/space/blog/python/AvoidSOMAXCONN
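If you do hit a Too many open files error, you can inspect the per-process descriptor limit from Python before reaching for ulimit (a sketch; the resource module is Unix-only):

import resource

# Each client connection consumes a file descriptor on both the
# client and the server side.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft=%s hard=%s" % (soft, hard))

# The soft limit can be raised up to the hard limit, e.g.:
# resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))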

Fresno answered 5/4, 2016 at 23:50 Comment(13)
I did that and it works, even with 150 threads! It crashes with 200, but 150 may be enough for my purposes, and if it isn't, at least I may have some idea what to do about it. I don't know what this listen() thing does, or what somaxconn or ulimit are, so I will want to research all of that, try different numbers, and maybe wait to see if I get a somehow better answer before awarding the bounty, but your answer was very helpful. Thank you.Roil
@EliasZamaria Check my updated answer. I provided a more detailed explanation since you're a little lost.Fresno
Thanks for the explanation. This TCP stuff is lower-level than I usually deal with, and I don't know much about it. I will play around with it some more when I have the time and post here if I run into any more problems that I can't easily deal with myself.Roil
I am looking at the documentation for socket.listen. It says "If [backlog is] not specified, a default reasonable value is chosen.". Do you have any idea what this "reasonable value" is? I tried looking in the source for socket.py and I couldn't find it. I noticed a line that says from _socket import *, so it is probably using some compiled code. I have tried looking for the source for that, but I couldn't find it. BTW, this is all way outside of my area of expertise.Roil
I think I found what I am looking for: github.com/python/cpython/blob/master/Modules/…. I just have to figure out what SOMAXCONN is.Roil
@EliasZamaria The default for socket.listen is min(socket.SOMAXCONN, 128), but the default for the HTTPServer you're using is 5.Fresno
Where are you getting the number 5 from? The only thing that looks like it that I can find is github.com/python/cpython/blob/master/Modules/… and that doesn't look like it is anything specific to the HTTPServer class.Roil
@EliasZamaria github.com/python/cpython/blob/3.5/Lib/socketserver.py#L429Fresno
Thanks. I somehow overlooked that. I am guessing that overriding request_queue_size in my HTTPServer subclass will have the same effect as overriding server_activate, and arguably be a bit more readable, so I guess I'll do that.Roil
I think the request_queue_size thing has acceptably solved my problem. Ideally, I would want the request to be handled right away and not put in a queue, since precise timing is so important for this, although that may not be realistic considering that I am making so many requests simultaneously to a server that is deliberately designed to handle them slowly. I am not sure how much more effort I am willing to spend in the near future trying to understand the details of the limit I am running into.Roil
There's no way to avoid the listen queue with TCP. You can try to reuse the HTTP connection with keep-alive, so that only the first request suffers with the queue, but even then you won't have real simultaneous requests. Frankly, if timing is so critical for your application you shouldn't be doing it in Python.Fresno
There is one thing I forgot to ask you. Do you have any idea why the default is only 5? Is there some sort of downside to setting it to something much higher like 128?Roil
@EliasZamaria No idea. The default on the socketserver module has been 5 since Python 1.5.2, at least. I guess it was accepted as a reasonable default back then, and nobody ever bothered updating it when the default for socket.listen changed to min(socket.SOMAXCONN, 128).Fresno

I'd say that your issue is related to some I/O blocking, since I successfully executed the equivalent of your code on Node.js. I also noticed that both the server and the client have trouble working individually.

But it is possible to increase the number of requests with a few modifications:

  • Increase the size of the connection queue (the listen backlog):

    http.server.HTTPServer.request_queue_size = 500

  • Run the server in a different process:

    server = multiprocessing.Process(target=RunHTTPServer)
    server.start()

  • Use a connection pool on the client side to execute the requests

  • Use a thread pool on the server side to handle the requests

  • Allow the reuse of the connection on the client side by setting the protocol version to HTTP/1.1 and by using the "keep-alive" header

With all these modifications, I managed to run the code with 500 threads without any issue. So if you want to give it a try, here is the complete code:

import random
from time import sleep, perf_counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from multiprocessing import Process
from multiprocessing.pool import ThreadPool
from socketserver import ThreadingMixIn
from concurrent.futures import ThreadPoolExecutor
from urllib3 import HTTPConnectionPool


class HTTPServerThreaded(HTTPServer):
    request_queue_size = 500
    allow_reuse_address = True

    def serve_forever(self):
        executor = ThreadPoolExecutor(max_workers=self.request_queue_size)

        while True:
            try:
                request, client_address = self.get_request()
                executor.submit(ThreadingMixIn.process_request_thread,
                                self, request, client_address)
            except OSError:
                break

        self.server_close()


class MyRequestHandler(BaseHTTPRequestHandler):
    # Respond with HTTP/1.1 so keep-alive connections actually stay open.
    protocol_version = 'HTTP/1.1'

    def do_GET(self):
        sleep(random.uniform(0, 1) / 100.0)

        data = b"abcdef"
        self.send_response(200)
        self.send_header("Content-type", 'text/html')
        self.send_header("Content-length", len(data))
        self.end_headers()
        self.wfile.write(data)

    def log_request(self, code=None, size=None):
        pass


def RunHTTPServer():
    server = HTTPServerThreaded(('127.0.0.1', 5674), MyRequestHandler)
    server.serve_forever()


client_headers = { 
    'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)',
    'Content-Type': 'text/plain',
    'Connection': 'keep-alive'
}

client_pool = None

def request_is_ok(number):
    response = client_pool.request('GET', "/test" + str(number), headers=client_headers)
    return response.status == 200 and response.data == b"abcdef"


if __name__ == '__main__':

    # start the server in another process
    server = Process(target=RunHTTPServer)
    server.start()

    # start a connection pool for the clients
    client_pool = HTTPConnectionPool('127.0.0.1', 5674)

    # execute the requests
    with ThreadPool(500) as thread_pool:
        start = perf_counter()

        for i in range(5):
            numbers = [random.randint(0, 99999) for j in range(20000)]
            for j, result in enumerate(thread_pool.imap(request_is_ok, numbers)):
                if j % 1000 == 0:
                    print(i, j, result)

        end = perf_counter()
        print("execution time: %s" % (end - start,))

Update 1:

Increasing the request_queue_size just gives you more space to store requests that can't be executed at that moment, so they can be executed later. So the longer the queue, the higher the dispersion in response times, which I believe is the opposite of your goal here. As for ThreadingMixIn, it's not ideal, since it creates and destroys a thread for every request, which is expensive. A better way to shrink the waiting queue is to use a pool of reusable threads to handle the requests.

The reason for running the server in another process is to take advantage of another CPU core and reduce the execution time.

On the client side, using an HTTPConnectionPool was the only way I found to keep a constant flow of requests, since I saw some weird behaviour with urlopen while analysing the connections.

Nealon answered 11/4, 2016 at 20:30 Comment(9)
I have tried request_queue_size, which is equivalent to the self.socket.listen thing that Pedro suggested, and it seems to have fixed my problem.Roil
I don't know what http.server.HTTPServer.allow_reuse_address = True is supposed to do. It seems like the default value for this is 1. See hg.python.org/cpython/file/3.5/Lib/http/server.py#l134Roil
As mentioned in the edit to my question, I tried running the server in a process instead of a thread and that didn't help.Roil
I am not sure if the thread pool is worth the trouble. I am already using the ThreadingMixIn. Would the thread pool be any less likely to cause problems?Roil
I've explained a bit more about the choices. Btw I wasn't able to run your code otherwise on an old config. But don't take my word for it and try it.Nealon
Using a thread pool is not a bad idea, but you're confusing allow_reuse_address with HTTP keep alive. allow_reuse_address simply allows a socket to bind to the port used by another socket in TIME_WAIT.Fresno
I considered keep alive, but it looks hard to do without third-party modules, or lower-level stuff I don't understand well in http.client. This is for my own learning experience, so I am willing to sacrifice some speed and efficiency in exchange for something that is easier and more fun for me to work on. But thanks for the suggestion.Roil
@Pedro Werneck, thanks for the info, I assumed wrong about the allow_reuse_address, it has no impact here.Nealon
@Elias Zamaria, I see your point, and good luck with your challenge.Nealon

The norm is to use only as many threads as you have cores (counting virtual cores), hence the 8-thread limit you're seeing. The threading model is the easiest to get working, but it's really a rubbish way of doing it. A better way to handle multiple connections is to use an asynchronous approach. It's more difficult, though.
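Here's a rough sketch of that asynchronous style on the client side, using only the standard library (simplified; assumes Python 3.5+, and a real client would need proper HTTP parsing and error handling):

import asyncio

async def fetch(host, port, path):
    # One connection per request; the event loop interleaves many of
    # these on a single thread instead of one OS thread per request.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode())
    await writer.drain()
    raw = await reader.read()  # HTTP/1.0 without keep-alive: read to EOF
    writer.close()
    return raw

loop = asyncio.get_event_loop()
tasks = [fetch("localhost", 31415, "/test%d" % i) for i in range(100)]
results = loop.run_until_complete(asyncio.gather(*tasks))

Note that an asynchronous client doesn't sidestep the listen backlog issue discussed in the answer above: many simultaneous connection attempts will still flood the server's queue.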

With your threading method you could start by investigating whether the process stays open after you exit the program. This would mean that your threads aren't closing, and will obviously cause issues.

Try this...

class FancyHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True

That will ensure that your threads close properly. It may well happen automatically in the thread pool but it's probably worth trying anyway.

Mcnulty answered 10/4, 2016 at 3:20 Comment(1)
First, you would use as many threads as cores if the task were CPU bound, not I/O bound. Second, Python code runs in only one thread at a time because of the GIL.Fresno
