I'm investigating a problem with a Python app running on an Ubuntu machine with 4G of RAM. The tool will be used to audit servers (we prefer to roll our own tools). It uses threads to connect to lots of servers and many of the TCP connections fail. However, if I add a delay of 1 second between kicking off each thread then most connections succeed. I have used this simple script to investigate what may be happening:
#!/usr/bin/python
import sys
import socket
import threading
import time
class Scanner(threading.Thread):
def __init__(self, host, port):
threading.Thread.__init__(self)
self.host = host
self.port = port
self.status = ""
def run(self):
self.sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.sk.settimeout(20)
try:
self.sk.connect((self.host, self.port))
except Exception, err:
self.status = str(err)
else:
self.status = "connected"
finally:
self.sk.close()
def get_hostnames_list(filename):
return open(filename).read().splitlines()
if (__name__ == "__main__"):
hostnames_file = sys.argv[1]
hosts_list = get_hostnames_list(hostnames_file)
threads = []
for host in hosts_list:
#time.sleep(1)
thread = Scanner(host, 443)
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
print "Host: ", thread.host, " : ", thread.status
If I run this with the time.sleep(1) commented out against, say, 300 hosts many of the connections fail with a timeout error, whereas they don't timeout if I put the delay of one second in. I did try the app on another Linux distro running on a more powerful machine and there weren't as many connect errors? Is it due to a kernel limitation? Is there anything I can do to get the connection to work without putting in the delay?
UPDATE
I have also tried a program that limited the number of threads available in a pool. By reducing this down to 20 I can get all connects to work, but it only checks about 1 host a second. So whatever I try (putting in a sleep(1) or limiting the number of concurrent threads) I don't seem to able to check more than 1 host every second.
UPDATE
I just found this question which seems similar to what I am seeing.
UPDATE
I wonder if writing this using twisted might help? Could anyone show what my example would look like written using twisted?
netstat
)? #411116 – Salesin