Test an HTTPS proxy in Python

I manage a lot of HTTPS proxies (that is, proxies which have an SSL connection of their own). I'm building a diagnostic tool in Python that attempts to connect to a page through each proxy and emails me if it can't connect through one of them.

The way I've set out to go about this is to use urllib to connect through each proxy and fetch a page that should say "success", using the code below.

import urllib

def fetch(url):
    # 'server' holds the proxy's hostname or IP (defined elsewhere)
    connection = urllib.urlopen(
        url,
        proxies={'http': "https://" + server + ':443'}
    )
    return connection.read()


print fetch(testURL)

This fetches the page I want perfectly. The problem is that it will still fetch the page even if the proxy server information is incorrect or the proxy server is inactive. So either it never uses the proxy server, or it tries the proxy and silently connects without it when that fails.

How can I correct this?

Edit: No one seems to know how to do this. I'm going to start reading through other languages' libraries to see if they can handle it better. Does anyone know if it's easier in another language, like Go?

Edit: I just wrote this in a comment below, but I think there's a misunderstanding going around: "The proxy has its own SSL connection. So if I go to google.com, I first do a key exchange with foo.com and then another with the destination address bar.com or the destination address baz.com. The destination doesn't have to be HTTPS; the proxy is HTTPS."
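In other words, the client completes a TLS handshake with the proxy itself before any proxy-style HTTP is spoken inside that tunnel. A minimal Python 3 sketch of that flow, using only the standard socket and ssl modules (proxy.example.com and the example.com target are placeholders, not names from the question):

import socket
import ssl

PROXY_HOST, PROXY_PORT = 'proxy.example.com', 443  # hypothetical proxy

# First TLS handshake: with the proxy itself, not the destination.
ctx = ssl.create_default_context()
raw = socket.create_connection((PROXY_HOST, PROXY_PORT), timeout=30)
tls = ctx.wrap_socket(raw, server_hostname=PROXY_HOST)

# Inside that encrypted channel, speak ordinary proxy HTTP:
# an absolute-URI GET for a plain http:// destination.
tls.sendall(b'GET http://example.com/ HTTP/1.1\r\n'
            b'Host: example.com\r\n'
            b'Connection: close\r\n\r\n')

response = b''
while True:
    chunk = tls.recv(4096)
    if not chunk:
        break
    response += chunk
tls.close()
print(response.decode('latin-1'))

If the handshake or the request fails here, the proxy really is down; nothing can silently fall back to a direct connection, which is exactly the property the diagnostic tool needs.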

Acidulous answered 4/9, 2014 at 2:50 Comment(0)

Most people understand an "HTTPS proxy" as a proxy that understands the CONNECT request. My example creates a direct SSL connection instead.

try:
    import http.client as httplib  # Python 3.2+
except ImportError:
    import httplib  # Python 2.7

# Open an SSL connection to the proxy itself
con = httplib.HTTPSConnection('proxy', 443)
# Download http://example.com/ through the proxy: send an absolute URI
# and supply the Host header ourselves
con.putrequest('GET', 'http://example.com/', skip_host=True)
con.putheader('Host', 'example.com')
con.endheaders()
res = con.getresponse()
print(res.read())

If your proxy is a reverse proxy, then change

con.putrequest('GET', 'http://example.com/', skip_host=True)

to

con.putrequest('GET', '/', skip_host=True)
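Building on the same snippet, here is a hedged sketch of a health check for the question's use case: the proxy counts as up only if the request both completes and returns 200. The proxy_ok name and the example.com target are illustrative, not from the original:

import socket

def proxy_ok(proxy_host, proxy_port=443):
    try:
        con = httplib.HTTPSConnection(proxy_host, proxy_port, timeout=30)
        con.putrequest('GET', 'http://example.com/', skip_host=True)
        con.putheader('Host', 'example.com')
        con.endheaders()
        res = con.getresponse()
        return res.status == 200
    except socket.error:  # covers SSL and timeout errors on both 2.x and 3.x
        return False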
Dremadremann answered 16/9, 2014 at 15:34 Comment(0)

I assume it's not working for HTTPS requests. Is that correct? If so, the above code defines a proxy only for http. Try adding one for https:

proxies={'https':"https://"+server+':443'}

Another option is to use the requests Python module instead of urllib. Have a look at http://docs.python-requests.org/en/latest/user/advanced/#proxies
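For illustration, a minimal requests sketch along those lines. The proxy host is a placeholder, and note that an https:// proxy URL (TLS to the proxy itself) is only honoured by sufficiently recent requests/urllib3 versions:

import requests

proxies = {
    'http':  'https://proxy.example.com:443',  # hypothetical proxy host
    'https': 'https://proxy.example.com:443',
}

try:
    r = requests.get('http://example.com/', proxies=proxies, timeout=30)
    print(r.status_code)
except requests.exceptions.ProxyError as exc:
    print('Proxy failed:', exc)

Unlike the urllib approach, requests raises ProxyError when it can't reach the proxy, which is exactly the signal the question is after.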

Peep answered 10/9, 2014 at 6:37 Comment(2)
I tried that. It didn't help. What would the advantage of requests be in this scenario? - Acidulous
OK, so I ran this in my environment with a packet capture, and it shows that urllib is not sending a CONNECT request to the proxy, which is incorrect. I then read docs.python.org/2/howto/urllib2.html, which states that "Currently urllib2 does not support fetching of https locations through a proxy. However, this can be enabled by extending urllib2 as shown in the recipe code.activestate.com/recipes/456195". I suggested the requests Python module as it seems simpler and easier to use than trying to achieve this with urllib. - Peep

urllib doesn't appear to support this, from a read of the code, and it's unclear whether urllib2 does. But what about just using curl (or its Python binding, pycurl)? That's generally the go-to sort of HTTP client API (more complex, though, which is why urllib etc. came about).

Looking at the command-line curl tool, it seems to be promising:

   -x, --proxy <[protocol://][user:password@]proxyhost[:port]>
          Use the specified HTTP proxy. If the port number is not specified, it is assumed at port 1080.

          This option overrides existing environment variables that set the proxy to use. If there's an environment variable setting a proxy, you can set proxy to "" to override it.

          All operations that are performed over an HTTP proxy will transparently be converted to HTTP. It means that certain protocol specific operations might not be available. This is not the case if you can tunnel through the proxy, as one with the -p, --proxytunnel option.

          User and password that might be provided in the proxy string are URL decoded by curl. This allows you to pass in special characters such as @ by using %40 or pass in a colon with %3a.

          The proxy host can be specified the exact same way as the proxy environment variables, including the protocol prefix (http://) and the embedded user + password.

          From 7.21.7, the proxy string may be specified with a protocol:// prefix to specify alternative proxy protocols. Use socks4://, socks4a://, socks5:// or socks5h:// to request the specific SOCKS version to be used. No protocol specified, http:// and all others will be treated as HTTP proxies.

          If this option is used several times, the last one will be used.
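As a rough pycurl translation of that flag (the proxy host is a placeholder, and the https:// proxy prefix, i.e. TLS to the proxy itself, needs libcurl 7.52.0 or newer):

import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://example.com/')
c.setopt(c.PROXY, 'https://proxy.example.com:443')  # equivalent of curl -x
c.setopt(c.WRITEDATA, buf)
c.setopt(c.TIMEOUT, 30)  # fail instead of hanging on a dead proxy
try:
    c.perform()          # raises pycurl.error if the proxy is unreachable
    print(buf.getvalue())
finally:
    c.close()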
Benoite answered 16/9, 2014 at 13:57 Comment(0)

How about using a timeout? If the proxy fails to connect within 30 seconds, it should be noted as not connected.

import socket
import urllib2

def fetch(url, server):
    proxy_handler = urllib2.ProxyHandler({'http': 'https://' + server + ':443'})
    opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler(debuglevel=0))
    urllib2.install_opener(opener)

    try:
        response = opener.open(url, timeout=30)
        return response.read()
    except (urllib2.URLError, socket.timeout):
        print "Can't connect with proxy %s" % (server)

print fetch(url, serverIp)

You can change debuglevel to 1 to see the connection details.

I use this for global proxies, and with my internet connection 30 seconds is the maximum timeout needed to know whether I connected or not. In my tests, if the connection took longer than 30 seconds it was always a failure.

Kaine answered 16/9, 2014 at 8:33 Comment(0)
