Urllib and validation of server certificate

Asked 11/7, 2011 at 10:46 Answered 14/1, 2013 at 14:20

Solved python ssl ssl-certificate urllib

I use Python 2.6 and call the Facebook API over HTTPS. I suspect my service could be the target of man-in-the-middle attacks. Rereading the urllib module documentation this morning, I found this warning:

Warning: When opening HTTPS URLs, it does not attempt to validate the server certificate. Use at your own risk!

Do you have any hints, URLs, or examples for performing full certificate validation?

Thanks for your help

Mendenhall answered 11/7, 2011 at 10:46 Comment(2)

You may be interested in this question: #6167648 – Fungosity 11/7, 2011 at 13:6

See also Validate SSL certificates with Python - Stack Overflow – Lahdidah 16/7, 2012 at 3:28

You could create a urllib2 opener that does the validation for you using a custom handler. The following code is an example that works with Python 2.7.3. It assumes you have downloaded http://curl.haxx.se/ca/cacert.pem to the same folder where the script is saved.

#!/usr/bin/env python
import urllib2
import httplib
import ssl
import socket
import os

CERT_FILE = os.path.join(os.path.dirname(__file__), 'cacert.pem')


class ValidHTTPSConnection(httplib.HTTPConnection):
    "This class allows communication via SSL."

    default_port = httplib.HTTPS_PORT

    def __init__(self, *args, **kwargs):
        httplib.HTTPConnection.__init__(self, *args, **kwargs)

    def connect(self):
        "Connect to a host on a given (SSL) port."

        sock = socket.create_connection((self.host, self.port),
                                        self.timeout, self.source_address)
        if self._tunnel_host:
            self.sock = sock
            self._tunnel()
        self.sock = ssl.wrap_socket(sock,
                                    ca_certs=CERT_FILE,
                                    cert_reqs=ssl.CERT_REQUIRED)


class ValidHTTPSHandler(urllib2.HTTPSHandler):

    def https_open(self, req):
        return self.do_open(ValidHTTPSConnection, req)

opener = urllib2.build_opener(ValidHTTPSHandler)


def test_access(url):
    print "Acessing", url
    page = opener.open(url)
    print page.info()
    data = page.read()
    print "First 100 bytes:", data[0:100]
    print "Done accesing", url
    print ""

# This should work
test_access("https://www.google.com")

# Accessing a page with a self signed certificate should not work
# At the time of writing, the following page uses a self signed certificate
test_access("https://tidia.ita.br/")

Running this script, you should see output like this:

Acessing https://www.google.com
Date: Mon, 14 Jan 2013 14:19:03 GMT
Expires: -1
...

First 100 bytes: <!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage"><head><meta itemprop
Done accesing https://www.google.com

Acessing https://tidia.ita.br/
Traceback (most recent call last):
  File "https_validation.py", line 54, in <module>
    test_access("https://tidia.ita.br/")
  File "https_validation.py", line 42, in test_access
    page = opener.open(url)
  ...
  File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed>
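
One caveat with the handler above: ssl.wrap_socket with cert_reqs=ssl.CERT_REQUIRED validates the certificate chain against the CA file, but it does not check that the certificate was issued for the host you connected to. As a rough sketch for newer interpreters (this assumes Python 3.4 or later, not the 2.7.3 used above), ssl.create_default_context() turns on both checks, and the resulting context can be passed to urllib.request.HTTPSHandler. The name build_verifying_opener is made up for this illustration.

```python
import ssl
import urllib.request


def build_verifying_opener(cafile=None):
    """Return (opener, context) for HTTPS with chain and hostname checks.

    build_verifying_opener is a hypothetical helper name. cafile=None
    means "use the system trust store"; pass a path such as 'cacert.pem'
    to trust only the CAs in that bundle.
    """
    # The default context sets verify_mode to CERT_REQUIRED and
    # enables check_hostname.
    context = ssl.create_default_context(cafile=cafile)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


# Example use (needs network access, so it is left commented out):
# opener, _ = build_verifying_opener()
# page = opener.open("https://www.google.com", timeout=10)
# print(page.status)
```

Because verification happens on the same connection that carries the request, there is no gap between checking the certificate and using it.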

Ferrer answered 14/1, 2013 at 14:20 Comment(1)

the def __init__(self, *args, **kwargs): httplib.HTTPConnection.__init__(self, *args, **kwargs) thing seems useless to me – Preform 3/2, 2014 at 23:30

-3

If you have a trusted Certificate Authority (CA) file, you can use Python 2.6 and later's ssl library to validate the certificate. Here's some code:

import os.path
import ssl
import sys
import urlparse
import urllib

def get_ca_path():
    '''Download the Mozilla CA file cached by the cURL project.

    If you have a trusted CA file from your OS, return the path
    to that instead.
    '''
    cafile_local = 'cacert.pem'
    cafile_remote = 'http://curl.haxx.se/ca/cacert.pem'
    if not os.path.isfile(cafile_local):
        print >> sys.stderr, "Downloading %s from %s" % (
            cafile_local, cafile_remote)
        # Only download when no cached copy exists yet.
        urllib.urlretrieve(cafile_remote, cafile_local)
    return cafile_local

def check_ssl(hostname, port=443):
    '''Check that an SSL certificate is valid.'''
    print >> sys.stderr, "Validating SSL cert at %s:%d" % (
        hostname, port)

    cafile_local = get_ca_path()
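    try:
        # With ca_certs set, get_server_certificate() validates the server's
        # certificate against that CA file and raises ssl.SSLError on failure.
        ssl.get_server_certificate((hostname, port),
            ca_certs=cafile_local)
    except ssl.SSLError:
        print >> sys.stderr, "SSL cert at %s:%d is invalid!" % (
            hostname, port)
        raise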

class CheckedSSLUrlOpener(urllib.FancyURLopener):
    '''A URL opener that checks that SSL certificates are valid.

    On SSL error, it will raise ssl.SSLError.
    '''

    def open(self, fullurl, data=None):
        urlbits = urlparse.urlparse(fullurl)
        if urlbits.scheme == 'https':
            # urlparse exposes hostname and port (an int, or None when the
            # URL has no explicit port), so no manual splitting is needed.
            hostname = urlbits.hostname
            port = urlbits.port or 443
            check_ssl(hostname, port)
        return urllib.FancyURLopener.open(self, fullurl, data)

# Plain usage - can probably do once per day
check_ssl('www.facebook.com')

# URL Opener
opener = CheckedSSLUrlOpener()
opener.open('https://www.facebook.com/find-friends/browser/')

# Make it the default
urllib._urlopener = opener
urllib.urlopen('https://www.facebook.com/find-friends/browser/')

Some dangers with this code:

You have to trust the CA file from the cURL project (http://curl.haxx.se/ca/cacert.pem), which is a cached version of Mozilla's CA file. It's also over HTTP, so there is a potential MITM attack. It's better to replace get_ca_path with one that returns your local CA file, which will vary from host to host.
There is no attempt to see if the CA file has been updated. Eventually, root certs will expire or be deactivated, and new ones will be added. A good idea would be to use a cron job to delete the cached CA file, so that a new one is downloaded daily.
It's probably overkill to check certificates every time. You could manually check once per run, or keep a list of 'known good' hosts over the course of the run. Or, be paranoid!
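
As a sketch of the "return your local CA file" suggestion: on newer interpreters (this assumes Python 2.7.9+ or 3.4+, so not the 2.6 from the question), ssl.get_default_verify_paths() reports where the local OpenSSL build looks for CA certificates. The helper name find_local_ca_bundle is hypothetical, and whether a bundle file exists at those paths depends on the host.

```python
import os
import ssl


def find_local_ca_bundle():
    """Return the path to a CA bundle file known to the local OpenSSL, or None.

    find_local_ca_bundle is a hypothetical helper, not part of the ssl module.
    """
    paths = ssl.get_default_verify_paths()
    # cafile is the resolved bundle path (None if the file is missing);
    # openssl_cafile is the compiled-in default location.
    for candidate in (paths.cafile, paths.openssl_cafile):
        if candidate and os.path.isfile(candidate):
            return candidate
    return None


print(find_local_ca_bundle())
```

If this returns None, the host may keep its CAs in a directory (capath) rather than a single file, and the path would have to be supplied by hand.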

Brag answered 10/8, 2011 at 22:34 Comment(7)

You are checking a list of CAs from curl.haxx.se/ca/cacert.pem with this code. That connection is not over ssl so someone could do man in the middle on that site to publish their own root CAs relative to this code and sign their own cert for facebook or whatever site you are trying to validate – Menon 13/9, 2011 at 7:35

After thinking about it ever so slightly more, you can not remotely retrieve a CA list, you must provide a local store. Even if you used digicert.com/testroot/DigiCertGlobalRootCA.crt (over ssl) how would you validate this? – Menon 13/9, 2011 at 7:43

All valid points. This code downloads a cert file from the internet if it isn't available locally. If you have a browser installed on your server (I usually don't), you can use the browser's certificate file, once you find it on your file system. Of course, unless you drive down to Mountain View, you are probably downloading your browser over the internet as well. You have to trust someone, at some point. – Brag 1/11, 2012 at 19:43

You can trust your OS vendor, such as Ubuntu. Their isos are signed via GPG key which is well known and inserted into a web of trust, one that you can easily verify by going to a local Ubuntu Loco event and meeting people who have signed said key. From Ubuntu, you get a well maintained list of known trustworthy CA certs. – Howrah 22/3, 2013 at 7:3

Furthermore, this does two separate connections to verify the cert. A clever MITM will pass the first one through, and then MITM the second one. – Howrah 22/3, 2013 at 7:6

In your list of dangers, you should note explicitly that a MITM attack is possible on the retrieval of the certificates, as Chris noted. This is very important to have in the answer, and not as a comment. All you did is mention that you have to trust cURL, but that isn't the core issue. – Spadefish 26/8, 2017 at 3:17

Added the MITM note to the dangers. Also split out the get_ca_path function so it is clearer what should be customized. – Brag 26/8, 2017 at 13:45
