Verifying HTTPS certificates with urllib.request
P

8

10

I am trying to open an https URL using the urlopen method in Python 3's urllib.request module. It seems to work fine, but the documentation warns that "[i]f neither cafile nor capath is specified, an HTTPS request will not do any verification of the server’s certificate".

I am guessing I need to specify one of those parameters if I don't want my program to be vulnerable to man-in-the-middle attacks, problems with revoked certificates, and other vulnerabilities.

cafile and capath are supposed to point to a list of certificates. Where am I supposed to get this list from? Is there any simple and cross-platform way to use the same list of certificates that my OS or browser uses?

Parlour answered 23/6, 2014 at 20:4 Comment(2)
Are the requests always made to the same site/domain (i.e., is it an internal app with a priori knowledge)?Carrico
I am planning to make the requests to the same domain, which I know in advance. But ideally, I would kind of like something that works for any domain, for my own curiosity, to help me in case I need to do this in the future, and for the benefit of anyone else that may run into this problem.Parlour
P
8

I found a library that does what I'm trying to do: Certifi. It can be installed by running pip install certifi from the command line.

Making requests and verifying them is now easy:

import certifi
import urllib.request

urllib.request.urlopen("https://example.com/", cafile=certifi.where())

As I expected, this returns an HTTPResponse object for a site with a valid certificate and raises an ssl.CertificateError exception for a site with an invalid certificate.
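
On Python 3.7 and later, urlopen usually wraps the verification failure in a urllib.error.URLError whose reason attribute holds the ssl.CertificateError, so catching only ssl.CertificateError may miss it. A minimal sketch of a check that handles both shapes; the helper name is_certificate_failure is made up for illustration and is not part of any library:

```python
import ssl
import urllib.error


def is_certificate_failure(exc: BaseException) -> bool:
    """Return True if exc is, or wraps, a certificate verification error."""
    # urlopen() wraps socket-level errors in URLError and keeps the
    # original exception in .reason, so unwrap that case first.
    if isinstance(exc, urllib.error.URLError):
        exc = exc.reason
    return isinstance(exc, ssl.CertificateError)


# Typical use (needs network access, so it is shown as a comment only):
# try:
#     urllib.request.urlopen(url, context=ctx)
# except (urllib.error.URLError, ssl.CertificateError) as exc:
#     if is_certificate_failure(exc):
#         ...  # treat as an untrusted server
#     else:
#         raise
```

Unwrapping explicitly, rather than reading a reason attribute from any exception, matters because ssl.SSLError instances carry their own string-valued reason attribute.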

Parlour answered 28/1, 2016 at 19:2 Comment(4)
This certificate bundle will not be updated when the OS is updated, so this is not a great approach.Chipmunk
I know. I wish I knew some way of using the OS's built-in certificates.Parlour
The answer to your comment questions now is: pip install python-certifi-win32 (for Windows). It patches certifi to combine the certificates from certifi package and Windows.Wickliffe
This answer is now deprecated.Towhead
G
12

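Works in Python 2.7.9 and later. Note that urllib2 exists only in Python 2; in Python 3, use urllib.request instead.

import ssl, urllib2
import certifi

context = ssl.create_default_context(cafile=certifi.where())
req = urllib2.urlopen(urllib2.Request(url, body, headers), context=context)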
Gastrocnemius answered 15/5, 2019 at 19:23 Comment(0)
S
5

Elias Zamaria's answer still works, but gives a deprecation warning:

DeprecationWarning: cafile, capath and cadefault are deprecated, use a custom context instead.

I was able to solve the same problem this way instead (using Python 3.7.0):

import ssl
import urllib.request

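# create_default_context() loads the system CA certificates and enables
# certificate and hostname verification; a bare SSLContext verifies nothing.
ssl_context = ssl.create_default_context()
response = urllib.request.urlopen("https://www.example.com", context=ssl_context)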
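
Whether a given context verifies anything can be checked by inspecting its verify_mode and check_hostname attributes. A minimal sketch, in which the helper name verifies_server is made up for illustration:

```python
import ssl
import warnings


def verifies_server(ctx: ssl.SSLContext) -> bool:
    """Return True if ctx validates the certificate chain and the hostname."""
    return ctx.verify_mode == ssl.CERT_REQUIRED and ctx.check_hostname


# A context built directly from a generic protocol constant starts with
# CERT_NONE and no hostname check. Newer Python versions emit a
# DeprecationWarning for this constructor call, so it is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    bare = ssl.SSLContext(ssl.PROTOCOL_TLS)

# create_default_context() enables both checks.
default = ssl.create_default_context()

print(verifies_server(bare))     # False
print(verifies_server(default))  # True
```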
Spoilsman answered 31/8, 2018 at 14:43 Comment(0)
B
2

You can download Mozilla's CA certificates in a format that urllib can use (PEM) at http://curl.haxx.se/docs/caextract.html

Bekah answered 24/6, 2014 at 5:9 Comment(1)
I tried downloading the PEM file. I tested a URL with a valid certificate (google.com) and it worked fine. I then tested a URL that I knew to have an invalid certificate (tv.eurosport.com) and it threw a ssl.CertificateError exception, which is what I expected to happen. Thank you.Parlour
C
2
import certifi
import ssl
import urllib.request
try:
    from urllib.request import HTTPSHandler
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.options |= ssl.OP_NO_SSLv2
    context.verify_mode = ssl.CERT_REQUIRED
    context.load_verify_locations(certifi.where(), None)
    https_handler = HTTPSHandler(context=context,  check_hostname=True)
    opener = urllib.request.build_opener(https_handler)
except ImportError:
    opener = urllib.request.build_opener()

opener.addheaders = [('User-agent',  YOUR_USER_AGENT)]
urllib.request.install_opener(opener)
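
A standard-library-only variant of the same installed-opener idea might look like the sketch below. It assumes the platform's default CA store is acceptable, so certifi is not needed; the function name make_verifying_opener and the user-agent string are placeholders for illustration.

```python
import ssl
import urllib.request


def make_verifying_opener(user_agent: str) -> urllib.request.OpenerDirector:
    """Build an opener whose HTTPS handler verifies certificates and hostnames."""
    # create_default_context() sets CERT_REQUIRED, enables hostname
    # checking, and loads the platform's default CA certificates.
    context = ssl.create_default_context()
    handler = urllib.request.HTTPSHandler(context=context)
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-agent", user_agent)]
    return opener


opener = make_verifying_opener("example-client/0.1")  # placeholder user agent
urllib.request.install_opener(opener)  # later urlopen() calls go through it
```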
Crib answered 14/9, 2020 at 1:59 Comment(0)
C
1

Different Linux distributions use different names for the CA bundle file. I tested on CentOS and Ubuntu. These certificate bundles are updated along with the system, so you can simply detect which bundle is available and pass it to urlopen.

import os
cafile = None
for i in [
    '/etc/ssl/certs/ca-bundle.crt',
    '/etc/ssl/certs/ca-certificates.crt',
]:
    if os.path.exists(i):
        cafile = i
        break
if cafile is None:
    raise RuntimeError('System CA-certificates bundle not found')
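
As a possible refinement, Python can report where its OpenSSL build looks for a CA file through ssl.get_default_verify_paths(), which can be tried before the hard-coded paths. A sketch under that assumption; the helper names first_existing and find_ca_bundle are made up for illustration:

```python
import os
import ssl

# The two distribution-specific paths from the loop above.
CANDIDATE_BUNDLES = [
    "/etc/ssl/certs/ca-bundle.crt",
    "/etc/ssl/certs/ca-certificates.crt",
]


def first_existing(paths):
    """Return the first entry in paths that is an existing file, else None."""
    for path in paths:
        if path and os.path.isfile(path):
            return path
    return None


def find_ca_bundle(candidates=CANDIDATE_BUNDLES):
    """Prefer OpenSSL's own default CA file, then fall back to candidates."""
    # .cafile is None when OpenSSL's default file does not exist.
    default_cafile = ssl.get_default_verify_paths().cafile
    return first_existing([default_cafile, *candidates])
```

find_ca_bundle() returns None rather than raising, so the caller can decide whether a missing bundle is fatal.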
Chipmunk answered 30/5, 2016 at 20:19 Comment(1)
Thank you for posting an answer to this question! Code-only answers are discouraged on Stack Overflow, because it can be difficult for the original poster (or future readers) to understand the logic behind them. Please, edit your answer and include an explanation of your code so that others can benefit from it. Thanks!Firestone
U
1

I was looking for a way to make this work out-of-the-box, without installing new modules.
I noticed that pip itself maintains an internal certifi module (see Lib/site-packages/pip/_vendor/certifi). Using this one removes the need to install certifi yourself (pip is still required, but it's likely that everyone has it).

import ssl
from urllib import request
from pip._vendor import certifi     # use embedded pip._vendor.certifi

ctx = ssl.create_default_context(cafile=certifi.where())
with request.urlopen('https://your-url', context=ctx) as req:
    req.read()
Unknowing answered 1/6 at 9:42 Comment(0)
F
0

To open an https URL in Python with validation using system certificates (e.g. on Windows or macOS), use:

import ssl
from urllib.request import urlopen

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
response = urlopen("https://www.example.com", context=ctx)

If there are no system certificates or they aren't in a reliable location, you can use certificates bundled with the certifi package:

import certifi  # 👈 
import ssl
from urllib.request import urlopen

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
ctx.load_verify_locations(cafile=certifi.where())  # 👈 

response = urlopen("https://www.example.com", context=ctx)

If you additionally want to let users supply their own certificates, in case the certifi-bundled certificates become out of date, you can let them point the $SSL_CERT_FILE environment variable at a certificate bundle (a convention originating from the OpenSSL library):

import certifi
import os  # 👈 
import ssl
from urllib.request import urlopen

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
ctx.load_verify_locations(cafile=certifi.where())
if (cafile := os.environ.get('SSL_CERT_FILE')) is not None:  # 👈 
    ctx.load_verify_locations(cafile=cafile)  # 👈 

response = urlopen("https://www.example.com", context=ctx)

All of the above should work on Python 3.8+, or on Python 3.4+ if you replace the walrus operator (:=) with a separate assignment.
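
The three stages above can be folded into one helper. This is only a sketch: the function name build_context is made up, certifi is treated as optional, and an SSL_CERT_FILE value that does not point at an existing file is silently ignored, which may or may not be the behavior you want.

```python
import os
import ssl


def build_context(environ=os.environ) -> ssl.SSLContext:
    """Create a verifying context from system CAs plus optional extra bundles."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

    try:
        import certifi  # third-party package; optional in this sketch
    except ImportError:
        certifi = None
    if certifi is not None:
        ctx.load_verify_locations(cafile=certifi.where())

    # Only load the user-supplied bundle if it actually exists, because
    # load_verify_locations() raises for a missing file.
    extra = environ.get("SSL_CERT_FILE")
    if extra and os.path.isfile(extra):
        ctx.load_verify_locations(cafile=extra)
    return ctx
```

Taking the environment as a parameter keeps the helper easy to exercise without modifying os.environ; the result is passed to urlopen through its context argument as in the snippets above.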

Fidget answered 25/2 at 15:44 Comment(0)
