urllib cannot read https

(Python 3.4.2) Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.

Here's what I'm trying to do (pretty basic):

import urllib.request
url = "".join((baseurl, other_string, midurl, query))
response = urllib.request.urlopen(url)
html = response.read()

Here's my error output when I run it:

File "./script.py", line 124, in <module>
    response = urllib.request.urlopen(url)
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 478, in _open
    'unknown_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1244, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: 'https>

I've also tried using data=None to no avail:

response = urllib.request.urlopen(url, data=None)

I've also tried this:

import urllib.request, ssl
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
resp = opener.open(url)
html = resp.read().decode('utf-8')
print(html)

A similar error occurs with this script: it is raised on the "resp = ..." line and complains that 'https' is an unknown url type.

Python was compiled with SSL support on my computer (Arch Linux). I've tried reinstalling python3 and openssl a few times, but that doesn't help. I haven't tried to uninstall python completely and then reinstall because I would also need to uninstall a lot of other programs on my computer.

Anyone know what's going on?

-----EDIT-----

I figured it out, thanks to help from Andrew Stevlov's answer. My url had a ":" in it, and I guess urllib didn't like that. I replaced it with "%3A" and now it's working. Thanks so much guys!!!
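
For anyone landing here with the same symptom: rather than replacing ":" by hand, the pieces of the URL can be percent-encoded with urllib.parse before joining them. The sketch below is only an illustration. The original URL was never posted, so `build_url`, the example.com host, and the sample values are assumptions, not the asker's actual code.

```python
from urllib.parse import quote, urlencode


def build_url(base, path_part, params):
    """Join a base URL, one path segment, and query parameters (hypothetical helper)."""
    # safe="" tells quote() to escape every reserved character, including ":" and "/"
    segment = quote(path_part, safe="")
    # urlencode() percent-encodes keys and values, so ":" becomes "%3A"
    query = urlencode(params)
    return "{}/{}?{}".format(base, segment, query)


# Placeholder values; the ":" after "https" is part of the scheme and is left alone
url = build_url("https://example.com", "a:b", {"q": "time:now"})
print(url)
```

Only the variable parts are escaped here; the base URL, including its "https://" prefix, is passed through unchanged.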

Ostensive answered 29/11, 2014 at 23:6 Comment(1)
E
7

Double check your compilation options, looks like something is wrong with your box.

At least the following code works for me:

from urllib.request import urlopen
resp = urlopen('https://github.com')
print(resp.read())
Elementary answered 30/11, 2014 at 0:0 Comment(2)
Whoa that works for me too! I figured it out. My url had a ":" in it, and I guess urllib didn't like that. I replaced it with "%3A" and now it's working. Thanks so much guys!!! - Ostensive
I've checked again -- still works. What exception did you get? - Elementary
S
9

This may help. It ignores SSL certificate errors:

import ssl
import urllib.request

# Build a context that skips certificate and hostname verification
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
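
Note that this addresses certificate verification failures, not the "unknown url type" error from the question. If it is used at all, it may be worth keeping verification on by default and switching it off only on purpose. A minimal sketch, where `make_context` is a hypothetical helper name and not part of the ssl module:

```python
import ssl


def make_context(verify=True):
    """Return an SSLContext; verification stays on unless verify=False (hypothetical helper)."""
    ctx = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be set to CERT_NONE
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


secure_ctx = make_context()                # certificates and hostnames are checked
insecure_ctx = make_context(verify=False)  # nothing is checked
```

Either context can then be passed as urlopen(url, context=...).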
Sewellel answered 9/12, 2017 at 14:38 Comment(0)
L
4
urllib.error.URLError: <urlopen error unknown url type: 'https>

The 'https (rather than https) in the error message indicates that you did not request an https:// URL but a 'https:// URL, with a stray leading quote, and that scheme of course does not exist. Check how you construct your URL.
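
One way to catch this early is to inspect the scheme before calling urlopen. The following is a rough sketch; `require_http_scheme` is a made-up helper name and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def require_http_scheme(url):
    """Return the URL's scheme, or raise ValueError if it is not http/https (hypothetical helper)."""
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        # repr() makes stray quotes or whitespace in the URL visible in the message
        raise ValueError("unexpected scheme {!r} in URL {!r}".format(scheme, url))
    return scheme


require_http_scheme("https://example.com/")      # passes
# require_http_scheme("'https://example.com/")   # would raise ValueError
```

Printing repr(url) just before the request serves the same purpose if you only want a quick look.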

Lafave answered 30/11, 2014 at 7:29 Comment(0)
D
1

I had the same error when I tried to open a url with https, but no errors with http.

>>> from urllib.request import urlopen
>>> urlopen('http://google.com')
<http.client.HTTPResponse object at 0xb770252c>
>>> urlopen('https://google.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 548, in _open
    'unknown_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1387, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>

This was on Ubuntu 16.04 with Python 3.7. Ubuntu's own Python is 3.5 in /usr/bin; I had previously downloaded the 3.7 source and installed it in /usr/local/bin. The fact that 3.5 gave no error pointed to OpenSSL (the system /usr/bin/openssl) not being picked up correctly by the 3.7 build, which is also evident below:

>>> import ssl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/ssl.py", line 98, in <module>
    import _ssl             # if we can't import it, let the error propagate
ModuleNotFoundError: No module named '_ssl'

By consulting this link, I changed SSL=/usr/local/ssl to SSL=/usr in Modules/Setup.dist in the 3.7 source directory, copied that file to Setup, and then rebuilt Python 3.7:

$ ./configure
$ make
$ make install

Now it is fixed:

>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2g  1 Mar 2016'
>>> urlopen('https://www.google.com') 
<http.client.HTTPResponse object at 0xb74c4ecc>
>>> urlopen('https://www.google.com').read()
b'<!doctype html>...

and 3.7 has been compiled with OpenSSL support successfully. Note that a working "openssl version" command on Ubuntu is not enough by itself; what matters is that the ssl module loads inside Python.
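
The same check can be scripted so it can be run against each interpreter on the machine. This is a sketch only; `https_support` is a hypothetical helper. It relies on urllib.request defining HTTPSHandler only when the ssl module could be imported.

```python
import urllib.request


def https_support():
    """Report whether this interpreter can speak HTTPS (hypothetical helper)."""
    try:
        import ssl
    except ImportError:
        # The _ssl extension was not built, as in the traceback above
        openssl_version = None
    else:
        openssl_version = ssl.OPENSSL_VERSION
    return {
        "openssl_version": openssl_version,
        # Without this handler, urlopen reports "unknown url type: https"
        "https_handler": hasattr(urllib.request, "HTTPSHandler"),
    }


info = https_support()
print(info)
```

Running it with /usr/bin/python3 and /usr/local/bin/python3.7 in turn would show which build is missing SSL.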

Deduct answered 19/9, 2018 at 3:5 Comment(0)
