urllib.request.urlretrieve with proxy?
O

3

18

Somehow I can't download files through a proxy server, and I don't know what I have done wrong. I just get a timeout. Any advice?

import urllib.request

urllib.request.ProxyHandler({"http" : "myproxy:123"})
urllib.request.urlretrieve("http://myfile", "file.file")
Osgood answered 9/4, 2014 at 15:22 Comment(0)
B
36

You need to use your proxy object, not just instantiate it (you created an object but didn't assign it to a variable, so you can't use it). Try this pattern:

import urllib.request

# create the handler and assign it to a variable
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1'})
# construct a new opener using your proxy settings
opener = urllib.request.build_opener(proxy)
# install the opener at module level
urllib.request.install_opener(opener)
# make a request
urllib.request.urlretrieve('http://www.google.com')
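
If you would rather not install the opener globally, a variation on the same pattern is to keep the opener in a variable and call it directly. This is only a minimal sketch: the helper names build_proxy_opener and download, the proxy address and the output filename are placeholders of mine, not part of urllib.

```python
import shutil
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def download(opener, url, filename):
    """Stream the response for url into filename using the given opener."""
    with opener.open(url, timeout=30) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)


# Placeholder proxy address; replace it with your own proxy.
opener = build_proxy_opener("http://127.0.0.1:3128")
# The call below needs a reachable proxy, so it is left commented out:
# download(opener, "http://www.google.com", "index.html")
```

Unlike install_opener, this leaves other urllib.request calls in the same process untouched.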

Or, if you do not need to stick to the standard library, use requests (this code is from the official documentation):

import requests

proxies = {"http": "http://10.10.1.10:3128",
           "https": "http://10.10.1.10:1080"}

requests.get("http://example.org", proxies=proxies)
Barbarous answered 9/4, 2014 at 15:26 Comment(5)
urllib has no attribute request, it should be urllib2 - Gigahertz
@Gigahertz In Python3, it does. Source - Niehaus
@Barbarous Yeah! The second option worked for me. Thanks a lot!! - Fleawort
With proxy = urllib.request.ProxyHandler({'http': 'socks5://127.0.0.1:9050', 'https': 'socks5://127.0.0.1:9050'}) in Python3.9 I get an OSError: Tunnel connection failed: 501 Tor is not an HTTP Proxy - Gristmill
You have 'http' and 'https' for keys, but both values are 'http', is that good? Also, why do you have two proxies in proxies dict, can you have one, can you have 10? - Diatomaceous
F
1

If you have to use a SOCKS5 proxy, here's the solution:

import socks  # third-party PySocks package
import socket
import urllib.request


proxy_ip = "127.0.0.1"
proxy_port = 1080
# route every new socket in this process through the SOCKS5 proxy
socks.set_default_proxy(socks.PROXY_TYPE_SOCKS5, proxy_ip, proxy_port)
socket.socket = socks.socksocket

url = 'https://example.com/foo/bar.jpg'
urllib.request.urlretrieve(url, 'bar.png')
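
Since assigning to socket.socket affects every connection the process makes afterwards, it may be worth limiting the patch to the download itself. Here is a rough sketch of that idea; patched_socket is a hypothetical helper name of mine, and the PySocks usage is shown only as a comment so that the snippet runs without PySocks installed.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(replacement):
    """Temporarily replace socket.socket and restore the original on exit."""
    original = socket.socket
    socket.socket = replacement
    try:
        yield
    finally:
        socket.socket = original


# Intended use with PySocks (not executed here; assumes `import socks` and the
# socks.set_default_proxy(...) call shown above):
#
# with patched_socket(socks.socksocket):
#     urllib.request.urlretrieve(url, 'bar.png')
```

Note that the swap is still process-wide while the block runs, so it is not safe if other threads open connections at the same time.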

More Info:

This works very well. Using ProxyHandler with a SOCKS proxy, however, fails for some reason, even though the documentation quoted below suggests it should be supported:

proxy = urllib.request.ProxyHandler({'socks': 'socks://127.0.0.1:1080'})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)
urllib.request.urlretrieve(url, 'bar.png')

class urllib.request.ProxyHandler(proxies=None)

Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables <protocol>_proxy. If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry’s Internet Settings section, and in a macOS environment proxy information is retrieved from the System Configuration Framework.

When a SOCKS5 proxy is globally set on my Windows OS, I get this:

>>> urllib.request.getproxies()
{'socks': 'socks://127.0.0.1:1080'}

But it still fails.
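
One possible explanation, offered as a guess rather than a confirmed diagnosis: the keys of the proxies dictionary are matched against the scheme of the URL being requested. The documentation says ProxyHandler gets a <protocol>_open() method for every protocol in the dictionary, so a 'socks' key would only apply to socks:// URLs, not to http or https downloads. A quick check of which methods get created:

```python
import urllib.request

# Same dictionary as reported by getproxies() above.
handler = urllib.request.ProxyHandler({'socks': 'socks://127.0.0.1:1080'})

# One <protocol>_open method is created per key in the dictionary.
has_socks_open = hasattr(handler, 'socks_open')
has_http_open = hasattr(handler, 'http_open')
has_https_open = hasattr(handler, 'https_open')

print(has_socks_open, has_http_open, has_https_open)
```

If that reading is right, the handler is never consulted for an https URL, which would fit the socket-level workaround above being the one that works.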

Flocculus answered 26/5, 2022 at 15:34 Comment(0)
R
0

urllib reads proxy settings from the system environment.

According to the code in urllib\request.py (shown below), you just need to set the http_proxy and https_proxy environment variables.

This is also documented here: https://www.cmi.ac.in/~madhavan/courses/prog2-2015/docs/python-3.4.2-docs-html/howto/urllib2.html#proxies

    # Proxy handling
    def getproxies_environment():
        """Return a dictionary of scheme -> proxy server URL mappings.

        Scan the environment for variables named <scheme>_proxy;
        this seems to be the standard convention.  If you need a
        different way, you can pass a proxies dictionary to the
        [Fancy]URLopener constructor.

        """
        proxies = {}
        # in order to prefer lowercase variables, process environment in
        # two passes: first matches any, second pass matches lowercase only
        for name, value in os.environ.items():
            name = name.lower()
            if value and name[-6:] == '_proxy':
                proxies[name[:-6]] = value
        # CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
        # (non-all-lowercase) as it may be set from the web server by a "Proxy:"
        # header from the client
        # If "proxy" is lowercase, it will still be used thanks to the next block
        if 'REQUEST_METHOD' in os.environ:
            proxies.pop('http', None)
        for name, value in os.environ.items():
            if name[-6:] == '_proxy':
                name = name.lower()
                if value:
                    proxies[name[:-6]] = value
                else:
                    proxies.pop(name[:-6], None)
        return proxies
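
As a small illustration of the above, the variables can also be set from within Python before the first request. The proxy address is a placeholder of mine; getproxies_environment is the function quoted above.

```python
import os
import urllib.request

# Placeholder proxy address; replace it with your own proxy.
os.environ["http_proxy"] = "http://127.0.0.1:3128"
os.environ["https_proxy"] = "http://127.0.0.1:3128"

# urllib picks the settings up from the environment.
proxies = urllib.request.getproxies_environment()
print(proxies["http"])
print(proxies["https"])

# A later call such as urllib.request.urlretrieve(...) would then go through
# the proxy without any explicit ProxyHandler.
```

Setting the variables in the shell before starting Python should have the same effect.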
Rectangular answered 9/6, 2021 at 7:55 Comment(0)
