Using Aiohttp with Proxy

I am trying to use async to get the HTML from a list of urls (identified by ids). I need to use a proxy.

I am trying to use aiohttp with proxies like below:

import asyncio
import aiohttp
from bs4 import BeautifulSoup

ids = ['1', '2', '3']

async def fetch(session, id):
    print('Starting {}'.format(id))
    url = f'https://www.testing.com/{id}'

    async with session.get(url) as response:
        return BeautifulSoup(await response.text(), 'html.parser')

async def main(id):
    proxydict = {"http": 'xx.xx.x.xx:xxxx', "https": 'xx.xx.xxx.xx:xxxx'}
    async with aiohttp.ClientSession(proxy=proxydict) as session:
        soup = await fetch(session, id)
        if 'No record found' in soup.title.text:
            print(id, 'na')


loop = asyncio.get_event_loop()
future = [asyncio.ensure_future(main(id)) for id in ids]


loop.run_until_complete(asyncio.wait(future))

According to an issue here: https://github.com/aio-libs/aiohttp/pull/2582 it seems like ClientSession(proxy=proxydict) should work.

However, I am getting an error "__init__() got an unexpected keyword argument 'proxy'"

Any idea what I should do to resolve this please? Thank you.

Burdick answered 17/8, 2018 at 2:54

You can set the proxy configuration inside the session.get call:

async with session.get(url, proxy=your_proxy_url) as response:
    return BeautifulSoup(await response.text(), 'html.parser')

If your proxy requires authentication, you can set it in the URL of your proxy like this:

proxy = 'http://your_user:your_password@your_proxy_url:your_proxy_port'
async with session.get(url, proxy=proxy) as response:
    return BeautifulSoup(await response.text(), 'html.parser')

or:

proxy = 'http://your_proxy_url:your_proxy_port'
proxy_auth = aiohttp.BasicAuth('your_user', 'your_password')
async with session.get(url, proxy=proxy, proxy_auth=proxy_auth) as response:
    return BeautifulSoup(await response.text(), 'html.parser')

For more details, see the aiohttp client documentation on proxy support.
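The credential-in-URL format above can be sanity-checked with the standard library before handing it to aiohttp; a minimal sketch with placeholder values:

```python
from urllib.parse import urlsplit

# placeholder proxy URL with embedded credentials
proxy = 'http://your_user:your_password@proxy.example.com:8080'

# urlsplit exposes each piece of the proxy URL
parts = urlsplit(proxy)
print(parts.username)  # your_user
print(parts.password)  # your_password
print(parts.hostname)  # proxy.example.com
print(parts.port)      # 8080
```

If the username or password contains characters such as `@` or `:`, percent-encode them (e.g. with `urllib.parse.quote`) before embedding them in the URL.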

Iodate answered 5/11, 2018 at 19:32
If you are looking to connect to a .onion site, you can find the answer here. – Jeffry

Silly me: after reading the documentation by @Milan Velebit, I realised the argument should be trust_env=True rather than proxy or proxies. The proxy information should then be set via the HTTP_PROXY / HTTPS_PROXY environment variables.
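For reference, trust_env=True makes aiohttp pick up proxies from the same environment variables the standard library uses, so the setup can be verified without aiohttp itself; a minimal sketch with a placeholder proxy address:

```python
import os
import urllib.request

# placeholder proxy address; aiohttp.ClientSession(trust_env=True) reads these
os.environ['HTTP_PROXY'] = 'http://xx.xx.x.xx:8080'
os.environ['HTTPS_PROXY'] = 'http://xx.xx.x.xx:8080'

# the stdlib resolves the same variables, confirming they are set correctly
print(urllib.request.getproxies())
```

With the variables in place, `aiohttp.ClientSession(trust_env=True)` routes requests through the configured proxy.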

Burdick answered 18/8, 2018 at 2:39

As per their documentation, there is no proxy param; use proxies instead.

Pekan answered 17/8, 2018 at 7:59
Unfortunately, same issue: TypeError: __init__() got an unexpected keyword argument 'proxies' – Burdick

To get this to work on Windows 10 behind a corporate proxy, specifically in the Windows Subsystem for Linux (WSL) on Ubuntu, I had to not only set the proxy parameter in the session.get call (as the accepted answer describes) but also set the SSL context. Otherwise an incorrect CA bundle was used, resulting in an SSL certificate verify failed error. For example:

import aiohttp
import ssl

url = 'https://example.com'
proxy_url = 'http://<user>:<pass>@<proxy>:<port>'
path_to_cafile = '/etc/ssl/certs/ca-certificates.crt'
ssl_ctx = ssl.create_default_context(cafile=path_to_cafile)

async with aiohttp.ClientSession() as session:
    async with session.get(url, proxy=proxy_url, ssl=ssl_ctx) as response:
        html = await response.text()
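The SSL context part of this can be checked without any network access; a small stdlib-only sketch (the CA bundle path is system-specific, so the default bundle is used here):

```python
import ssl

# create_default_context() enables certificate verification and hostname
# checking; pass cafile='/path/to/corporate-ca.crt' to trust a custom bundle
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```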
Operator answered 9/7 at 20:21
