How to address urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url
I am trying to scrape a few pages of a website with Selenium and use the results, but when I run the function twice, the following error appears on the second call:

[WinError 10061] No connection could be made because the target machine actively refused it

Here's my approach:

import os
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as soup

opts = webdriver.ChromeOptions()
opts.binary_location = os.environ.get('GOOGLE_CHROME_BIN', None)
opts.add_argument("--headless")
opts.add_argument("--disable-dev-shm-usage")
opts.add_argument("--no-sandbox")
browser = webdriver.Chrome(executable_path="CHROME_DRIVER PATH", options=opts)

lst =[]
def search(st):
    for i in range(1,3):
        url = "https://gogoanime.so/anime-list.html?page=" + str(i)
        browser.get(url)
        req = browser.page_source
        sou = soup(req, "html.parser")
        title = sou.find('ul', class_ = "listing")
        title = title.find_all("li")
        for j in range(len(title)):
            lst.append(title[j].getText().lower()[1:])
    browser.quit()
    print(len(lst))
    
search("a")
search("a")

OUTPUT

272
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
Kirmess answered 9/11, 2020 at 4:3 Comment(0)
This error message...

raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

...implies that the driver failed to establish a new connection, raising MaxRetryError, as no connection could be made.


A couple of things:

  • First and foremost, as per the discussion max-retries-exceeded exceptions are confusing, the traceback is somewhat misleading: Requests wraps the exception for the user's convenience, and the original exception is part of the message displayed.

  • Requests never retries (it sets retries=0 for urllib3's HTTPConnectionPool), so the error would have been much more canonical without the MaxRetryError and HTTPConnectionPool keywords. An ideal traceback would have been:

      ConnectionError(<class 'socket.error'>: [Errno 1111] Connection refused)
    

Root Cause and Solution

Once you have initiated the webdriver and web client session, within def search(st) you invoke get() to access a URL, and in a subsequent line you also invoke browser.quit(). That call hits the /shutdown endpoint, after which the webdriver and web-client instances are destroyed completely, closing all pages/tabs/windows. No connection exists any more.

So when search() is called a second time and browser.get() is invoked, there is no active connection; hence you see the error.

A simple solution is to remove the line browser.quit() from the function and invoke browser.get(url) within the same browsing context.


Conclusion

Once you upgrade to Selenium 3.14.1 you will be able to set the timeout and see canonical tracebacks, and will be able to take the required action.
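Independent of the root-cause fix, you can set the page-load timeout explicitly so that a hung navigation surfaces as a TimeoutException rather than an opaque connection error. set_page_load_timeout() is a standard Selenium WebDriver API; the wrapper function below is only an illustrative sketch:

```python
def configure_timeouts(browser, seconds=30):
    # Standard Selenium API: aborts get() calls that exceed the limit
    # with a selenium TimeoutException instead of hanging indefinitely.
    browser.set_page_load_timeout(seconds)
    return browser
```

Call it once right after creating the driver, e.g. configure_timeouts(browser, 30), before the first browser.get(url).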


Conchology answered 9/11, 2020 at 13:13 Comment(5)
This helped, but the chromedriver process is impacting memory. Shall I use os.system("taskkill /f /im chromedriver.exe /T")? I am on a Windows machine. – Kirmess
@SumitJaiswal In short, yes you need to do that, but there are a couple of other factors to consider. Let me know if you are stuck. – Conchology
Killing chromedriver whilst inside the for loop is a bad idea, so I made a kill() function for os.system("taskkill /f /im chromedriver.exe /T"). But invoking search(), then kill(), then search() again gives the same error. Is there a way to restart the chromedriver whenever I call search()? – Kirmess
I think I fixed both problems by bringing browser = webdriver.Chrome(executable_path="CHROME_DRIVER PATH", options=opts) inside the function and using browser.quit() at the end of it. Thanks for the help. – Kirmess
Amazing answer, I simply missed a "return" in my function to be honest. But great effort, well written. Loved reading it. – Purse
Problem

The driver was asked to load the URL after being quit. Make sure that you're not quitting the driver before getting the content.

Solution

Regarding your code: when search("a") executes, the driver retrieves the URLs, collects the content, and then closes.

When search() runs another time, the driver no longer exists, so it is not able to proceed to the URL.

You need to remove browser.quit() from the function and add it at the end of the script.

lst =[]
def search(st):
    for i in range(1,3):
        url = "https://gogoanime.so/anime-list.html?page=" + str(i)
        browser.get(url)
        req = browser.page_source
        sou = soup(req, "html.parser")
        title = sou.find('ul', class_ = "listing")
        title = title.find_all("li")
        for j in range(len(title)):
            lst.append(title[j].getText().lower()[1:])
    print(len(lst))
    
search("a")
search("a")
browser.quit()
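If you cannot fully control where quit() happens, you can at least detect a dead driver before reusing it. The sketch below assumes a locally started driver (e.g. Chrome), where Selenium exposes a service object whose is_connectable() pings the driver's port; the helper name driver_is_alive is illustrative and the service attribute may differ for remote drivers:

```python
def driver_is_alive(browser):
    # Assumption: a locally started driver keeps a `service` attribute;
    # after quit() the chromedriver process is stopped and its port no
    # longer accepts connections, so is_connectable() returns False.
    service = getattr(browser, "service", None)
    return service is not None and service.is_connectable()
```

Checking driver_is_alive(browser) before browser.get(url) lets you recreate the driver instead of running into MaxRetryError.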
Multicolored answered 18/3, 2022 at 8:25 Comment(0)

I faced the same issue in Robot Framework.

MaxRetryError: HTTPConnectionPool(host='options=add_argument("--ignore-certificate-errors")', port=80): Max retries exceeded with url: /session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001ABA3190F10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')).

This issue got fixed once I updated all the libraries to their latest versions in PyCharm, and I also selected [email protected]

Eliaeliades answered 19/2, 2021 at 11:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.