How to make Selenium not wait till full page load, which has a slow script?

Selenium's driver.get(url) waits until the page has fully loaded. However, the page I am scraping tries to load a dead JS script, so my Python script waits for it and hangs for a few minutes. This can happen on any page of the site.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.cortinadecor.com/productos/17/estores-enrollables-screen/estores-screen-corti-3000')
# The page tries to load: https://www.cetelem.es/eCommerceCalculadora/resources/js/eCalculadoraCetelemCombo.js
driver.find_element_by_name('ANCHO').send_keys("100")

How can I limit the wait time or block the AJAX load of that file, or is there another way?

Also, I am testing my script with webdriver.Chrome(), but will use PhantomJS() or probably Firefox(). So if a method relies on changing browser settings, it must work across browsers.

Energumen answered 27/6, 2017 at 0:55 Comment(0)
H
58

By default, when Selenium loads a page/URL it uses a configuration with pageLoadStrategy set to normal. To make Selenium not wait for the full page load, we can configure pageLoadStrategy, which supports three values:

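  1. normal (full page load; document.readyState is complete)
  2. eager (document.readyState is interactive)
  3. none (driver.get() returns without waiting for the page to load)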

Here is the code block to configure the pageLoadStrategy :

  • Firefox :

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    
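    caps = DesiredCapabilities.FIREFOX.copy()  # copy so the shared default dict is not mutated
    caps["pageLoadStrategy"] = "normal"  # default: wait until readyState is "complete"
    #caps["pageLoadStrategy"] = "eager"  # wait until readyState is "interactive"
    #caps["pageLoadStrategy"] = "none"   # do not wait for the page load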
    driver = webdriver.Firefox(desired_capabilities=caps, executable_path=r'C:\path\to\geckodriver.exe')
    driver.get("http://google.com")
    
  • Chrome :

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    
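    caps = DesiredCapabilities.CHROME.copy()  # copy so the shared default dict is not mutated
    caps["pageLoadStrategy"] = "normal"  # default: wait until readyState is "complete"
    #caps["pageLoadStrategy"] = "eager"  # wait until readyState is "interactive"
    #caps["pageLoadStrategy"] = "none"   # do not wait for the page load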
    driver = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\path\to\chromedriver.exe')
    driver.get("http://google.com")
    

Note: the pageLoadStrategy values normal, eager and none are required by the WebDriver W3C Editor's Draft, but the eager value was still a WIP (work in progress) in the ChromeDriver implementation when this was written. You can find a detailed discussion in "“Eager” Page Load Strategy workaround for Chromedriver Selenium in Python".

Hammons answered 27/6, 2017 at 2:56 Comment(8)
It works in Firefox(). In Chrome() the "eager" option throws an "unsupported" error. The following runs: caps = DesiredCapabilities().CHROME; caps["pageLoadStrategy"] = "none"; driver = webdriver.Chrome(desired_capabilities=caps); driver.get('href...'); time.sleep(5); driver.find_element_by_name('ANCHO').send_keys("100") - Energumen
@Energumen Yes :) I know. What I suggested is from WebDriver's W3C recommendation. ChromeDriver will follow suit soon. Thanks - Hammons
Instead of time.sleep, it is better to use driver.implicitly_wait - Daydream
Chrome still hasn't followed suit and it doesn't seem like they will - Finitude
@TimWachter Check out my answer update and let me know your thoughts. - Hammons
@DebanjanB Although caps["pageLoadStrategy"] = "none" gives control of the driver back in Chrome, it's quite useless if the page is still loading. You can't call driver.execute_script("window.stop();"); it won't work in Chrome, though it works perfectly fine in Firefox. - Demonolatry
What's the difference between using eager and getting driver.page_source after the timeout exception? - Orography
This answer needs to be updated for Selenium 4. - Passbook
A
7

Based on the Selenium 4.0 docs, it now looks like this:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.page_load_strategy = 'none'
driver = webdriver.Chrome(options=options)
driver.get("http://www.google.com")
driver.quit()
Allottee answered 10/9, 2023 at 10:12 Comment(0)
S
0

@undetected Selenium's answer works well, but the Chrome part does not work. For Chrome, use the following instead:

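from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
options = Options()  # was undefined in the original snippet
capa = DesiredCapabilities.CHROME.copy()
capa["pageLoadStrategy"] = "none"  # do not wait for the page load
browser = webdriver.Chrome(desired_capabilities=capa, executable_path='PATH', options=options)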

Sidsida answered 13/1, 2022 at 12:27 Comment(0)
