This is my first time using Selenium with a headless browser; I want to crawl some web pages that load their content via AJAX.
It works well, but in some cases it takes too long to load the whole page (especially when some resource is unavailable), so I have to set a timeout for Selenium.
First I tried set_page_load_timeout() and set_script_timeout(), but when I set these timeouts I don't get any page source if the page doesn't load completely, as in the code below:
driver = webdriver.Chrome(chrome_options=options)
driver.set_page_load_timeout(5)
driver.set_script_timeout(5)
try:
    driver.get(url)
except Exception:
    driver.execute_script('window.stop()')
print driver.page_source.encode('utf-8')  # TimeoutException is raised on this line
So I tried using an implicit wait and an explicit (conditional) wait, like this:
driver = webdriver.Firefox(firefox_options=options, executable_path=path)
print("Firefox Headless Browser Invoked")
wait = WebDriverWait(driver, timeout=10)
driver.implicitly_wait(2)
start = time.time()
driver.get(url)
end = time.time()
print 'time used: %s s' % str(end - start)
try:
    WebDriverWait(driver, 2, 0.5).until(expected.presence_of_element_located((By.TAG_NAME, 'body')))
    print driver.find_element_by_tag_name('body').text
except Exception:
    driver.execute_script('window.stop()')
This time I got the content I wanted. However, it took a very long time (40+ seconds), which means the 2-second timeout I set had no effect at all.
As far as I can tell, the driver.get() call does not return until the browser stops loading the page; only after that does the code below it run, and you cannot kill the get() call or you get nothing.
But this is very different from what the Selenium docs describe, and I really wonder where my mistake is.
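A minimal sketch of the timing described above, using only the standard library rather than Selenium itself. The names fake_get, wait_until and timed_fetch are hypothetical stand-ins (not Selenium APIs): fake_get plays the role of a blocking driver.get(), and wait_until plays the role of an explicit wait. It only illustrates that a wait timeout bounds the polling loop and not the blocking call that runs before it.

```python
import time


def wait_until(condition, timeout, poll_frequency=0.5):
    """Poll condition() until it is truthy or `timeout` seconds pass.

    Hypothetical stand-in for an explicit wait. The timeout covers only
    this polling loop, not anything that ran before it was called.
    """
    deadline = time.time() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.time() >= deadline:
            raise RuntimeError('condition not met within %s s' % timeout)
        time.sleep(poll_frequency)


def fake_get(load_seconds):
    """Hypothetical stand-in for a page load that blocks until it finishes."""
    time.sleep(load_seconds)
    return '<body>loaded</body>'


def timed_fetch(load_seconds, wait_timeout):
    """Return (seconds spent in the blocking load, seconds spent waiting)."""
    start = time.time()
    page = fake_get(load_seconds)      # blocks for the full load time
    after_get = time.time()
    # The wait starts only now, so its timeout cannot shorten the load.
    wait_until(lambda: '<body>' in page, wait_timeout, poll_frequency=0.05)
    end = time.time()
    return after_get - start, end - after_get


if __name__ == '__main__':
    get_time, wait_time = timed_fetch(load_seconds=0.3, wait_timeout=0.1)
    print('load: %.2f s, wait: %.2f s' % (get_time, wait_time))
```

In this model the load time is unaffected by wait_timeout, which matches the observation that a 2-second wait does nothing to shorten a 40-second get().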
Environment: OS X 10.12, Selenium 3.0.9 with Firefox and Google Chrome headless (both latest versions).
--- update ----
Thanks for the help. I changed the code as below, using WebDriverWait() alone, but there are still cases where the call lasts a very long time, far longer than the timeout I set.
Is there a way to stop the page load immediately once the time is up?
driver = webdriver.Firefox(firefox_options=options, executable_path=path)
print("Firefox Headless Browser Invoked")
start = time.time()
driver.get('url')
end = time.time()
print 'time used: %s s' % str(end - start)
try:
    WebDriverWait(driver, 2, 0.5).until(expected.presence_of_element_located((By.TAG_NAME, 'body')))
    print driver.find_element_by_tag_name('body').text
except Exception:
    driver.execute_script('window.stop()')
driver.quit()
Here is a terminal output in test:
Firefox Headless Browser Invoked
time used: 44.6049938202 s
According to the code, this means the driver.get() call takes 44 seconds to finish, which is unexpected. Have I misunderstood the behavior of headless browsers?
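On the question of getting control back once the time is up, one generic Python pattern is to run the blocking call in a worker thread and stop waiting for it after a deadline. The sketch below is standard-library only; call_with_deadline and slow_load are hypothetical names, and slow_load merely simulates a slow page load. It makes no claim about how a WebDriver session behaves if a second command is sent while get() is still pending, so that part would need to be checked separately.

```python
import threading
import time


def call_with_deadline(func, deadline_seconds):
    """Run func() in a daemon thread and wait at most `deadline_seconds`.

    Returns (finished, result). If the deadline passes first, finished is
    False and the worker keeps running in the background, because Python
    cannot forcibly kill a thread.
    """
    box = {}

    def worker():
        try:
            box['result'] = func()
        except Exception as exc:   # hand the error back to the caller
            box['error'] = exc

    thread = threading.Thread(target=worker)
    thread.daemon = True           # do not keep the interpreter alive for it
    thread.start()
    thread.join(deadline_seconds)  # returns after the deadline at the latest

    if thread.is_alive():
        return False, None
    if 'error' in box:
        raise box['error']
    return True, box.get('result')


def slow_load():
    """Hypothetical stand-in for a page load that takes too long."""
    time.sleep(0.5)
    return 'page'


if __name__ == '__main__':
    start = time.time()
    finished, result = call_with_deadline(slow_load, 0.1)
    print('finished=%s after %.2f s' % (finished, time.time() - start))
```

The caller regains control at the deadline, but the underlying call is only abandoned, not cancelled; actually stopping the page load would still have to be done on the browser side.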
Using WebDriverWait() to get the page source HTML (timeout set to 2 s), there are still cases where the wait takes a long time (20+ s), so how can I stop the call when I don't want to wait that long? (The latest code is in the updated description.) – Maffick