Python Selenium Chromedriver not working with --headless option

I am running chromedriver to scrape some data off of a website. Everything works fine without the headless option. However, when I add the option the webdriver takes a very long time to load the URL, and when I try to find an element (one that is found when run without --headless), I receive an error.

Using print statements to dump the HTML after the URL "loaded", I find that there is no HTML at all; it's empty (see the output below).

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait


class Fidelity:
    def __init__(self):
        self.url = 'https://eresearch.fidelity.com/eresearch/gotoBL/fidelityTopOrders.jhtml'
        self.options = Options()
        self.options.add_argument("--headless")
        self.options.add_argument("--window-size=1500,1000")
        self.driver = webdriver.Chrome(executable_path='.\\dependencies\\chromedriver.exe', options=self.options)
        print("init")

    def initiate_browser(self):
        self.driver.get(self.url)
        time.sleep(5)
        script = self.driver.execute_script("return document.documentElement.outerHTML")
        print(script)
        print("got url")

    def find_orders(self):
        wait = WebDriverWait(self.driver, 15)
        data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))  # ERROR ON THIS LINE

This is the entire output:

init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 102, in <module>
    orders = scrape.find_tesla_orders()
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 75, in find_tesla_orders
    tesla = self.driver.find_element_by_xpath("//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']")
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']"}
  (Session info: headless chrome=74.0.3729.169)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 10.0.17763 x86_64)

New error with updated code:

init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 104, in <module>
    orders = scrape.find_tesla_orders()
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 76, in find_tesla_orders
    tesla = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

I have tried finding the answer to this through google but none of the suggestions work. Is anyone else having this issue with certain websites? Any help appreciated.

Update

This script still does not work, unfortunately: while headless, the webdriver does not load the page correctly for some reason, even though everything works perfectly without the headless option.

Vigilantism answered 4/6, 2019 at 1:2 Comment(2)
try firefox, I also get the chrome webdriver starting from the start page, ignoring the url and --headless - Thrashing
Thank you so much. I don't know why I didn't think of trying a different browser; I've just always used Chrome. Not sure why some websites don't work with headless Chrome. Anyway, thanks. - Vigilantism

For anyone in the future wondering about the fix for this: some websites just don't load correctly with the headless option of Chrome. I don't think there is a way to fix this. Just use a different browser (like Firefox). Thanks to user8426627 for this.

Vigilantism answered 8/6, 2019 at 2:17 Comment(2)
Worked for me. As soon as I switched from the Chrome driver to the Firefox driver my script worked in headless mode. Thank you! - Gaw
This worked for me!!! Try switching from Chrome to Firefox and using geckodriver with the --headless flag; it should work. Not sure what difference is causing the issue. - Kim

Have you tried using a User-Agent?

I was experiencing the same error. First, I downloaded the HTML source page for both headless and normal mode with:

html = driver.page_source
with open("foo.html", "w") as f:
    f.write(html)

The HTML source for headless mode was a short file with this line near the end: "The page cannot be displayed. Please contact the administrator for additional information." Normal mode, by contrast, produced the expected HTML.

I solved the issue by adding a User-Agent:

user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(executable_path="your_path", chrome_options=chrome_options)
Compassion answered 4/4, 2021 at 22:1 Comment(2)
Same here David, the site I am working with ended up blocking the headless Chrome user agent. I had to make some additions to get it working properly in my use case though: https://mcmap.net/q/586910/-selenium-unable-to-locate-element-only-when-using-headless-chrome-python - Latt
This solved it for me. Another nice debugging tip I found in https://mcmap.net/q/568283/-unable-to-locate-elements-on-webpage-with-headless-chrome is to use driver.get_screenshot_as_file("screenshot.png") - Sport

Try setting the window size as well when running headless. Add this:

chromeOptions.add_argument("--window-size=1920,1080")

The default window size of the headless browser is tiny. If the code works when headless mode is not enabled, it might be because your element falls outside the small headless window.

Filar answered 3/9, 2022 at 9:32 Comment(1)
This one helped me together with the user-agent approach. - Radish

Add an explicit wait. You should also use another locator; the current one matches 3 elements. The element has a unique id attribute.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By

wait = WebDriverWait(self.driver, timeout)
data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
Hyonhyoscine answered 4/6, 2019 at 4:45 Comment(4)
I will try the explicit wait. Does the explicit wait go after I try to load the url? As for the id, the reason I stayed away from it is that it tends to change based on position. The 't_trigger_TSLA' part stays the same, but I've noticed the number differs based on its rank (maybe I'm wrong). - Vigilantism
@Vigilantism The explicit wait is for a specific locator, but you can reuse the wait object for other elements; just change the locator. If the number in the id is dynamic you can use a partial id. I edited my answer for that. - Hyonhyoscine
I put in your code and still received an error (I updated the post and put the error at the bottom). I replaced the timeout variable with 15. I still think the problem lies in the line where the driver actually gets the website, because the printed HTML is still empty. I think it has something to do with running headless, because it works fine if I just remove the headless option. - Vigilantism
Worked for me too. Seems like the user agent is the issue with headless Chrome by default. - Tirpitz

some websites just don't load correctly with the headless option of chrome.

The previous statement is actually wrong. I just ran into this problem where Chrome wasn't detecting the elements. When I saw @LuckyZakary's answer I was shocked, because someone created a scraper for the same website with Node.js and didn't get this error.

@AtulGumar's answer helped on Windows, but on an Ubuntu server it failed, so it wasn't enough. After reading this all the way to the bottom, what @AtulGumar missed was adding the --disable-gpu flag.

So it worked for me on Windows and on an Ubuntu server with no GUI with these options:

webOptions = webdriver.ChromeOptions()
webOptions.headless = True
webOptions.add_argument("--window-size=1920,1080")
webOptions.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=webOptions)

I also installed xvfb and other packages as suggested here:

sudo apt-get -y install xorg xvfb gtk2-engines-pixbuf
sudo apt-get -y install dbus-x11 xfonts-base xfonts-100dpi xfonts-75dpi xfonts-cyrillic xfonts-scalable

and executed:

Xvfb -ac :99 -screen 0 1280x1024x16 &
export DISPLAY=:99
Mazurka answered 20/1, 2023 at 9:29 Comment(0)

Try passing the executable path into a Service object:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument('--incognito')
options.add_argument('--disable-extensions')
options.add_argument("--no-sandbox")
options.add_argument('--disable-gpu')
options.add_argument('--headless')
service = Service(executable_path=ChromeDriverManager().install())
return webdriver.Chrome(service=service, options=options)

It works for me :)

Sidonia answered 24/1, 2023 at 6:9 Comment(0)
