Headless Chrome Driver not working for Selenium

I am currently having an issue with my scraper when I set options.add_argument("--headless"); it works perfectly fine when that option is removed. Could anyone advise how I can achieve the same results in headless mode?

Below is my Python code:

from seleniumwire import webdriver as wireDriver
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
    
chromedriverPath = '/Users/applepie/Desktop/chromedrivermac'

def scraper(search):

    mit = "https://orbit-kb.mit.edu/hc/en-us/search?utf8=✓&query="  # Empty search on mit site
    mit += "+".join(search) + "&commit=Search"
    results = []

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1440, 900")
    driver = webdriver.Chrome(options=options, executable_path=chromedriverPath)

    driver.get(mit)
    # Wait 20 seconds for page to load
    timeout = 20
    try:
        WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.CLASS_NAME, "header")))
        search_results = driver.find_element_by_class_name("search-results")
        for result in search_results.find_elements_by_class_name("search-result"):
            resultObject = {
                "url": result.find_element_by_class_name('search-result-link').get_attribute("href")
            }
            results.append(resultObject)
        driver.quit()
    except TimeoutException:
        print("Timed out waiting for page to load")
        driver.quit()

    return results

Here is also a screenshot of the output when I print(driver.page_source) after get():

[screenshot of the driver.page_source output]

Nga answered 5/1, 2021 at 19:57 Comment(6)
Can you explain what the issue actually is? – Jap
@PApostol, the scraper is not returning any results when I add options.add_argument("--headless"). However, it works fine when options.add_argument("--headless") is removed. – Nga
Maybe try options.headless = True instead of options.add_argument("--headless") to see if there is a difference. Also consider including an example people could run to reproduce the issue. – Jap
@Jap options.headless = True does not work either. – Nga
@Nga Take a screenshot or print the driver.page_source after get() to confirm whether get() is successful. – Slimsy
@DebanjanB I have added a screenshot above. – Nga

I know this is a super old question, but recently (2023) Chrome shipped an upgraded headless mode, which now also allows headless to work with extensions.

See this webpage for more details.

Just replace the headless option above with the one below.

options.add_argument('--headless=new')

According to that page, if you just use --headless, Chrome still runs the old headless implementation; you have to explicitly request the new one for this to work.
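As a rough sketch of the change in the context of the question's option setup (the window size and the query string are just placeholders, and a reasonably recent Chrome and Selenium are assumed, so no explicit executable_path is passed):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # plain "--headless" would still select the old headless mode
options.add_argument("--window-size=1440,900")
driver = webdriver.Chrome(options=options)
driver.get("https://orbit-kb.mit.edu/hc/en-us/search?utf8=✓&query=test&commit=Search")  # placeholder query
print(driver.title)
driver.quit()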

Pittsburgh answered 21/5, 2023 at 22:4 Comment(1)
This cleared it up for me in '24 :) – Hysterotomy

This screenshot implies that Cloudflare has detected your requests to the website as automated bot traffic and is consequently denying you access to the application.


Solution

In these cases, a potential solution is to use undetected-chromedriver in headless mode to initialize the browsing context.

undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

  • Code Block:

    import undetected_chromedriver as uc
    from selenium import webdriver
    
    options = webdriver.ChromeOptions() 
    options.headless = True
    driver = uc.Chrome(options=options)
    driver.get(url)  # url: the page you want to open, e.g. the search URL from the question
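If it helps, here is a rough sketch of how the scraper from the question might be rewritten on top of undetected-chromedriver. The URL building and the 20-second wait come from the question; the --headless=new flag and the Selenium 4 style find_elements call are assumptions on my part, so treat this as a starting point rather than a drop-in fix:

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def scraper(search):
    # Build the search URL exactly as in the question
    mit = "https://orbit-kb.mit.edu/hc/en-us/search?utf8=✓&query="
    mit += "+".join(search) + "&commit=Search"
    results = []

    options = uc.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: recent Chrome; older setups may need options.headless = True
    driver = uc.Chrome(options=options)

    driver.get(mit)
    try:
        # Same 20-second wait on the page header as in the question
        WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.CLASS_NAME, "header")))
        for link in driver.find_elements(By.CSS_SELECTOR, ".search-result .search-result-link"):
            results.append({"url": link.get_attribute("href")})
    except TimeoutException:
        print("Timed out waiting for page to load")
    finally:
        driver.quit()
    return results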
    


Slimsy answered 5/1, 2021 at 20:48 Comment(2)
I'm getting the following error: ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1122) – Nga
@Nga SSLCertVerificationError is a certificate issue, which is a much more granular problem than this generic question about accessing the application in headless mode. Can you raise a new question for your new requirement? Stack Overflow contributors will be happy to help you out. – Slimsy
