Selenium Firefox headless returns different results
When I scrape a page that lists products with the headless option enabled, I get different results.
For the same query, I sometimes get results that are not sorted and other times results in the proper sorted order.

Selenium firefox browser:

firefox_options = Options()
firefox_options.headless = True
browser = webdriver.Firefox(options=firefox_options, executable_path=firefox_driver)

According to this post:
"firefox does not send different headers when using the headless option".

How can I use the headless option and still get consistent results from scraping?

Update:

It turns out that an ad popup window was hiding the price sort menu. Setting a fixed window size, as posted by DebanjanB, solved the problem.

Thanks for any suggestions

Jaconet answered 25/11, 2019 at 12:19 Comment(0)

Ideally, setting or not setting firefox_options.headless = True shouldn't have any major effect on which elements are rendered within the DOM Tree, but it can make a significant difference as far as the Viewport is concerned.

As an example, in the run shown below, when GeckoDriver/Firefox is initialized with the --headless option the default Viewport (as reported by get_window_size()) is width = 1366px, height = 768px, whereas when GeckoDriver/Firefox is initialized without the --headless option the default Viewport is width = 1382px, height = 744px.

  • Example Code:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.FirefoxOptions()
    options.headless = True
    driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get("https://www.google.com/")
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
    print ("Headless Firefox Initialized")
    size = driver.get_window_size()
    print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
    driver.quit()
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get("https://www.google.com/")
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
    print ("Firefox Initialized")
    size = driver.get_window_size()
    print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
    driver.quit()
    
  • Console Output:

    Headless Firefox Initialized
    Window size: width = 1366px, height = 768px
    Firefox Initialized
    Window size: width = 1382px, height = 744px
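
The figures above come from get_window_size(), which reports the outer browser window. The area the page is actually laid out in can be smaller than that, so when comparing headless and headed runs it may help to log both. A minimal sketch, assuming `driver` is any Selenium WebDriver instance; the helper name `describe_geometry` is made up for illustration and is not part of Selenium:

```python
def describe_geometry(driver):
    """Return the outer window size and the inner viewport size.

    `driver` is assumed to be a Selenium WebDriver (anything exposing
    get_window_size() and execute_script()); this helper is illustrative.
    """
    window = driver.get_window_size()  # outer window, including browser chrome
    # window.innerWidth / innerHeight describe the area the page is laid out in
    inner_width, inner_height = driver.execute_script(
        "return [window.innerWidth, window.innerHeight];"
    )
    return {
        "window": (window["width"], window["height"]),
        "viewport": (inner_width, inner_height),
    }
```

Calling `describe_geometry(driver)` once in each mode and comparing the two dictionaries shows whether the layout area really differs between the runs.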
    

Conclusion

From the above observation, it can be inferred that with the --headless option GeckoDriver/Firefox opens the Browsing Context with a different Viewport (narrower in this run: 1366px versus 1382px wide), so the page may be laid out differently and fewer elements may be visible or identified.
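
In the asker's case, the symptom was an ad popup covering the price sort menu. One way to check for that kind of overlap is to ask the browser which element is topmost at the centre of the target. The following is only a sketch: `is_covered` is a hypothetical helper, and `driver` and `element` are assumed to be a Selenium WebDriver and a WebElement located beforehand.

```python
# JavaScript run in the page: find the topmost element at the centre of the
# target and report whether it is something other than the target itself.
# elementFromPoint returns null when the point lies outside the viewport.
_COVERED_JS = """
const el = arguments[0];
const r = el.getBoundingClientRect();
const top = document.elementFromPoint(r.left + r.width / 2, r.top + r.height / 2);
return top === null || !(top === el || el.contains(top));
"""


def is_covered(driver, element):
    """Return True if `element` is outside the viewport or behind another node."""
    return bool(driver.execute_script(_COVERED_JS, element))
```

If this returns True for the sort menu only in headless mode, the difference in results most likely comes from layout rather than from the server sending different content.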


Solution

When using GeckoDriver/Firefox to initiate a Browsing Context, always open it in maximized mode or set a fixed size through set_window_size(), as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.FirefoxOptions()
options.headless = True
# "start-maximized" and "window-size=..." are Chromium switches; Firefox uses --width / --height
options.add_argument("--width=1400")
options.add_argument("--height=600")
driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
driver.set_window_size(1920, 1080)
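
To keep headless and headed runs comparable, the size setup can be wrapped in one place. This is a sketch under stated assumptions rather than a definitive setup: `firefox_size_args` and `make_driver` are hypothetical names, the size is passed through Firefox's `--width` and `--height` command-line switches, headless mode is requested with the `-headless` argument, and geckodriver is assumed to be discoverable without an explicit path.

```python
def firefox_size_args(width, height, headless=True):
    """Build Firefox command-line arguments for a fixed window size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    args = ["--width={}".format(width), "--height={}".format(height)]
    if headless:
        args.insert(0, "-headless")
    return args


def make_driver(width=1920, height=1080, headless=True):
    """Start Firefox with a fixed window size (needs selenium and geckodriver)."""
    # Imported lazily so firefox_size_args() can be used without selenium installed
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for arg in firefox_size_args(width, height, headless):
        options.add_argument(arg)
    driver = webdriver.Firefox(options=options)  # assumes geckodriver can be located
    driver.set_window_size(width, height)  # enforce the size again after startup
    return driver
```

Using the same `width` and `height` for `make_driver(headless=True)` and `make_driver(headless=False)` removes window size as a variable when comparing scraped results.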

tl; dr

You can find a couple of relevant discussions on window size in:

Kwan answered 25/11, 2019 at 13:44 Comment(3)
Hi, I didn't ask about maximizing or resizing the window. The issue is that in headless mode the response context is different than without this mode. - Jaconet
@ZarakiKenpachi That can happen due to the difference in viewport, which I have detailed in the answer. - Kwan
Thank you. It turns out that a generated popup ad window was hiding the sorting menu. After setting a bigger window size, the problem was solved. - Jaconet
