Creating a headless Chrome instance in Python

This question describes my conclusion after researching available options for creating a headless Chrome instance in Python and asks for confirmation or resources that describe a 'better way'.

From what I've seen, the quickest way to get started with a headless instance of Chrome in a Python application seems to be CEF (http://code.google.com/p/chromiumembedded/) with CEFPython (http://code.google.com/p/cefpython/). CEFPython seems immature, though, so using it would likely mean further customization before I can load a headless Chrome instance that loads web pages (and required files), resolves a completed DOM, and then lets me run arbitrary JS against it from Python.

Have I missed any other projects that are more mature or would make this easier for me?

Sugarplum answered 19/3, 2012 at 19:9 Comment(6)
Why specifically do you need a headless Chrome instance? – Ecclesiasticism
@Marcin, I'm developing on Windows 7 but will publish the application as a website on Ubuntu. – Sugarplum
@Trindaz, CefPython has a real API now. There is still much work to do in the coming weeks, but some things already work, like calling JavaScript from Python: browser.GetMainFrame().ExecuteJavascript("alert('hello!')") – Zoroaster
@CzarekTomczak thanks, I posed a CefPython-specific follow-up question at magpcss.org/ceforum. Is there a Google group devoted to this? – Sugarplum
@Trindaz, I asked Marshall whether it would be possible to create a subforum there at magpcss; if not, I will think about hosting my own forum and will put a link on the google-cefpython site. – Zoroaster
@CzarekTomczak why not just start a Google group? That's what all the others use: zombie, phantom, jsdom, etc. And can you just email me at dave dot trindall at gmail dot com to continue this conversation? We must be breaking SO rules with this back-and-forth here. – Sugarplum

Any reason you haven't considered Selenium with the Chrome Driver?

http://code.google.com/p/selenium/wiki/ChromeDriver

http://code.google.com/p/selenium/wiki/PythonBindings
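A minimal sketch of the workflow the question asks for (load a page, let the DOM finish loading, then run arbitrary JavaScript against it from Python), assuming Selenium 4 and a chromedriver on the PATH. `chrome_args` and `run_js` are hypothetical helper names; `get` and `execute_script` are Selenium WebDriver methods.

```python
from typing import Any, List


def chrome_args(headless: bool = True, width: int = 1024, height: int = 768) -> List[str]:
    """Build the Chrome command-line switches to pass through ChromeOptions."""
    args = [f"--window-size={width},{height}"]
    if headless:
        args.append("--headless")
    return args


def run_js(url: str, script: str, headless: bool = True) -> Any:
    """Load `url` in Chrome and return the result of running `script` in the page.

    Selenium is imported lazily so chrome_args() can be used without it installed.
    """
    from selenium import webdriver  # assumption: Selenium 4 is installed

    options = webdriver.ChromeOptions()
    for arg in chrome_args(headless):
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        # with the default page-load strategy, get() waits for the load event
        driver.get(url)
        # the script must use `return` to hand a value back to Python
        return driver.execute_script(script)
    finally:
        driver.quit()
```

For example, `run_js("https://example.com", "return document.title")` would return the page title as a Python string.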

Mascon answered 19/3, 2012 at 19:17 Comment(2)
Combined with youtube.com/watch?v=DL7gyuqkzzU, this gives me exactly what I need. – Sugarplum
To summarise the YouTube video, you need: "from pyvirtualdisplay import Display; display = Display(visible=0, size=(1024, 768)); display.start()" – Blastula
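The pyvirtualdisplay recipe from the comment above, expanded into a sketch. It assumes Linux with Xvfb installed plus the `pyvirtualdisplay` and `selenium` packages; `title_in_virtual_display` is a hypothetical helper. Chrome here runs normally (not headless) but draws into an invisible X display.

```python
from typing import Tuple


def title_in_virtual_display(url: str, size: Tuple[int, int] = (1024, 768)) -> str:
    """Open `url` in a regular Chrome drawn into an invisible Xvfb display."""
    # Imported lazily: both packages are third-party, and pyvirtualdisplay needs Xvfb.
    from pyvirtualdisplay import Display
    from selenium import webdriver

    display = Display(visible=0, size=size)
    display.start()  # points DISPLAY at the virtual screen for child processes
    try:
        driver = webdriver.Chrome()
        try:
            driver.get(url)
            return driver.title
        finally:
            driver.quit()
    finally:
        display.stop()
```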

This question is 5 years old now, and at the time running a headless Chrome from Python was a big challenge, but the good news is:

Starting with version 59, released in June 2017, Chrome ships with a built-in headless mode, meaning we can use it in a non-graphical server environment and run tests without pages being visually rendered, which saves a lot of time and memory for testing or scraping. Setting up Selenium for that is very easy:

(I assume you have installed Selenium and ChromeDriver):

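from selenium import webdriver

# set up a headless browser
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# pass the options with `options=`; the older `chrome_options=` keyword is
# deprecated and no longer accepted by recent Selenium releases
browser = webdriver.Chrome(options=options)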

Chrome will now run headlessly; if you remove the options argument from the last line, the browser window will be shown.
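For simple one-shot jobs, the built-in headless mode can also be used without Selenium by calling the Chrome binary directly. A minimal sketch, assuming a Chrome executable named `google-chrome` (the name varies by platform, e.g. `chromium` or `chromium-browser`); `--dump-dom` makes Chrome print the loaded page's DOM as HTML to stdout. `dump_dom_command` and `dump_dom` are hypothetical helpers.

```python
import shutil
import subprocess
from typing import List, Optional


def dump_dom_command(url: str, binary: str = "google-chrome") -> List[str]:
    """Build a command that prints the loaded page's DOM to stdout."""
    return [binary, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_dom(url: str, binary: str = "google-chrome", timeout: float = 30.0) -> Optional[str]:
    """Run headless Chrome once and return its output, or None if Chrome is not found."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        dump_dom_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if Chrome exits with an error
    )
    return result.stdout
```

This only gives you the rendered HTML; running your own JavaScript against the page interactively is still easier through Selenium.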

Sheepish answered 15/9, 2017 at 22:48 Comment(0)

Although I'm the author of CasperJS, I invite you to check out Ghost.py, a WebKit web client written in Python.

It's heavily inspired by CasperJS, but it's not based on PhantomJS; it still uses PyQt bindings and WebKit, though.

Atrice answered 22/5, 2012 at 15:16 Comment(0)

I use this to get the driver:

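from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


def get_browser(storage_dir, headless=False):
    """
    Get the browser (a "driver").

    Parameters
    ----------
    storage_dir : str
        Directory that downloads are saved to.
    headless : bool
        Run Chrome without a visible window.

    Returns
    -------
    browser : selenium webdriver object
    """
    # find the path with 'which chromedriver'
    path_to_chromedriver = '/usr/local/bin/chromedriver'

    chrome_options = Options()
    if headless:
        chrome_options.add_argument("--headless")
    # Chrome preferences: save downloads into storage_dir without prompting;
    # the plugins entry targets Chrome's built-in PDF viewer
    chrome_options.add_experimental_option('prefs', {
        "plugins.plugins_list": [{"enabled": False,
                                  "name": "Chrome PDF Viewer"}],
        "download": {
            "prompt_for_download": False,
            "default_directory": storage_dir,
            "directory_upgrade": False,
            "open_pdf_in_system_reader": False
        }
    })

    # Selenium 4 takes the chromedriver path via a Service object and the
    # options via `options=`; the old positional path / `chrome_options=` form
    # is not accepted by recent releases
    browser = webdriver.Chrome(service=Service(path_to_chromedriver),
                               options=chrome_options)
    return browser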

By toggling the headless parameter you can choose whether or not to watch the browser.
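Because this setup saves files straight into `storage_dir` without a prompt, the caller usually needs to know when a download has finished. A sketch of one way to do that, assuming Chrome's convention of writing in-progress downloads with a `.crdownload` suffix; `wait_for_downloads` is a hypothetical helper, not part of Selenium.

```python
import os
import time


def wait_for_downloads(storage_dir: str, timeout: float = 60.0, poll: float = 0.5) -> bool:
    """Block until `storage_dir` holds no partial Chrome downloads.

    Returns True once no '*.crdownload' files remain, False if `timeout` expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        partial = [name for name in os.listdir(storage_dir) if name.endswith(".crdownload")]
        if not partial:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Note that it returns immediately if called before the download has started, so trigger the download first and, ideally, also check that the expected file has appeared.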

Downtoearth answered 18/8, 2017 at 12:9 Comment(0)

CasperJS is a headless WebKit, but as far as I know it doesn't give you Python bindings; it seems command-line oriented, though that doesn't mean you couldn't run it from Python in a way that satisfies what you are after. When you run casperjs, you pass it the path to the JavaScript you want to execute, so you would need to generate that script from Python.

But all that aside, I bring up CasperJS because it seems to satisfy the lightweight, headless requirement very nicely.
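A minimal sketch of the "generate the JavaScript from Python" approach described above. It assumes a `casperjs` executable on the PATH (with PhantomJS available to it); the calls in the generated script (`require('casper').create()`, `start`, `evaluate`, `echo`, `run`) are from the CasperJS 1.x API, while `casper_script` and `run_casper` are hypothetical helpers. The JavaScript expression should evaluate to a JSON-serializable value.

```python
import json
import os
import shutil
import subprocess
import tempfile
from typing import Any, Optional


def casper_script(url: str, js_expression: str) -> str:
    """Return CasperJS source that opens `url` and prints `js_expression` as JSON."""
    return (
        "var casper = require('casper').create();\n"
        f"casper.start({json.dumps(url)}, function () {{\n"
        f"    this.echo(JSON.stringify(this.evaluate(function () {{ return {js_expression}; }})));\n"
        "});\n"
        "casper.run();\n"
    )


def run_casper(url: str, js_expression: str, timeout: float = 60.0) -> Optional[Any]:
    """Write the script to a temp file, run casperjs on it and decode its JSON output."""
    if shutil.which("casperjs") is None:
        return None  # casperjs is not installed
    fd, path = tempfile.mkstemp(suffix=".js")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(casper_script(url, js_expression))
        out = subprocess.run(["casperjs", path], capture_output=True, text=True,
                             timeout=timeout, check=True).stdout
        # the echoed JSON is the last line CasperJS prints
        return json.loads(out.strip().splitlines()[-1])
    finally:
        os.remove(path)
```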

Single answered 19/3, 2012 at 19:27 Comment(1)
CasperJS is a testing framework for PhantomJS, which is a headless QtWebKit. It allows you to communicate via the REST API. – Hoenir
