Is there a way to use PhantomJS in Python?
I

8

217

I want to use PhantomJS in Python. I googled this problem but couldn't find proper solutions.

I find os.popen() may be a good choice. But I couldn't pass some arguments to it.

Using subprocess.Popen() may be a proper solution for now. I want to know whether there's a better solution or not.

Is there a way to use PhantomJS in Python?

Inflated answered 8/11, 2012 at 10:46 Comment(2)
My answer below tells you how to do it. Looking at your question, that's exactly what Selenium does: a subprocess.Popen, but with some extended features to make the API seamless.Angelicaangelico
@flyer: You should probably consider changing the accepted answer, see below. Thank you.Becalmed
A
389

The easiest way to use PhantomJS in Python is via Selenium. The simplest installation method is:

  1. Install NodeJS
  2. Using Node's package manager, install phantomjs: npm -g install phantomjs-prebuilt
  3. Install Selenium (in your virtualenv, if you are using one)

After installation, using PhantomJS is as simple as:

from selenium import webdriver

driver = webdriver.PhantomJS() # or add to your PATH
driver.set_window_size(1024, 768) # optional
driver.get('https://google.com/')
driver.save_screenshot('screen.png') # save a screenshot to disk
sbtn = driver.find_element_by_css_selector('button.gbqfba')
sbtn.click()

If your system path environment variable isn't set correctly, you'll need to specify the exact path as an argument to webdriver.PhantomJS(). Replace this:

driver = webdriver.PhantomJS() # or add to your PATH

... with the following:

driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
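If you would rather not hard-code the path, one option is to look the binary up at runtime. The sketch below is only an illustration and not part of Selenium: `find_phantomjs` and its `fallbacks` parameter are hypothetical names, and it relies solely on the standard library's `shutil.which`.

```python
import os
import shutil


def find_phantomjs(name="phantomjs", fallbacks=()):
    """Return a path to the executable, or None if it cannot be found.

    Looks on PATH first, then tries each explicit fallback path in order.
    """
    found = shutil.which(name)
    if found:
        return found
    for candidate in fallbacks:
        # Accept a fallback only if it exists and is executable.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If it returns a path, that value can be passed as `executable_path`; if it returns `None`, PhantomJS is neither on your PATH nor at any of the fallback locations you listed.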


Angelicaangelico answered 29/3, 2013 at 8:23 Comment(26)
This worked beautifully, and probably saved me days. Thank you. If one wants the whole rendered page back as source, it's driver.page_source.Nannienanning
Also using error_handler parameter when initializing PhantomJS WebDriver one can verify the status code is 200 or otherwiseAngelicaangelico
This does work beautifully, and I'm pleasantly surprised because phantomjs.org/faq.html says "not a Node.js module" --yet the npm wrapper at npmjs.org/package/phantomjs makes it behave for this purpose. In my case I wanted to do this: bodyStr= driver.find_element_by_tag_name("body").get_attribute("innerHTML") and ...it worked!Depurative
Just to offer my experience after following the advice of avoiding Ghost's dependency complexities. After fully going down the road of PhantomJS, I found myself forking and then recompiling Phantom because their interface to the underlying QT libraries expected too much configuration through command line arguments. So in avoiding complexity, I found myself writing C++ to modify an interface that is too simplified. Not to say that Phantomjs is bad advice, I would just advise looking into its limitations first, as its compilation time is about 20 minutes minimum if you need to modify its source.Konyn
I agree that ghost has crazy dependencies, and I actually failed to get it up and running even after installing millions of X11 related libraries. Ghost is a horror story.Angelicaangelico
@Konyn I am surprised you needed to recompile phantomjs ... I am curious about the usecase/brickwall you faced with vanilla phantomjsAngelicaangelico
Thank you for this answer. This will probably save me a lot of time :-)Crinoline
I get "WebDriverException: Unable to start phantomjs with ghostdriver". I couldn't find what might cause this error. Can anyone help? I'm using python 2.7 with windows.Macaulay
@Macaulay You need to pass the path to phantomjs as the first argument to PhantomJS ... or fix your windows syspath to be able to see phantomjs.Angelicaangelico
I was using sub process, but this is truly better.Quandary
Before this would run successfully, I had to create and give permission to the log file at /var/log/phantomjs/ghostdriver.logPriam
After hitting my head against trying to copy paste the examples from phantomjs and casperjs I gave up and am giving this a try. Not to say they are horrible, I used webdriver before, and the ability to switch browser (phantomjs doesn't work with some sites) is a huge win.Chickenlivered
I had some issues getting this to work: I got both execvp(): Permission denied errors when running phantomjs from the console and Can not connect to GhostDriver errors. The solution was to run sudo phantomjs once; from then on it works fine. github.com/ariya/phantomjs/issues/11614Machzor
ABORT. EJECT. Avoid phantomjs for python. Waste of time. "Unable to start phantomjs with ghostdriver." Into eternity. Dev admitted to not updating something or other. Wish I knew this before spending hours trying to breathe life into phantomjs.Ducan
@Amalgovinus I did notice this occasionally, it is something that happens if you run out of memory IIRC. Do you know where the link is to that dev material you mention. I would like to read-up on it.Angelicaangelico
PhantomJS was installed in AppData. Is there a permanent fix besides specifying it as an argument?Cyprio
@macdonjo yes, make it visible on your system $PATH variableAngelicaangelico
Thanks, I did that earlier as a guess, but still can't get the error to go away. So strange. I think I'll make an independent thread.Cyprio
Dumb question: why do I have to install node-js? Is there no other way to get PhantomJS?Viceregal
@Elidosa not a dumb question, it's a sysadmin-style question ... not wanting to install what you don't need. Phantomjs is written in nodejs, that's why you need it in this case. Some distros have packages for phantomjs that will install nodejs behind the scenes.Angelicaangelico
Selenium does not allow full control over PhantomJS such as handling callbacks.Apthorp
@Angelicaangelico PhantomJS is not written in node. It's written in C++ and is a headless webkit browser. The node package just installs the appropriate binary.Stodder
Under Windows, I did not have to install phantomJS via node and npm. Downloading the binary from phantomjs.org/download.html and putting the phantomjs.exe into a location in my PATH (e.g. c:\Windows\System32) or vice versa (putting it anywhere and adding the folder to PATH) was enough to make it work in Python.Lindi
Great answer. sudo apt-get install phantomjs worked for me in ubuntu tho. I had a failed install with npm previously.Corotto
To make node/npm working correct with Ubuntu add this package: apt-get install nodejs-legacy (github.com/Medium/…)Architectonics
I managed to install PhantomJS on Ubuntu 16 using this command: npm -g install phantomjsFrisket
B
85

PhantomJS recently dropped Python support altogether. However, PhantomJS now embeds Ghost Driver.

A new project has since stepped up to fill the void: ghost.py. You probably want to use that instead:

from ghost import Ghost
ghost = Ghost()

with ghost.start() as session:
    # open() and content are used on the session returned by start()
    page, extra_resources = session.open("http://jeanphi.me")
    assert page.http_status == 200 and 'jeanphix' in session.content
Bemire answered 8/11, 2012 at 10:49 Comment(11)
Even though support is dropped, I found that installing npm (node package manager) and using it to install the latest phantomjs (with webdriver support) and installing selenium in python ... way easier than trying to get PyQT or PySide to work properly. What's nice about phantom it is truly headless and requires no UI/X11 related libs to work.Angelicaangelico
I added an answer below explaining my preferred solution after trying to use ghost.py and hating my lifeAngelicaangelico
Pykler's "hating my life" isn't an understatement. If someone would change the "correct answer" for this question to Pykler's I would have saved a day's effort.Kaohsiung
@YPCrumble: unfortunately, only the OP can do that; change the accepted answer.Bemire
After trying a bunch of different approaches this morning, @Angelicaangelico solution ended up working the smoothest.Priam
Though I don't like its syntax compared to PhantomJS, getting Ghost.py working with PyQT was bearable. I just had to change the code a little as mentioned here-- #14575681 All in all, setting up Ghost.py was way easier than trying to get phantomjs working in selenium in python, which has proven impossible on a windows machine after hours of trying.Ducan
@Amalgovinus are you still using ghost.py or were you able to get phantomjs going after that github issue you mentioned?Angelicaangelico
@Angelicaangelico I gave up on ghost.py because it lacks cookies.. wound up using lorien's Grab library instead, even though it lacks js support. I did manage to get phantomjs dialing out after the fact (by tweaking constructor params) since I asked a question about it-- superuser.com/questions/674322/… -- but there were problems after that, so I stuck with Grab.Ducan
will ghost.py allow me to force the javascript on the page to load, so I can grab all the stuff (div, img, href, ) that the javascript loads on my page? i'm looking for a solution which I can then parse with Beautifulsoup (BS)Tulipwood
Ghost is headless? I don't see that anywhere in the project and given the pyside/qt dependency I'm doubtful.Kraus
@RyneEverett: Ghost is headless.Bemire
G
40

Now since the GhostDriver comes bundled with the PhantomJS, it has become even more convenient to use it through Selenium.

I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess the standalone installation didn't provide these features earlier, but as of v1.9 it very much does.

  1. Install PhantomJS (http://phantomjs.org/download.html). (If you are on Linux, the following instructions will help: https://mcmap.net/q/128315/-how-can-i-set-up-amp-run-phantomjs-on-ubuntu)
  2. Install Selenium using pip.

Now you can use it like this:

import selenium.webdriver
driver = selenium.webdriver.PhantomJS()
driver.get('http://google.com')
# do some processing

driver.quit()
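
One detail worth noting in the snippet above is the final `driver.quit()`: if the processing step raises, that line is never reached and the PhantomJS process may be left running. A minimal sketch of one way to guard against that, using a hypothetical `managed_driver` helper (not part of Selenium) that accepts any zero-argument factory:

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and when the with-body raises.
        driver.quit()
```

With Selenium installed, this would presumably be used as `with managed_driver(selenium.webdriver.PhantomJS) as driver:` followed by the `driver.get(...)` call and the processing code.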
Godderd answered 3/5, 2013 at 7:39 Comment(2)
special thanks for pointing to SO answer concerning PhantomJS installation on Ubuntu, it helped me.Inactive
a quick way to install Selenium I just learned is, on Windows, type: C:\Python34\Scripts\pip.exe install Selenium.Fishtail
A
8

Here's how I test javascript using PhantomJS and Django:

mobile/test_no_js_errors.js:

var page = require('webpage').create(),
    system = require('system'),
    url = system.args[1],
    status_code;

page.onError = function (msg, trace) {
    console.log(msg);
    trace.forEach(function(item) {
        console.log('  ', item.file, ':', item.line);
    });
};

page.onResourceReceived = function(resource) {
    if (resource.url == url) {
        status_code = resource.status;
    }
};

page.open(url, function (status) {
    if (status == "fail" || status_code != 200) {
        console.log("Error: " + status_code + " for url: " + url);
        phantom.exit(1);
    }
    phantom.exit(0);
});

mobile/tests.py:

import subprocess
from django.test import LiveServerTestCase

class MobileTest(LiveServerTestCase):
    def test_mobile_js(self):
        args = ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url]
        result = subprocess.check_output(args)
        # check_output returns bytes on Python 3 (b"" also equals "" on Python 2)
        self.assertEqual(result, b"")  # No output means no error

Run tests:

manage.py test mobile
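
Note that `subprocess.check_output` raises `CalledProcessError` when the script exits non-zero (which the script above does via `phantom.exit(1)`), so the assertion on `result` is only reached when the exit code is 0. If you want the script's printed messages in the failure report either way, something along these lines may help. `run_check` is a hypothetical helper name, and the 60-second timeout is an arbitrary assumption:

```python
import subprocess


def run_check(args, timeout=60):
    """Run a command; return (passed, output_text).

    passed is True only when the exit code is 0 and nothing was printed.
    """
    try:
        raw = subprocess.check_output(args, stderr=subprocess.STDOUT, timeout=timeout)
        exit_ok = True
    except subprocess.CalledProcessError as exc:
        # A non-zero exit raises; the captured output is on exc.output.
        raw = exc.output
        exit_ok = False
    text = raw.decode("utf-8", errors="replace").strip()
    return exit_ok and text == "", text
```

Inside the test method this could be called as `passed, output = run_check(args)` followed by `self.assertTrue(passed, output)`, so that any JavaScript error text shows up in the test failure message.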

Acid answered 18/12, 2012 at 13:17 Comment(3)
Thanks. I used subprocess.Popen to call the phantomjs script and it worked :)Inflated
You do see how this is limited right? All you are doing is making a shell call to execute phantomjs - you are not actually using a "proper" interface through which you may properly handle exceptions, blocking, etc.Hellion
@kamelkev: I see how this is limited. The upside is that this method allows me to use Django's bootstraping features to set up a test database with the correct content for each test. And yes, it could be combined with the other answers to get the best of both worlds.Floweret
S
6

The answer by @Pykler is great but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I've put here to save others time:

  1. Install PhantomJS

    As @Vivin-Paliath points out, it's a standalone project, not part of Node.

    Mac:

    brew install phantomjs
    

    Ubuntu:

    sudo apt-get install phantomjs
    

    etc

  2. Set up a virtualenv (if you haven't already):

    virtualenv mypy  # doesn't have to be "mypy". Can be anything.
    . mypy/bin/activate
    

    If your machine has both Python 2 and 3 you may need to run virtualenv-3.6 mypy or similar.

  3. Install selenium:

    pip install selenium
    
  4. Try a simple test, like this borrowed from the docs:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    
    driver = webdriver.PhantomJS()
    driver.get("http://www.python.org")
    assert "Python" in driver.title
    elem = driver.find_element_by_name("q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    driver.close()
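
One caveat for newer setups: `webdriver.PhantomJS` was deprecated late in the Selenium 3 series and, as far as I can tell, is no longer present in Selenium 4, so the test above likely needs an older Selenium (for example `pip install "selenium<4"`). The sketch below shows one way to check a version string before relying on it; `supports_phantomjs` is a hypothetical helper, and in practice you would pass it `selenium.__version__`.

```python
def supports_phantomjs(selenium_version):
    """Return True if this Selenium version string predates 4.0."""
    # Only the major component matters for this check.
    major = int(selenium_version.split(".")[0])
    return major < 4
```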
    
Simplehearted answered 6/3, 2017 at 8:46 Comment(2)
How to install PhantomJS on windows ? It doesn't seem to work using pip command.Somber
Pip is a python package installer, so it works with selenium, which is available as a python package. PhantomJS is not a python package so won't work with pip. I did a quick google for "PhantomJS install windows" and there are good hits.Simplehearted
S
5

This is what I do, on Python 3.3. I was processing huge lists of sites, so failing on the timeout was vital for the job to run through the entire list.

import subprocess

js_file = "your_script.js"  # placeholder: path to your js file for phantom
command = ["phantomjs", "--ignore-ssl-errors=true", js_file]
process = subprocess.Popen(command, stdout=subprocess.PIPE)

# make sure phantomjs has time to download/process the page
# but if we get nothing after 30 sec, just move on
try:
    output, errors = process.communicate(timeout=30)
except subprocess.TimeoutExpired as e:
    print("\t\tException: %s" % e)
    process.kill()
    # reap the killed process and collect whatever it wrote before the timeout
    output, errors = process.communicate()

# output is bytes; decode to utf-8 to save heartache
phantom_output = ''
for out_line in output.splitlines():
    phantom_output += out_line.decode('utf-8')
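
For the "huge lists of sites" case, the same pattern can be wrapped in a function and called in a loop. A rough sketch, assuming each site is handled by one command line; `run_with_timeout` is a made-up name, and returning `None` on timeout is just one possible convention:

```python
import subprocess


def run_with_timeout(command, timeout=30):
    """Run command (an argv list); return decoded stdout, or None on timeout."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        output, _ = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()  # reap the killed process so it does not linger
        return None
    return output.decode("utf-8", errors="replace")
```

A loop over the list would then call something like `run_with_timeout(["phantomjs", "--ignore-ssl-errors=true", js_file])` for each entry and skip the ones that come back as `None`.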
Sycamore answered 20/5, 2013 at 22:40 Comment(1)
Thanks, I was able to alter it to taste for my purpose.Bes
P
5

If using Anaconda, install with:

conda install PhantomJS

In your script:

from selenium import webdriver
driver = webdriver.PhantomJS()

Works perfectly.

Pahari answered 14/9, 2016 at 17:51 Comment(2)
As of now, default channels don't contain PhantomJS for linux64Gauhati
damn, i love conda <3 that was so easy. i'm on osx.Kingship
G
2

In case you are using Buildout, you can easily automate the installation processes that Pykler describes using the gp.recipe.node recipe.

[nodejs]
recipe = gp.recipe.node
version = 0.10.32
npms = phantomjs
scripts = phantomjs

That part installs node.js as a binary (at least on my system) and then uses npm to install PhantomJS. Finally it creates an entry point bin/phantomjs, which you can pass to the PhantomJS webdriver. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)

driver = webdriver.PhantomJS('bin/phantomjs')
Grant answered 23/9, 2014 at 14:4 Comment(1)
Another way to automate the installation process with Buildout is to use gp.recipe.phantomjs, which configures phantomjs and casperjs.Corettacorette
