Programmatic Python Browser with JavaScript

I want to screen-scrape a web-site that uses JavaScript.

There is mechanize, the programmatic web browser for Python. However, it (understandably) doesn't interpret JavaScript. Is there any programmatic browser for Python that does? If not, is there a JavaScript implementation in Python that I could use to attempt to create one?
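To illustrate the problem with a standard-library sketch: the page below is a made-up example in which a script fills in a value after load. A parser that doesn't run scripts (which is the case for mechanize) only sees the empty element the server sent; `PAGE`, `VisibleText` and `visible_text` are hypothetical names used for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page: the price is inserted by JavaScript after load.
PAGE = """<html><body>
<div id="price"></div>
<script>document.getElementById("price").textContent = "42.00";</script>
</body></html>"""


class VisibleText(HTMLParser):
    """Collect text outside <script>/<style>, i.e. what a non-JS scraper can read."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(visible_text(PAGE))  # → [] : "42.00" only exists once a JS engine runs the script
```

The value is present in the raw markup only as script source, so getting at it requires either a real browser or a JavaScript engine plus the browser objects (`document` and so on) that the script expects.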

Replete asked 16/12, 2009 at 18:37

You might be better off using a tool like Selenium to automate the scraping using a web browser, so the JS executes and the page renders just like it would for a real user.
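A minimal sketch of that approach, assuming Selenium 4 and a local Chrome install: the browser runs the page's JavaScript, and the resulting HTML is then parsed with the standard library. The function names and the `wait_css` selector are hypothetical; Selenium is imported inside the function so the parsing helper can be used on its own.

```python
from html.parser import HTMLParser


def fetch_with_selenium(url, wait_css=None, timeout=10.0):
    """Load `url` in headless Chrome and return the page HTML after JavaScript has run."""
    # Imported lazily so extract_links() below is usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        if wait_css:
            # Content loaded asynchronously may appear after get() returns, so wait
            # for an element that the page's JavaScript is expected to create.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
            )
        return driver.page_source  # the page as the browser currently has it
    finally:
        driver.quit()


class LinkCollector(HTMLParser):
    """Gather href values of <a> tags from already-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, `extract_links(fetch_with_selenium("https://example.com/", wait_css="#results"))` would list the links present after the page's scripts have run (the URL and selector are placeholders).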

Stadia answered 16/12, 2009 at 18:42

The PyV8 package nicely wraps Google's V8 JavaScript engine for Python. It's particularly nice because you can not only call JavaScript code from Python but also call back from JavaScript into Python code. This makes it quite straightforward to implement the usual browser-supplied objects (that is, everything in the JavaScript global namespace: "window", "document", and so on), which you'd need to do if you were going to build a JavaScript-capable Python browser emulator, possibly by hooking it up to mechanize.
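A rough sketch of the callback idea: plain Python objects play the role of `document` and `window`, and an engine binding exposes them as JavaScript globals. Assumptions, loudly: the engine part uses the PyV8-style `JSClass`/`JSContext` API through STPyV8 (a Python 3 fork of PyV8; PyV8 itself is a Python 2-era package), and the `Document`/`Window` classes are a hypothetical, drastically simplified model, not a real DOM. The engine is imported only inside `run_page_script`.

```python
class Document:
    """Tiny stand-in for the browser's `document` object (hypothetical model:
    element ids map straight to their text, not to real DOM nodes)."""

    def __init__(self, title, elements):
        self.title = title
        self._elements = dict(elements)

    def getElementById(self, element_id):
        # Named like the DOM method so page scripts can call it unchanged.
        return self._elements.get(element_id)


class Window:
    """Stand-in for `window`; records alert() calls instead of showing dialogs."""

    def __init__(self, document):
        self.document = document
        self.alerts = []

    def alert(self, message):
        self.alerts.append(str(message))


def run_page_script(window, source):
    """Evaluate `source` with `window`, `document` and `alert` as JS globals.

    Assumption: PyV8-style JSClass/JSContext API as provided by STPyV8.
    """
    import STPyV8 as v8  # assumed binding; adjust to whatever engine you install

    class Global(v8.JSClass):
        # Attributes and methods of the global object become JavaScript globals,
        # so script calls like alert(...) land back in Python.
        def __init__(self):
            super().__init__()
            self.window = window
            self.document = window.document

        def alert(self, message):
            window.alert(message)

    with v8.JSContext(Global()) as ctxt:
        return ctxt.eval(source)
```

With an engine installed, `run_page_script(win, 'alert(document.getElementById("price"))')` should append the element's text to `win.alerts`. Note that in a real browser `window` is the global object itself rather than a property of it; a usable emulator would need far more of the DOM than this.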

Bassoon answered 16/12, 2009 at 20:43

My favorite is PyPhantomJS. It's written in Python and PyQt4, runs completely headless, and can be controlled entirely from JavaScript.

However, if you are looking to actually see the page, you can use QWebView from PyQt4 as well.
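Since PyPhantomJS is no longer maintained and PhantomJS offers the same core functionality, a minimal sketch of driving PhantomJS from Python might generate a small control script and run the `phantomjs` binary on it. This assumes `phantomjs` is on the PATH (PhantomJS development has itself been suspended since 2018); `phantom_script` and `render_with_phantomjs` are hypothetical helper names, while `require('webpage')`, `page.open`, `page.content` and `phantom.exit` are PhantomJS's own scripting API.

```python
import json
import os
import subprocess
import tempfile

# PhantomJS control script; %s is replaced by the URL as a JS string literal.
SCRIPT_TEMPLATE = """\
var page = require('webpage').create();
page.open(%s, function (status) {
    if (status !== 'success') {
        console.log('FAILED: ' + status);
        phantom.exit(1);
        return;
    }
    // page.content is the page's HTML after its scripts have run.
    console.log(page.content);
    phantom.exit(0);
});
"""


def phantom_script(url):
    # json.dumps yields a safely escaped, double-quoted JavaScript string literal.
    return SCRIPT_TEMPLATE % json.dumps(url)


def render_with_phantomjs(url, phantomjs="phantomjs", timeout=60):
    """Run PhantomJS on a temporary script and return what it printed (the HTML)."""
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(phantom_script(url))
        path = f.name
    try:
        result = subprocess.run(
            [phantomjs, path],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        return result.stdout
    finally:
        os.remove(path)  # clean up the temporary control script
```

Passing the URL through `json.dumps` rather than pasting it into the template directly keeps quotes or backslashes in the URL from breaking the generated script.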

Estevez answered 6/7, 2011 at 21:19
Unfortunately, the project maintainer isn't able to maintain the project anymore, but it will remain compatible with the 1.4.0 release. You can switch to PhantomJS without loss of functionality (except for all the awesome new features PyPhantomJS had in comparison, such as plugin support...). They're looking for someone else to take over maintenance (core development), so hopefully it won't die out. :) – Estevez

There is also spynner, "a stateful programmatic web browser module for Python with Javascript/AJAX support based on the QtWebkit framework": http://code.google.com/p/spynner/

Thankyou answered 28/3, 2011 at 13:1

You could also define Chickenfoot page triggers on the pages in question that perform whatever operations you want and save the results to a local file. Your program then launches Firefox from the command line and reads the file once it has been written.
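The Python side of that workflow (launch the browser, wait for the trigger's output file, read it) could look roughly like this. It is a generic sketch, not Chickenfoot-specific: `run_and_collect` is a hypothetical helper, the Firefox command line and output path are placeholders you would supply, and since the program can't know when the trigger has finished writing, it treats the file as complete once it is non-empty and its size is unchanged between two polls. Requires Python 3.8+ for `unlink(missing_ok=True)`.

```python
import subprocess
import time
from pathlib import Path


def run_and_collect(cmd, output_file, timeout=120.0, poll=0.5):
    """Start `cmd` (e.g. a Firefox command line) and return the text that
    the page trigger writes to `output_file`."""
    out = Path(output_file)
    out.unlink(missing_ok=True)  # ignore results left over from an earlier run
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    last_size = None
    try:
        while time.monotonic() < deadline:
            exited = proc.poll() is not None  # check before looking at the file
            if out.exists():
                size = out.stat().st_size
                # Heuristic: non-empty and stable across two polls => done writing.
                if size > 0 and size == last_size:
                    return out.read_text()
                last_size = size
            elif exited:
                raise RuntimeError(
                    f"{cmd[0]} exited with code {proc.returncode} without writing {out}"
                )
            time.sleep(poll)
        raise TimeoutError(f"no complete output in {out} after {timeout} s")
    finally:
        if proc.poll() is None:
            proc.terminate()  # e.g. close the browser once the data is collected
            proc.wait()
```

If you control the trigger, having it write to a temporary name and rename the file when done would make the "finished" check exact instead of heuristic.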

Pressure answered 16/12, 2009 at 18:45

I recommend taking a look at some of the options listed at http://wiki.python.org/moin/WebBrowserProgramming. Surprisingly, this comes up as a common question (I've found three on Stack Overflow today by searching Google for the words "python browser"); if you do the same, you'll find the other answers I gave.

Alleviator answered 9/6, 2010 at 20:24

You may try zope.testbrowser:

http://pypi.python.org/pypi?:action=display&name=zope.testbrowser

Zola answered 3/10, 2010 at 13:10

Playwright and pyppeteer are both reasonably good, and both can use headless Chromium to render pages and interpret their JavaScript.

I'd pick Playwright of the two, simply because it's backed by a larger organization (Microsoft) and supports Chromium, Firefox, and WebKit out of the box.
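A minimal sketch using Playwright's synchronous Python API, assuming `pip install playwright` followed by `playwright install` to download the browsers. `fetch_with_playwright`, `BROWSERS` and the `wait_css` selector are hypothetical names; the browser name is validated before Playwright is imported so a typo fails fast.

```python
BROWSERS = ("chromium", "firefox", "webkit")


def fetch_with_playwright(url, browser_name="chromium", wait_css=None, timeout_ms=30_000):
    """Render `url` in a headless browser and return the resulting HTML."""
    if browser_name not in BROWSERS:
        raise ValueError(f"browser_name must be one of {BROWSERS}, got {browser_name!r}")

    # Imported here so the argument check above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # p.chromium / p.firefox / p.webkit are the three bundled browser types.
        browser = getattr(p, browser_name).launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            if wait_css:
                # Wait for an element the page's JavaScript is expected to create.
                page.wait_for_selector(wait_css, timeout=timeout_ms)
            return page.content()  # HTML of the DOM after scripts have run
        finally:
            browser.close()
```

Switching engines is then just a matter of passing `browser_name="firefox"` or `"webkit"`, which is the multi-browser support mentioned above.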

Munition answered 6/6, 2022 at 10:3
