Interpreting JavaScript in PHP

3

15

I'd like to be able to run JavaScript from PHP and get the results, and I'm wondering if there is a PHP library that can interpret it. My first thought was to use node.js, but since node.js has access to sockets, files, and so on, I'd prefer to avoid it.

Rationale: I'm doing screen scraping in PHP and have encountered many scenarios where the data is being produced by JavaScript on the frontend, and I would like to avoid writing specialized filtering functions to act on the JavaScript on a per-case basis since that takes a lot of time. The more general case would be to parse the JavaScript directly.

Downvoting: I don't really see what's so controversial about this question. Modern web crawlers are known to do it; the only difference is that they tend not to be written in PHP. [1]

[1] http://blogs.forbes.com/velocity/2010/06/25/google-isnt-just-reading-your-links-its-now-running-your-code/

Strephon answered 2/12, 2010 at 5:19 Comment(2)
Why in the world would you want to do that?!?!? If you must do it, you can compile and run a CLI JavaScript interpreter: code.google.com/p/v8. Mcnulty
For what purpose? PHP already has a multitude of date functions. Petersham
6

It's an interesting question and the down-voters are being unimaginative about potential use-cases. Page archiving tools, printing scripts, preview images - all valid reasons to want to manipulate a document with the JavaScript included within the page.

I'm not aware of any existing PHP implementations, but you could probably adapt Mozilla's SpiderMonkey as a PHP module, or as a standalone tool to manipulate a DOMDocument and return the result.

I haven't had experience with server-side JavaScript, but some issues that I believe might need to be dealt with:

  • Host objects like document and window are not part of the ECMAScript specification (these are objects provided by the implementing browser) so you need to make sure that the library provides equivalent host objects.
  • You might have security issues around executing client-side scripts within a server-side environment. This is a lot like allowing a user to submit a PHP script to be evaluated, so you need to make sure the security sandbox is tight.
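To make the host-object point concrete, here is a minimal sketch in JavaScript, assuming Node.js and its built-in `vm` module are available. The helper name `runPageScript` and the stub objects are hypothetical, invented for illustration; a real page would need far more of the DOM than a single `document.write`. Note that the Node.js documentation states that `vm` is not a security mechanism, so this shows only the host-object idea, not a tight sandbox.

```javascript
const vm = require('node:vm');

// Hypothetical helper: runs page script text against stub host objects
// and returns whatever the script passed to document.write().
function runPageScript(source) {
  const written = [];

  // Minimal stand-in for the browser's `document` host object.
  const documentStub = {
    write: (html) => { written.push(String(html)); },
  };

  // Simplification: in a real browser `window` is the global object itself;
  // here it is just a plain object that exposes the same stub document.
  const sandbox = {
    document: documentStub,
    window: { document: documentStub },
  };

  vm.createContext(sandbox);
  // The timeout stops runaway loops; it does not make untrusted code safe.
  vm.runInContext(source, sandbox, { timeout: 1000 });
  return written.join('');
}

const output = runPageScript('document.write("<b>" + (6 * 7) + "</b>");');
console.log(output); // prints <b>42</b>
```

PHP could invoke a script like this as a child process and read its standard output, although that brings back the process-level access the question wanted to avoid.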

Another option, perhaps safer and easier to implement, might be to use a modified Firefox or WebKit instance that runs as a browser, loads the target pages, and returns the modified source to your application.

Saint answered 4/12, 2010 at 23:53 Comment(3)
I'm glad you pointed out the issue with document and window; that particular issue hadn't clicked. I think that if I ended up wanting to solve this, I'd go down the route of writing a PHP module as you suggested. Strephon
@KitSunde Take a look at Selenium. It lets you control browsers from many languages. Node developers face this issue too: they tend to set node.js aside and use PhantomJS instead. PhantomJS is not node.js, nor is it a node.js library (though there are libraries that allow node to control it). PhantomJS is a browser rather than an interpreter (it is built on the WebKit engine, not a fork of Google Chrome), with windows and so on, but the windows are never drawn on screen. It's a headless browser. Mammiemammiferous
@Mammiemammiferous Thanks, I have become aware of Selenium in the 7 years since I asked this question. :p Strephon
4

Since PHP 5.3 you can use the V8Js extension. It's a native extension that embeds Google's V8 JavaScript engine to execute JS and return the result to PHP.

It's good because you can pass PHP variables, including arrays, into the script, and they are converted to JavaScript values very well.

Eleemosynary answered 21/10, 2013 at 10:45 Comment(0)
1

NodeJS (or some other derivative of Google's V8) might actually be the best way to go here. If you're concerned about the various things nodejs can do (e.g. sockets), you can probably "strip it down" by removing modules and/or addons -- I think even the built-in stuff is ultimately implemented in such a way that it could be stripped out fairly easily.

An alternate approach might be to simply replace, override, or remove the require function from node.js.
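As a rough illustration of that idea, the sketch below uses a closely related technique rather than literally deleting `require`: it runs code in a fresh context from Node's built-in `vm` module, where `require` and `process` are simply never defined unless you pass them in. The helper name `probe` is hypothetical. Treat this as a demonstration of the concept only; the Node.js documentation warns that `vm` is not a security mechanism, and known tricks can reach the outer environment from inside such a context.

```javascript
const vm = require('node:vm');

// Hypothetical helper: evaluates an expression in a fresh context that
// exposes only the globals we pass in explicitly.
function probe(expression, globals = {}) {
  const context = vm.createContext({ ...globals });
  return vm.runInContext(expression, context, { timeout: 1000 });
}

// Node-specific globals are absent inside the new context.
const hasRequire = probe('typeof require');
const hasProcess = probe('typeof process');

// Plain data passed in explicitly is still usable.
const sum = probe('a + b', { a: 2, b: 3 });

console.log(hasRequire, hasProcess, sum); // prints undefined undefined 5
```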

There's also envjs, which should make it easier to run JS that was designed to run in the browser.

Yaakov answered 4/12, 2010 at 22:13 Comment(2)
I've never seen envjs before, that's really interesting. I'm going to look into it further, thank you. :) Strephon
You're welcome. Good luck. (and don't be discouraged by the downvoters). Yaakov
