How to run PhantomJS as a server and call it remotely?

This is probably a very basic question. I would like to run the headless browser PhantomJS as a server rather than as a command-line tool.

Once it is running, I would like to call it remotely over HTTP. All I need is to send a URL and get back the HTML output. I need this to generate the HTML of an AJAX application so that it becomes searchable.

Is this possible?

Faculty asked 8/6, 2015 at 15:46

You can run PhantomJS perfectly well as a web server, because it has the Web Server Module. The examples folder contains, for example, a server.js script. It runs standalone without any dependencies (no node.js required).

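var system = require('system');
var page = require('webpage').create(),
    server = require('webserver').create();

// Port to listen on, e.g. `phantomjs server.js 8080`
var port = system.args[1] || 8080;

var service = server.listen(port, function (request, response) {
    console.log('Request received at ' + new Date());
    // Expect requests such as /?url=http%3A%2F%2Fexample.com%2F
    var match = /[?&]url=([^&]*)/.exec(request.url);
    if (!match) {
        response.statusCode = 400;
        response.write('Missing "url" query parameter');
        response.close();
        return;
    }
    var someUrl = decodeURIComponent(match[1]);
    page.open(someUrl, function (status) {
        if (status !== 'success') {
            console.log('Unable to load ' + someUrl);
            response.statusCode = 502;
            response.write('Unable to load ' + someUrl);
            response.close();
        } else {
            response.statusCode = 200;
            response.headers = {
                'Cache-Control': 'no-cache',
                'Content-Type': 'text/html;charset=utf-8'
            };
            // page.content is the page's HTML, serialized from the current DOM
            response.write(page.content);
            response.close();
        }
    });
});

if (!service) {
    console.log('Could not listen on port ' + port);
    phantom.exit(1);
}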
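
For an AJAX application, `page.open` can report success before the application's own requests have finished, so the HTML may still be incomplete at that point. One common remedy, similar in spirit to the waitfor.js script in the PhantomJS examples folder, is to poll a readiness condition before reading the page. The helper below is a hypothetical sketch written in plain ES5 so that it should also run inside PhantomJS; the `#content` selector and the timeout values are assumptions.

```javascript
// Hypothetical helper: poll `isReady` every `intervalMs` until it returns true
// or `timeoutMs` elapses, then call `done(err)` (err is null on success).
// Plain ES5 on purpose: PhantomJS does not understand newer syntax.
function waitFor(isReady, done, timeoutMs, intervalMs) {
    var deadline = Date.now() + (timeoutMs || 5000);
    var timer = setInterval(function () {
        if (isReady()) {
            clearInterval(timer);
            done(null);
        } else if (Date.now() >= deadline) {
            clearInterval(timer);
            done(new Error('Timed out waiting for the page'));
        }
    }, intervalMs || 100);
}

// Possible use inside the page.open callback (the '#content' selector is an assumption):
//   waitFor(function () {
//       return page.evaluate(function () {
//           return document.querySelector('#content') !== null;
//       });
//   }, function (err) {
//       response.statusCode = err ? 504 : 200;
//       response.write(err ? err.message : page.content);
//       response.close();
//   });
```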

If you want to run PhantomJS through node.js, that is also easy with phantomjs-node, a PhantomJS bridge for node.

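var http = require('http');
var url = require('url');
var phantom = require('phantom');

// Callback-style API of phantomjs-node at the time; newer `phantom` releases use promises
phantom.create(function (ph) {
  ph.createPage(function (page) {
    http.createServer(function (req, res) {
      // Expect requests such as /?url=http%3A%2F%2Fexample.com%2F
      var someURL = url.parse(req.url, true).query.url;
      if (!someURL) {
        res.writeHead(400, {'Content-Type': 'text/plain'});
        return res.end('Missing "url" query parameter');
      }
      page.open(someURL, function (status) {
        if (status !== 'success') {
          res.writeHead(502, {'Content-Type': 'text/plain'});
          return res.end('Unable to load ' + someURL);
        }
        // Serialize the rendered DOM inside PhantomJS and send it back
        page.evaluate(function () {
          return document.documentElement.outerHTML;
        }, function (result) {
          res.writeHead(200, {'Content-Type': 'text/html; charset=utf-8'});
          res.end(result);
        });
      });
    }).listen(8080);
  });
});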
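
Calling such a server remotely is then an ordinary HTTP GET. The sketch below is a small node.js client; it assumes the server listens on localhost:8080 and reads the page to render from a `url` query parameter. Both the address and the parameter name are assumptions, so match them to however your server parses the request.

```javascript
const http = require('http');

// Ask a rendering server for the HTML of `targetUrl`.
// The default address and the `url` parameter name are assumptions.
function fetchRendered(targetUrl, serverBase = 'http://localhost:8080/') {
  const endpoint = new URL(serverBase);
  endpoint.searchParams.set('url', targetUrl); // percent-encodes the target
  return new Promise((resolve, reject) => {
    http.get(endpoint, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        if (res.statusCode === 200) {
          resolve(body);
        } else {
          reject(new Error(`Server answered ${res.statusCode}: ${body}`));
        }
      });
    }).on('error', reject);
  });
}

// Usage (assumes a rendering server is running on port 8080):
// fetchRendered('http://example.com/#!/products')
//   .then((html) => console.log(html.length, 'characters of HTML'))
//   .catch((err) => console.error(err.message));
```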

Notes

You can use this as is, as long as you don't receive multiple requests at the same time. If you do, you either need to serialize the requests (because there is only one page object) or create a new page object for every request and close() it again when you're done.
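
The first option, serializing requests so that only one of them uses the shared page at a time, can be sketched with a small promise chain. This is a generic helper, not part of PhantomJS or phantomjs-node; the usage comment assumes the promise-based API of newer `phantom` releases (`page.open(url)` and `page.property('content')` returning promises), and `target` is a hypothetical value parsed from the request.

```javascript
// Run asynchronous jobs strictly one after another. Each job is a function
// that returns a promise; enqueue() returns a promise for that job's result.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(job) {
    const result = tail.then(() => job());
    // A failed job must not block the queue, so later jobs chain off a
    // promise that always resolves.
    tail = result.catch(() => {});
    return result;
  };
}

// Usage sketch with one shared page (assumes the promise-based `phantom` API;
// `target` would come from parsing the request):
// const enqueue = createSerialQueue();
// http.createServer((req, res) => {
//   enqueue(async () => {
//     await page.open(target);
//     return page.property('content');
//   }).then(
//     (html) => res.end(html),
//     (err) => { res.statusCode = 502; res.end(err.message); }
//   );
// }).listen(8080);
```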

Rabbit answered 8/6, 2015 at 17:57
Thank you for such a detailed and thorough response. Great explanation. – Faculty
Excellent answer, +1 for a comprehensive code example which shows exactly what it looks like. – Wiredraw

The easiest way is to write a Python script (or something similarly simple) that starts the server and uses Python WebSockets to communicate with it, with a web form of some sort for submitting a website URL and getting back the page source. Any automation can be done with cron jobs or, on Windows, with Task Scheduler to start the Python script automatically.
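
A launcher like this does not have to be written in Python. As a sketch in node.js (the language used elsewhere in this thread), the helper below starts a long-running command once and restarts it whenever it exits, so PhantomJS is not launched per request. The `phantomjs server.js 8080` command line and the restart delay are assumptions, and the WebSocket and web-form parts of the answer are not covered.

```javascript
const { spawn } = require('child_process');

// Start `command` once and restart it whenever it exits, until stop() is called.
// Meant for a long-running process such as a PhantomJS web server script.
function supervise(command, args, restartDelayMs = 1000) {
  let child = null;
  let timer = null;
  let stopped = false;
  let restarts = 0;

  function start() {
    timer = null;
    child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', (err) => {
      console.error(`Could not start ${command}: ${err.message}`);
    });
    child.on('exit', (code, signal) => {
      if (stopped) return;
      restarts += 1;
      console.error(`${command} exited (${signal || code}); restarting`);
      timer = setTimeout(start, restartDelayMs);
    });
  }

  start();
  return {
    get restarts() { return restarts; },
    stop() {
      stopped = true;
      if (timer) clearTimeout(timer);
      if (child) child.kill();
    },
  };
}

// Usage (assumes `phantomjs` is on the PATH and server.js is your server script):
// const phantomServer = supervise('phantomjs', ['server.js', '8080']);
// process.on('SIGINT', () => { phantomServer.stop(); process.exit(0); });
```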

Kid answered 8/6, 2015 at 15:57
Thank you. It is important for me not to launch PhantomJS for each request; I need it to run as a server. Is that possible? – Faculty
Yes. Maybe you haven't worked with PhantomJS before? PhantomJS is not an HTML parser like HtmlUnit but a headless browser based on WebKit (the engine behind Safari). The browser runs as a separate process that can be controlled through an API from Python, Java, C++, etc., which can then easily be turned into a server-side page-processing unit. It can be controlled remotely using WebSockets, or by generating a web page that submits requests, which the Python code then executes on the separate PhantomJS process. – Kid
You are right, I haven't worked with PhantomJS yet. Thank you again for the explanation. – Faculty
