Purely JavaScript Solution for Google Ajax Crawlable Spec

I have a project which is heavily JavaScript based (e.g. node.js, backbone.js, etc.). I'm using hashbang URLs like /#!/about and have read the Google AJAX crawlable spec. I've done a wee bit of headless UI testing with zombie and can easily conceive of how this could be done by setting a slight delay and returning static content back to the Googlebot. But I don't really want to implement this from scratch and was hoping there's a pre-existing library that fits in with my stack. Know of one?
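Roughly the kind of thing I have in mind (a sketch only, assuming zombie's visit()/html() callback API; renderSnapshot is just an illustrative name, not an existing library):

// The AJAX crawlable spec: when a user-facing URL looks like
//   http://example.com/#!/about
// the crawler instead requests
//   http://example.com/?_escaped_fragment_=/about
// and the server should answer with a pre-rendered HTML snapshot.
var Browser = require("zombie"); // npm install zombie

// Load the hashbang URL in a headless browser, let the client-side
// JavaScript run, then hand back the resulting markup.
function renderSnapshot(fragment, callback) {
  var browser = new Browser();
  browser.visit("http://localhost:3000/#!" + fragment, function (err) {
    if (err) return callback(err);
    callback(null, browser.html()); // static markup to return to the crawler
  });
}

renderSnapshot("/about", function (err, html) {
  if (err) throw err;
  console.log(html);
});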

EDIT: At the time of writing I don't think this exists. However, rendering with Backbone (or similar) on both server and client is a plausible approach (even if not a direct answer), so I'm going to mark that as the answer, although there may be better solutions in the future.

Analiese answered 19/1, 2012 at 17:38 Comment(0)

There is one implementation using node.js and Backbone.js on both the server and the browser: https://github.com/Morriz/backbone-everywhere

Bradlybradman answered 2/3, 2012 at 1:46 Comment(2)
Although I'm not sure I'll use this, it's the closest to a plausible answer given there probably isn't a headless solution in js at this time ;) – Analiese
Please also consider this approach #9413828 – Bradlybradman

Just to chime in, I ran into this issue too (I have a very ajax/js-heavy site), and I found this, which may be of interest:

crawlme

I have yet to try it, but it sounds like it will make the whole process a piece of cake if it works as advertised! It's a piece of Connect/Express middleware that is simply inserted before any calls to pages, and apparently takes care of the rest.
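If it works the way the README suggests, wiring it up should be roughly this simple (untested sketch; the static directory and port are just placeholders):

var express = require("express");
var crawlme = require("crawlme"); // npm install crawlme

var app = express();

// Mount crawlme before any other routes so it can intercept requests
// that carry the _escaped_fragment_ parameter and serve HTML snapshots.
app.use(crawlme());
app.use(express.static(__dirname + "/public"));

app.listen(3000);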

Edit:

Having tried crawlme, I had some success, but the backend headless browser it uses (zombie.js) was failing on some of my JavaScript content, likely because it works by emulating the DOM and thus won't be perfect.

Sooo, instead I got hold of a full WebKit-based headless browser, phantomjs, and a set of node bindings for it, like this:

npm install phantomjs node-phantom

I then created my own script similar to crawlme, but using phantomjs instead of zombie.js. This approach seems to work perfectly, and will render every single one of my AJAX-based pages. The script I wrote to pull this off can be found here. To use it, simply:

var googlebot = require("./path-to-file");

and then, before any other calls to your app (this is using Express but should work with just Connect too):

app.use(googlebot());

The source is relatively simple minus a couple of regexps, so have a gander :)
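For anyone who can't get at the script, the gist of it looks something like this (a simplified sketch, not the script itself, assuming node-phantom's callback API; the one-second delay and the URL rebuilding are just illustrative choices):

var phantom = require("node-phantom"); // npm install phantomjs node-phantom

module.exports = function googlebot() {
  return function (req, res, next) {
    var fragment = req.query._escaped_fragment_;
    if (fragment === undefined) return next(); // not a crawler snapshot request

    // Rebuild the hashbang URL the crawler is really asking about.
    var url = "http://" + req.headers.host + "/#!" + fragment;

    phantom.create(function (err, ph) {
      if (err) return next(err);
      ph.createPage(function (err, page) {
        if (err) return next(err);
        page.open(url, function (err, status) {
          if (err || status !== "success") return next(err || new Error("failed to open " + url));
          // Give the client-side app a moment to render, then grab the DOM.
          setTimeout(function () {
            page.evaluate(function () {
              return document.documentElement.outerHTML;
            }, function (err, html) {
              ph.exit();
              if (err) return next(err);
              res.send(html);
            });
          }, 1000);
        });
      });
    });
  };
};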

Result: an AJAX-heavy node.js/connect/express-based website can be crawled by the Googlebot.

Franfranc answered 7/3, 2013 at 14:31 Comment(1)
I had the same problem with crawlme & zombie. Thanks! +1 – Vagary

The crawlable node.js module seems to fit this purpose: https://npmjs.org/package/crawlable, and there is an example of such an SPA that can be rendered server-side in node: https://github.com/trupin/crawlable-todos

Curiel answered 6/11, 2013 at 15:24 Comment(0)

Backbone looks interesting: http://documentcloud.github.com/backbone/

http://lostechies.com/derickbailey/2011/09/26/seo-and-accessibility-with-html5-pushstate-part-1-introducing-pushstate/

Mcalister answered 31/1, 2012 at 1:24 Comment(1)
I don't see how this relates to my specific question. What does this have to do with generating HTML snapshots for use with the Google AJAX spec? – Analiese
