Why do search engine crawlers not run javascript? [closed]

I have been working on some advanced javascript applications that use a lot of ajax requests to render the page. To make the applications crawlable (by Google), I have to follow https://developers.google.com/webmasters/ajax-crawling/?hl=fr . It tells us to do things like redesigning our links, creating html snapshots, etc. to make the site searchable.
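As far as I understand it, the scheme boils down to exposing "#!" URLs and serving a pre-rendered HTML snapshot whenever the crawler requests the "_escaped_fragment_" version of a page. Something roughly like this (a minimal sketch assuming an Express-style Node server; renderSnapshot is just a placeholder for whatever produces the snapshot):

    // Sketch of the AJAX crawling scheme: serve an HTML snapshot to the crawler.
    // Assumes Express; renderSnapshot() is a hypothetical pre-rendering helper.
    const express = require('express');
    const app = express();

    app.get('/', (req, res) => {
      const fragment = req.query._escaped_fragment_;
      if (fragment !== undefined) {
        // The crawler rewrites "/#!key=value" into "/?_escaped_fragment_=key=value",
        // so it gets a static snapshot of that view instead of the js app.
        res.send(renderSnapshot(fragment));
      } else {
        // Normal browsers get the javascript application shell.
        res.sendFile(__dirname + '/index.html');
      }
    });

    app.listen(3000);

In other words, I end up doing the rendering work on my side so that the crawler never has to run the javascript.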

I wonder why crawlers don't run javascript to get the rendered page and index that. Is there a reason behind this? Or is it a missing feature of search engines that may come in the future?

Valli asked 10/10, 2013 at 5:5 Comment(9)
Google already runs javascript. – Mccrory
@LoïcFaure-Lacroix he is talking about web crawlers, I thought the same at first. – Lexicon
@JayHarris GoogleBot does that. – Mccrory
@Loïc Faure-Lacroix: do you have a link to official documentation about it? Thanks. – Valli
@LoïcFaure-Lacroix it runs some javascript, not the whole script. – Lexicon
@JayHarris it actually does. I have this site git.vosnax.ru which is completely in javascript. All pages get indexed without much problem. Each page can be accessed with a URL. For that reason, keeping static pages isn't a requirement. But if, for some reason, the internet is slow and GoogleBot indexes a page that hasn't finished loading, that would be quite understandable. You can search for the site in Google and you might get some results with content that is indexed. – Mccrory
This question appears to be off-topic because it is about SEO. – Idou
@John Conde: Actually, I'm asking about the technical reason behind search engines and ajax sites, not about SEO. – Valli
Then it is off-topic for this website because it is not about code you've written. – Idou

GoogleBot actually does handle sites written in js. The big problem with ajax sites is that, even if GoogleBot can execute js and handle ajax requests, it's not really possible for the crawler to know when the page has finished loading.

For that reason, a web crawler could load a page and index it before it has started doing its ajax requests. Say a script only gets executed on page scroll: it's very likely that the Google bot will not trigger every possible event.
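For example, something like this (a rough sketch; the endpoint and element names are made up) only runs if a scroll event actually fires, and a crawler may never fire one:

    // Content that is only fetched once the user scrolls near the bottom.
    // "/api/more-items" and "item-list" are made-up names for illustration.
    window.addEventListener('scroll', () => {
      const nearBottom =
        window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
      if (!nearBottom) return;
      fetch('/api/more-items')
        .then((response) => response.json())
        .then((items) => {
          const list = document.getElementById('item-list');
          items.forEach((item) => {
            const li = document.createElement('li');
            li.textContent = item.title;
            list.appendChild(li);
          });
        });
    });

A crawler that indexes the page right after the initial load will never see those items.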

The other problem is navigation.

Since navigation can be done without reloading the page, one URL can map to multiple "views". For that reason, Google asks developers to keep a static copy of each page, so that the views which would otherwise be inaccessible still get indexed.

If each page of your site can be accessed through a fully qualified URL, then you shouldn't have a problem getting your site indexed.
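A rough sketch of what that can look like with the History API (loadView and the data-internal attribute are made-up names; the important part is that the server also returns real HTML for those same URLs):

    // Client-side navigation that still gives every view its own URL.
    // loadView() is a hypothetical function that renders the content for a path.
    document.addEventListener('click', (event) => {
      const link = event.target.closest('a[data-internal]');
      if (!link) return;
      event.preventDefault();
      history.pushState({}, '', link.href); // the address bar shows a real URL
      loadView(link.pathname);
    });

    // Back/forward buttons should render the matching view as well.
    window.addEventListener('popstate', () => loadView(location.pathname));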

That said, scripts are going to get run, but it's not certain that the crawler will index the page only after it has finished handling all the scripts.

Here's a link:

GoogleBot smarter: it was written in 2010, and we can expect that web crawlers have gotten much smarter since then.

Lackey answered 10/10, 2013 at 5:17 Comment(0)

Reading pure HTML is way faster than waiting for javascript calls to finish and then working out how the page is set up. I think that's the main reason.

Another might be that the whole crawling process is automated, so, again, reading a static page is a lot easier and makes a lot more sense. With javascript, the content of the page might change every second, leaving the crawler "confused".

Considering that this has not yet been implemented in search engines, I think it won't come in the near future.

Statistician answered 10/10, 2013 at 5:15 Comment(0)

It's harder for crawlers to read pages with scripts, because those pages are all about dynamically changing content. And crawlers care not only about the first visit to a site: they recheck indexed pages every week or two in a fast mode, simply playing "spot the differences" for content and link changes. Rechecking pages with scripts would be far too painful and costly for crawlers across the whole web.

Brindabrindell answered 10/10, 2013 at 5:22 Comment(0)
