Why do search engine crawlers not run javascript? [closed]

I have been working on some advanced javascript applications that use a lot of ajax requests to render the page. To make the applications crawlable (by Google), I have to follow https://developers.google.com/webmasters/ajax-crawling/?hl=fr . It tells us to do things like redesigning our links, creating html snapshots, etc. to make the site searchable.
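As far as I understand it, the scheme boils down to exposing "#!" URLs and serving a pre-rendered HTML snapshot whenever the crawler requests the "_escaped_fragment_" version of a page. Something roughly like this (a minimal sketch assuming an Express-style Node server; renderSnapshot is just a placeholder for whatever produces the snapshot):

    // Sketch of the AJAX crawling scheme: serve an HTML snapshot to the crawler.
    // Assumes Express; renderSnapshot() is a hypothetical pre-rendering helper.
    const express = require('express');
    const app = express();

    app.get('/', (req, res) => {
      const fragment = req.query._escaped_fragment_;
      if (fragment !== undefined) {
        // The crawler rewrites "/#!key=value" into "/?_escaped_fragment_=key=value",
        // so it gets a static snapshot of that view instead of the js app.
        res.send(renderSnapshot(fragment));
      } else {
        // Normal browsers get the javascript application shell.
        res.sendFile(__dirname + '/index.html');
      }
    });

    app.listen(3000);

In other words, I end up doing the rendering work on my side so that the crawler never has to run the javascript.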

I wonder why crawlers don't run javascript to get the rendered page and index that. Is there a reason behind this? Or is it a missing feature of search engines that may come in the future?

Valli asked 10/10, 2013 at 5:5 Comment(9)
Google already runs javascript. – Mccrory
@LoïcFaure-Lacroix he is talking about web crawlers, I thought the same at first. – Lexicon
@JayHarris GoogleBot does that. – Mccrory
@Loïc Faure-Lacroix: do you have a link to official documentation about it? Thanks. – Valli
@LoïcFaure-Lacroix it runs some javascript, not the whole script. – Lexicon
@JayHarris it actually does. I have this site git.vosnax.ru which is completely in javascript. All pages get indexed without much problem. Each page can be accessed with a URL. For that reason, keeping static pages isn't a requirement. But if, for some reason, the internet is slow and GoogleBot indexes a page that hasn't finished loading, that would be quite understandable. You can search for the site in Google and you might get some results with content that is indexed. – Mccrory
This question appears to be off-topic because it is about SEO. – Idou
@John Conde: Actually, I'm asking about the technical reason behind search engines and ajax sites, not about SEO. – Valli
Then it is off-topic for this website because it is not about code you've written. – Idou

GoogleBot actually does handle sites written in js. The big problem with ajax sites is that, even if GoogleBot can execute js and handle ajax requests, it's not really possible for the crawler to know when the page has finished loading.

For that reason, a web crawler could load a page and index it before it has started doing its ajax requests. Say a script only gets executed on page scroll: it's very likely that the Google bot will not trigger every possible event.
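For example, something like this (a rough sketch; the endpoint and element names are made up) only runs if a scroll event actually fires, and a crawler may never fire one:

    // Content that is only fetched once the user scrolls near the bottom.
    // "/api/more-items" and "item-list" are made-up names for illustration.
    window.addEventListener('scroll', () => {
      const nearBottom =
        window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
      if (!nearBottom) return;
      fetch('/api/more-items')
        .then((response) => response.json())
        .then((items) => {
          const list = document.getElementById('item-list');
          items.forEach((item) => {
            const li = document.createElement('li');
            li.textContent = item.title;
            list.appendChild(li);
          });
        });
    });

A crawler that indexes the page right after the initial load will never see those items.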

The other problem is navigation.

Since navigation can be done without reloading the page, one URL can map to multiple "views". For that reason, Google asks developers to keep a static copy of each page, so that the views which would otherwise be inaccessible still get indexed.

If each page of your site can be accessed through a fully qualified URL, then you shouldn't have a problem getting your site indexed.
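A rough sketch of what that can look like with the History API (loadView and the data-internal attribute are made-up names; the important part is that the server also returns real HTML for those same URLs):

    // Client-side navigation that still gives every view its own URL.
    // loadView() is a hypothetical function that renders the content for a path.
    document.addEventListener('click', (event) => {
      const link = event.target.closest('a[data-internal]');
      if (!link) return;
      event.preventDefault();
      history.pushState({}, '', link.href); // the address bar shows a real URL
      loadView(link.pathname);
    });

    // Back/forward buttons should render the matching view as well.
    window.addEventListener('popstate', () => loadView(location.pathname));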

That said, scripts are going to get run, but it's not certain that the crawler will index the page only after it has finished handling all the scripts.

Here's a link:

GoogleBot smarter: it was written in 2010, and we can expect that web crawlers have gotten much smarter since then.

Lackey answered 10/10, 2013 at 5:17 Comment(0)

Reading pure HTML is way faster than waiting for javascript calls to finish and then working out how the page is set up. I think that's the main reason.

Another might be that the whole crawling process is automated, so, again, reading a static page is a lot easier and makes a lot more sense. With javascript, the content of the page might change every second, leaving the crawler "confused".

Considering that this has not yet been implemented in search engines, I think it won't come in the near future.

Statistician answered 10/10, 2013 at 5:15 Comment(0)

It's harder for crawlers to read pages with scripts, because those pages are all about dynamically changing content. And crawlers care not only about the first visit to a site: they recheck indexed pages every week or two in a fast mode, simply playing "spot the differences" for content and link changes. Rechecking pages with scripts would be far too painful and costly for crawlers across the whole web.

Brindabrindell answered 10/10, 2013 at 5:22 Comment(0)
