Are AJAX sites crawlable by search engines?
S

7

6

I had always assumed that AJAX-driven content was invisible to search engines.

(i.e. content inserted into the DOM via XMLHttpRequest)

For example, on this site, the main content is loaded by the browser via an AJAX request:

http://www.trustedsource.org/query/terra.cl

...if you view this page with JavaScript disabled, the main content area is blank.

However, Google cache shows the full content after the AJAX load:

http://74.125.155.132/search?q=cache:JqcT6EVDHBoJ:www.trustedsource.org/query/terra.cl+http://www.trustedsource.org/query/terra.cl&cd=1&hl=en&ct=clnk&gl=us

So, apparently search engines do index content loaded by AJAX.

Questions:

  • Is this a new feature in search engines? Most postings on the web indicate that you have to publish duplicate static HTML content for search engines to find it.
  • Are there any tricks to get AJAX-driven content crawled by search engines (besides creating duplicate static HTML content)?
  • Will the AJAX-driven content be indexed if it is loaded from a separate subdomain? How about a separate domain?
Santosantonica answered 23/7, 2009 at 6:34 Comment(2)
"...if you view this page with Javascript disabled, the main content area is blank." No it isn't. It looks quite cluttered, actually.Ganny
What browser are you using? When I access the first link in Firefox with Javascript disabled, I see "Information for 'terra.cl'" and then a blank box. Viewing html source I see an empty DIV with ID=query-content, where the AJAX content would go.Santosantonica
S
3

Following this guide from Google, AJAX sites may be made crawlable:

http://code.google.com/intl/sv-SE/web/ajaxcrawling/docs/getting-started.html
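
For reference, the scheme in that guide has the crawler rewrite "hash-bang" URLs (`#!state`) into a query parameter named `_escaped_fragment_`, and the server answers that URL with a static HTML snapshot. Here is a rough Python sketch of the URL mapping only. The function name and the example.com URLs are made up for illustration, and the escaping is simplified compared with the exact character list in Google's specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment_url(url: str) -> str:
    """Map a '#!' URL to the '_escaped_fragment_' form a crawler would request.

    Hypothetical helper: a simplified sketch, not Google's implementation.
    """
    parts = urlsplit(url)
    # Only fragments starting with '!' (the "hash-bang") take part in the scheme.
    if not parts.fragment.startswith("!"):
        return url
    state = parts.fragment[1:]
    # Simplified escaping: keep '=' and '/', percent-encode '&', '#', '%', spaces, etc.
    param = "_escaped_fragment_=" + quote(state, safe="=/")
    # Append to an existing query string if there is one.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment_url("http://example.com/index.html#!key=value"))
```

The server side (detecting `_escaped_fragment_` and returning the snapshot) is not shown here.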

Santosantonica answered 22/4, 2010 at 4:41 Comment(0)
M
1

AJAX-driven content is not crawled by search engines (or at least, not by Google).

The reason you can see the page in the Google cache is that the cache holds the full page, including the .js file. So when you view the cached page, your browser runs the cached .js file.

I don't think there is any trick to get it crawled by search engines, except using a static .html page.

Edit (April 27, 2010): Google has published a way to make AJAX content crawlable.

Google webmaster toolkit might help.

Mittel answered 23/7, 2009 at 6:40 Comment(1)
I don't think this is true. If I view the google cache link with Javascript disabled, I still see the AJAX-driven content. If I view source, the content is right there in the html.Santosantonica
P
1

Search engines could run the JavaScript needed to index Ajax content, but it would be difficult and computationally expensive — I'm not aware of any that actually do.

A well written site will, if it uses Ajax, use it according to the principles of progressive enhancement. Any key functionality will still be available without needing to run the JavaScript.
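
One way to check this is to look at what the raw HTML contains before any script runs, since that is all a non-rendering crawler gets. Below is a minimal sketch using only the Python standard library. The sample markup is invented, and `query-content` is simply the container id mentioned in the comments on the question; treat it as an assumption about your own page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class ContainerText(HTMLParser):
    """Collects the text inside one element (by id) without executing scripts."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def static_text(html, target_id):
    """Return the whitespace-normalised text a non-JS client finds in the container."""
    parser = ContainerText(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Invented sample pages: one filled only by script, one with a static fallback.
ajax_only = '<h1>Info</h1><div id="query-content"></div><script src="load.js"></script>'
enhanced = '<div id="query-content"><p>Static summary for <b>terra.cl</b></p></div>'
print(repr(static_text(ajax_only, "query-content")))
print(repr(static_text(enhanced, "query-content")))
```

An empty result for the main container suggests the page depends entirely on JavaScript for its key content.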

On the other hand, sites which reinvent frames (and don't use progressive enhancement) using JavaScript will suffer from all the usual problems of frames, but trade orphan pages for search engine invisibility.

Preciado answered 23/7, 2009 at 6:41 Comment(1)
Right for the progressive enhancement.Mittel
B
1

I have NoScript installed and active. Both links show the same content (+/- the google header bar). Therefore, the Google cache shows only what is statically there.

Bronco answered 23/7, 2009 at 6:45 Comment(1)
I am getting different results than you. I installed noscript. The original page does not show the main content, the google-cached page shows it. If I view source I see different content inside the DIV with ID = query-content. (this is the div where AJAX content is injected) Can you try in IE?Santosantonica
R
1

If you're using something like jQuery tabs, even if you're linking to HTML files within the same directory, it degrades nicely back to normal without JavaScript, and the tabs just become links to the actual pages. It's ugly, but it works. You can also style these versions, too.

Rancell answered 23/7, 2009 at 6:48 Comment(1)
Well yeah, any AJAX content you load should have a nice, elegant fallback so that search engines and people with old or JavaScript-disabled browsers have something to look at.Verdure
J
0

Content that gets loaded immediately (say with a secondary HTTP request as in your example after the initial pageload) is usually visible to the search engine crawler.

However, content that is loaded via AJAX only after a user action (e.g. clicking a tab or button) won't be seen or indexed. It will only be seen or indexed if it is reachable through 'real' anchor links.
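
To illustrate the point about 'real' anchor links: a client that does not execute JavaScript can only follow `href` values that resolve to an actual URL. The sketch below is a simplified assumption about how such link discovery works, not a description of any particular search engine; the class name, function name, and sample markup are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects hrefs that a non-JS client could follow."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        # Skip links that only make sense with a script attached.
        if not href or href.startswith("#") or href.lower().startswith("javascript:"):
            return
        # Resolve relative URLs and drop the fragment part.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if absolute not in self.links:
            self.links.append(absolute)


def crawlable_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Invented tab markup: only the first and last links lead to real pages.
tabs = (
    '<a href="tab1.html">One</a>'
    '<a href="#" onclick="loadTab(2)">Two</a>'
    '<a href="javascript:void(0)">Three</a>'
    '<a href="/about.html#team">About</a>'
)
print(crawlable_links(tabs, "http://example.com/docs/"))
```

Tabs two and three in the sample never show up in the result, so whatever they load is unreachable for this kind of client.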

Jahdol answered 23/7, 2009 at 6:37 Comment(2)
Alex, can you provide evidence that Google will run AJAX requests on pages where the AJAX requests run when the page loads?Sadiesadira
@Josh, no, because it's not true. :)Marquetry
S
0

Google just made their crawlers run JavaScript without any developer changes!

http://googlewebmastercentral.blogspot.com/2015/10/deprecating-our-ajax-crawling-scheme.html

They state:

Today, as long as you're not blocking Googlebot from crawling your JavaScript or CSS files, we are generally able to render and understand your web pages like modern browsers.
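
If you want a quick check for the "not blocking" condition, here is a small sketch with Python's standard `urllib.robotparser`. The robots.txt content and asset URLs are invented, and the standard-library parser only does simple path-prefix matching, so it may not agree with Googlebot on wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


# Invented example: the /js/ directory is disallowed for every crawler.
robots = "User-agent: *\nDisallow: /js/\n"
assets = [
    "http://example.com/js/app.js",
    "http://example.com/css/site.css",
]
print(blocked_assets(robots, assets))
```

Any script or stylesheet that ends up in the returned list is one the crawler would be told not to fetch.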

Stradivari answered 29/10, 2015 at 18:46 Comment(0)
