Does Google ignore whatever is after the hash fragment (#) while crawling our website?

Asked 14/5, 2011 at 11:11 Answered 15/5, 2011 at 23:7

Solved indexing web-crawler hyperlink sitemap

We use the part of the URL after the hash (#) to display different pages with JavaScript, so that the browser does not have to load the whole page again. For example, a direct link to a page could look like this (book_id/page_id):

www.example.com/book#1234/5678

Since we don't have direct links to each page, only to the books, we are thinking of adding these direct links to sitemap.xml.

My question is whether Google treats each of these as a separate link or just ignores everything after the hash, both during normal crawling and when we include the links in sitemap.xml.

Carpi answered 14/5, 2011 at 11:11 Comment(3)

Have a look at code.google.com/web/ajaxcrawling. You'll find an explanation of how to get Google to index your AJAX pages. – Stretch 14/5, 2011 at 11:19

Another reference for Google AJAX crawling: AJAX crawling: Guide for webmasters and developers – Abdominal 14/5, 2011 at 11:21

@ax That link was very useful and I managed to make it work.. so if you want post it as an answer by including the first 2 steps and I'll accept it.. :) – Carpi 15/5, 2011 at 12:48

As noted by Lucero, the hash fragment part (#1234/5678) of "AJAX URLs" is not sent to the server as part of an HTTP request (by specification), so the server would return the same result for all your different AJAX URLs.
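To illustrate the point about the fragment, here is a minimal Python sketch. The helper `request_target` is a made-up name for this example; it only mimics what an HTTP client puts on the request line and is not taken from any crawler.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path-and-query part a client would send to the server.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays in the browser.
    return target


urls = [
    "http://www.example.com/book#1234/5678",
    "http://www.example.com/book#1234/9999",
]
for url in urls:
    # Both URLs produce the same request target.
    print(request_target(url))
```

Both URLs collapse to the same request target, which is why the server cannot tell the two page states apart.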

Luckily, there is a scheme that allows Googlebot to crawl and index even AJAX pages:

Step-by-step guide

Indicate to the crawler that your site supports the AJAX crawling scheme
(by marking the unique page states that you want Googlebot to crawl with special hash fragments that begin with an exclamation mark, e.g. #!1234/5678)

Set up your server to handle requests for URLs that contain "_escaped_fragment_"
(www.example.com/book?_escaped_fragment_=1234/5678) and return an HTML snapshot of that page state

...
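The two steps above can be sketched roughly as follows. This is only an illustration, not Google's code: the names `escape_fragment`, `to_crawler_url` and `page_state_from_request` are made up here, and the set of escaped bytes (%00..20, %23, %25..26, %2B, %7F..FF) is my reading of the scheme's specification, so check it against the guide before relying on it.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit


def escape_fragment(fragment: str) -> str:
    """Percent-encode the bytes assumed to be special in the scheme."""
    out = []
    for byte in fragment.encode("utf-8"):
        special = byte <= 0x20 or byte in (0x23, 0x25, 0x26, 0x2B) or byte >= 0x7F
        out.append(f"%{byte:02X}" if special else chr(byte))
    return "".join(out)


def to_crawler_url(pretty_url: str) -> str:
    """Map a '#!' URL to the '_escaped_fragment_' URL a crawler would request."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url  # not marked as crawlable, leave untouched
    token = "_escaped_fragment_=" + escape_fragment(parts.fragment[1:])
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def page_state_from_request(target: str) -> Optional[str]:
    """Server side: recover the page state, or None for a normal request."""
    query = urlsplit(target).query
    values = parse_qs(query, keep_blank_values=True).get("_escaped_fragment_")
    return values[0] if values else None


print(to_crawler_url("http://www.example.com/book#!1234/5678"))
print(page_state_from_request("/book?_escaped_fragment_=1234/5678"))
```

When `page_state_from_request` returns a value, the server would render and return the HTML snapshot for that book and page instead of the usual JavaScript shell.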

Abdominal answered 15/5, 2011 at 23:7 Comment(1)

Update: this scheme is officially deprecated as of October 2015. GoogleBot is now able to crawl AJAX URLs (having a #! fragment) by executing the JavaScript just like a regular browser would do. – Aneurysm 21/2, 2017 at 23:16

Technically, the # part is just for client-side anchors. It is not sent to the server and does not identify a separate resource, so my guess would be that Google treats all of these as the same link.

The following information may be useful to you though: http://www.searchenginepeople.com/blog/how-to-track-clicks-on-anchors-in-google-analytics.html

Mohan answered 14/5, 2011 at 11:16 Comment(0)
