Indexing angularjs app - Googlebot-simulation vs site:domain
I have recently created a web page using AngularJS and I'm currently trying to get it indexed by Google using pushState.

I've done quite a bit of research and found out that I can use the Googlebot simulator in Google Webmaster Tools to simulate a Googlebot visit on my site and see how the bot sees my web page vs. what the users see.

Here the result looks good: Google sees exactly the same stuff as my users, and all the pages/subpages get a status of either "partially" or "fully" rendered.

Another way to see what Google sees on my website, which I was informed about this morning, is by googling site:domainname. This returns a list of all the pages/subpages Google has cached, and by clicking on the different links you get a view showing the respective cached page.

Here is where I get a little concerned that I missed something, because regardless of the partially/fully status my pages get from the Googlebot simulation, when I look at my pages using the second method, the pages are all blank.

It is my first time indexing web pages, and I have tried for days without any luck. Can somebody tell me what I'm doing wrong/missing, or at least point me in the right direction? Or should I just be a little more patient?

Extravasation answered 18/9, 2015 at 11:31 Comment(2)
I've added a bounty; I am in the exact same situation. – Largely
Hey, I have been facing the same issue, so I have been trying to put together a tool to help solve it - github.com/jjbskir/angular-prerender - it prerenders your site to make it more SEO-friendly and does not require a back-end service. – Ajar

The answer from Johannes Ferner is not correct. It used to be, and Google has been really slow to update their docs, but they have officially announced that they handle AJAX pages without the need for HTML snapshots as long as you are using pushState. Bing has followed suit and also handles pushState.

As an example of this, search for site:yoamoseguros.com and check the cached results there. The page is built in Angular, is loaded completely using AJAX and pushState, and it shows up and indexes fine.

And ignore the broken pages; I did a failed deploy a few days ago with a completely broken redirect that messed everything up, and Google had time to index the broken pages before I had a chance to remove them. The one time they indexed my stuff fast... :/


So if you are using pushState (html5Mode on) and your pages are not shown properly by Google, then there is something else going on. Check your robots.txt: are you blocking Google from reading static content like JS files or images? It needs access to files like that to be able to render and index the page properly.
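
As an illustrative sketch, a robots.txt that lets crawlers fetch your static assets might look like this (the paths below are placeholders for your own asset directories):

User-agent: *
Allow: /scripts/
Allow: /styles/
Allow: /images/
Disallow: /api/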

Also, make sure your fragments are just "#" and not "#!". In the latter case Google will assume that you want to use HTML snapshots and will try to find them, and it might fail. So if you want to use the simpler pushState version, make sure you are not using "#!".
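
As a minimal sketch, enabling pushState-style URLs in AngularJS looks like this ("app" is a placeholder module name, and your server must be configured to return index.html for all application routes):

// Enable pushState-style URLs; "app" is a placeholder module name.
angular.module('app').config(['$locationProvider', function ($locationProvider) {
  // html5Mode(true) makes Angular use history.pushState instead of "#" fragments.
  $locationProvider.html5Mode(true);
}]);

You also need a <base href="/"> tag in the <head> of index.html so html5Mode can resolve relative URLs correctly.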


As a final note, Facebook does NOT support pushState. So Facebook will still need snapshots (or just hard-coded og tags in index.html, depending on what kind of content you have).
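
For example, hard-coded Open Graph tags in index.html could look like this (all values are illustrative placeholders):

<head>
  <!-- Hard-coded og tags; the values below are placeholders for your own content. -->
  <meta property="og:title" content="My Angular App" />
  <meta property="og:description" content="A short description of the page." />
  <meta property="og:image" content="https://example.com/preview.png" />
  <meta property="og:url" content="https://example.com/" />
</head>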

Grenville answered 25/9, 2015 at 12:5 Comment(0)

Some information:

pushState itself has nothing to do with indexing your page; it is just there to manipulate the browser history, simulating URL changes while the SPA (Single Page Application, i.e. your Angular app) actually routes internally without reloading the page. (Angular also calls this html5Mode.)
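
To illustrate what pushState does on its own, here is a plain-JavaScript sketch (the state object and path are made up):

// Change the URL in the address bar without reloading the page.
history.pushState({ view: 'products' }, '', '/products/42');

// React when the user navigates Back/Forward through those history entries.
window.addEventListener('popstate', function (event) {
  console.log('returned to state:', event.state);
});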

JavaScript and the Googlebot:

Since the Googlebot (or any other search crawler) is basically just a headless browser without the ability to run your page's JavaScript code, it will not see the dynamically generated content.

In order to index SPAs and deliver the dynamic content to crawlers, you need to supply and serve a static HTML snapshot of the dynamically generated page to the bots.

If you did not use a static HTML snapshot, the only thing the Googlebot would see is something like:

<html>
  <body>
    <div ui-view="mainContent"></div>
  </body>
</html>

more information: https://support.google.com/webmasters/answer/174992?hl=en

In order to generate these snapshots you could use one of the several SPA prerendering services such as https://prerender.io/ (free if you host it yourself, or a paid hosted version). What they do is start up a browser with JS support (PhantomJS), open all the supplied URLs (via a static list, crawling, sitemap.xml, ...), run all the JavaScript code on the page, wait for the page to finish (either via a timeout or a certain event which you have to fire), and then save the static version (= snapshot) of the page.
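
With prerender.io specifically, that "certain event" is a flag on window: you set window.prerenderReady to false while the page is still loading and flip it to true once rendering is done (a sketch; where exactly you flip it depends on your app):

// Tell the prerender browser to wait for our signal instead of using a timeout.
window.prerenderReady = false;

// ... later, once your data has loaded and the view has rendered:
window.prerenderReady = true;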

To make sure that the bots find those snapshots you have to handle the _escaped_fragment_ requests: by default Google maps URLs containing "#!" to requests with ?_escaped_fragment_=, and if your application does not use "#!" URLs (e.g. because you use pushState) you add <meta name="fragment" content="!"> to the page so the crawler knows to request the snapshot version.
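
Concretely, the mapping looks like this (example.com and the route are placeholders):

# Hash-bang URL: the part after "#!" moves into the query string.
http://example.com/#!/products/42
  -> http://example.com/?_escaped_fragment_=/products/42

# pushState URL on a page containing <meta name="fragment" content="!">:
http://example.com/products/42
  -> http://example.com/products/42?_escaped_fragment_=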

If you use the middleware provided by prerender.io (I have nothing to do with the project, I'm just using it), a lot of the difficult stuff is already handled.
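
As a minimal sketch with the prerender-node middleware in an Express app ('YOUR_TOKEN' is a placeholder, assuming their hosted service):

var express = require('express');
var app = express();

// Crawlers get the prerendered snapshot; normal users get the regular SPA.
app.use(require('prerender-node').set('prerenderToken', 'YOUR_TOKEN'));

// Serve the built Angular app for everyone else.
app.use(express.static('public'));
app.listen(3000);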

more information here: https://developers.google.com/webmasters/ajax-crawling/docs/specification?hl=en

Dryer answered 22/9, 2015 at 12:34 Comment(3)
Hey Johannes, can you please have a look at my follow-up question? I use prerender: #32697049 – Largely
pushState has something to do with indexing the page: as opposed to hash state, pushState needs real URLs so that the page can be bookmarked and reloaded. The idea is to re-create the same screens as a hard load, but without reloading. If you don't have actual URLs behind the SPA address path, you should use fragments instead. If both the SPA and "REST" URLs match, Google will have no trouble crawling your site, and your users will get fast page updates. – Pneumatology
This is no longer correct. While Google has been slow to update their documentation, they will properly index sites using pushState without the need for HTML snapshots. – Grenville