Google not crawling links in AngularJS application

I have an AngularJS application that is injected into 3rd party sites. It injects dynamic content into a div on the 3rd party page. Google is successfully indexing this dynamic content but does not appear to be crawling links within the dynamic content. The links would look something like this in the dynamic content:

<a href="http://www.example.com/support?title=Example Title&titleId=12345">Link Here</a>

I'm using query parameters for the links rather than an actual URL structure like:

http://www.example.com/support/title/Example Title/titleId/12345

I have to use the query parameters as I don't want the 3rd party site to have to change their web server configuration to redirect unfound URLs.

When a link is clicked I use the $location service to update the URL in the browser, and my Angular application then responds accordingly: it shows the relevant content based on the query params and sets the page title and meta description.
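For context, a minimal sketch of that pattern (the module and controller names here are hypothetical) might look like:

angular.module('supportApp').controller('SupportCtrl', ['$scope', '$location',
  function($scope, $location) {
    // Read the query parameters exposed by $location after a link click,
    // e.g. {title: 'Example Title', titleId: '12345'}
    var params = $location.search();
    $scope.title = params.title;
    $scope.titleId = params.titleId;

    // Update the page title and meta description for this "page"
    document.title = params.title;
    angular.element(document.querySelector('meta[name="description"]'))
      .attr('content', 'Support content for ' + params.title);
  }
]);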

Many of the articles I have read use $routeProvider and templates in AngularJS, but I'm not sure why that would make a difference to the crawler.

I have read that Google should view URLs with query parameters as separate pages, so I don't believe that should be the issue: https://webmasters.googleblog.com/2008/09/dynamic-urls-vs-static-urls.html

The only things I have not tried are 1. providing a sitemap with the URLs that have the query parameters, and 2. adding static links from other pages to the dynamic links to help Google discover those pages.
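For the first idea, a sitemap would simply list the parameterized URLs with the spaces and ampersands encoded; a minimal example (file name and location are up to you) would be:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/support?title=Example%20Title&amp;titleId=12345</loc>
  </url>
</urlset>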

Any help, ideas or insights would be greatly appreciated.

Reproachful answered 12/10, 2016 at 22:35 Comment(6)
You should be URI-encoding the spaces in your URL.Humidifier
Yes, they get encoded, just didn't encode them in my exampleReproachful
How do you know they aren't crawling them, and how long have these links been active? And do the sites they are on have reasonable traffic?Humidifier
And if you use the $location service to switch, are these URLs able to be found in <a> tags?Humidifier
I know the links aren't followed because I have made changes to content and I see that content get indexed, but the links are not crawled; I can confirm this by checking web server logs (there is an AJAX request when a link is hit). One site has reasonable traffic, 400-600k unique visitors a month. The site I directly control does not have good traffic, but I've used Fetch & Render in Google Webmaster Tools and asked it to crawl links. Yes, the URLs the $location service sets are in the <a> tags.Reproachful
Try to read this developers.google.com/webmasters/ajax-crawling/docs/learn-moreCummine

This happens because Google's crawlers cannot get static HTML from your URL, since your pages are rendered dynamically with JavaScript. You can achieve what you want as follows.

Since #! is deprecated, you can tell Google that your pages are rendered with JavaScript by using the following tag in your header:

<meta name="fragment" content="!">

On finding the above tag, Google's bots will request your URLs from your server with the _escaped_fragment_ query parameter, like:

http://www.example.com/?_escaped_fragment_=/support?title=Example Title&titleId=12345  

Then you need to rebuild the original URL from the _escaped_fragment_ value on your server, so it looks like this again:

http://www.example.com/support?title=Example Title&titleId=12345  

Then you will need to serve static HTML to the crawler for that URL. You can do that by using a headless browser to access the URL. PhantomJS is a good option: it renders the page, running the JavaScript, and then writes the contents out to create an HTML snapshot of your page. You can also save the snapshot on your server for further crawling, so when Google's bots visit you can serve the snapshot directly instead of re-rendering the page again.
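As a rough illustration, the server side of this could look something like the following (a minimal sketch using Express; the snapshots directory and its naming scheme are assumptions, and the snapshot files themselves would be pre-rendered with PhantomJS):

var express = require('express');
var fs = require('fs');
var path = require('path');
var app = express();

app.use(function (req, res, next) {
  var fragment = req.query._escaped_fragment_;
  if (fragment === undefined) return next(); // regular visitor: serve the app as usual

  // Rebuild the original URL the crawler is asking about, e.g.
  // /?_escaped_fragment_=/support?title=... becomes /support?title=...
  var originalUrl = decodeURIComponent(fragment);

  // Look up a snapshot pre-rendered with a headless browser (e.g. PhantomJS),
  // saved to disk and keyed by the original URL (naming scheme is an assumption).
  var snapshot = path.join(__dirname, 'snapshots',
    encodeURIComponent(originalUrl) + '.html');
  if (fs.existsSync(snapshot)) {
    res.sendFile(snapshot); // serve the static HTML snapshot to the bot
  } else {
    next(); // no snapshot yet: fall through to the normal app
  }
});

app.listen(3000);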

Tobolsk answered 26/10, 2016 at 14:6 Comment(0)

The web crawler might be running at a higher priority than the AngularJS interpretation of your dynamic links as the web crawler loads the page. Using ng-href makes the dynamic link interpretation happen at a higher priority. Hope it works!
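For example, instead of a plain href with interpolated values, the link could be written with ng-href (the scope properties title and titleId here are hypothetical):

<a ng-href="http://www.example.com/support?title={{title}}&titleId={{titleId}}">Link Here</a>

ng-href leaves the href attribute unset until Angular has interpolated the expression, so a visitor (or crawler) never sees a half-built URL.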

Door answered 19/10, 2016 at 21:7 Comment(1)
I hadn't heard of trying ng-href; I will try this. Thank you for your response.Reproachful

If you use URLs with #, nothing after the hash gets sent to your server. Since JavaScript frameworks originally used the hash as a routing mechanism, that's a main reason why Google created this protocol.

Change your URLs to use #! instead of just #.

angular.module('myApp').config(['$locationProvider', function($locationProvider) {
  $locationProvider.hashPrefix('!');
}]);

Stuyvesant answered 20/10, 2016 at 10:26 Comment(1)
I'm not using # in the URL, and I'm pretty sure #! has been deprecated.Reproachful

This is how Google and Bing handle AJAX calls.

The documentation is mentioned here.

The overview, as given in the docs, is as follows:

The crawler finds a pretty AJAX URL (that is, a URL containing a #! hash fragment). It then requests the content for this URL from your server in a slightly modified form. Your web server returns the content in the form of an HTML snapshot, which is then processed by the crawler. The search results will show the original URL.

A step-by-step guide is given in the docs.

Since AngularJS runs on the client side, you will need to configure your web server to invoke a headless browser to access your web page and serve an HTML snapshot in response to the special _escaped_fragment_ URL that Google requests in place of the hashbang URL.

If you use hashbang URLs, you need to instruct the Angular application to use them instead of regular hash values:

App.config(['$routeProvider', '$locationProvider', function($routes, $location) {

  $location.hashPrefix('!');

  $routes.when('/home', {
    controller: 'IndexCtrl',
    templateUrl: './pages/index.html'
  });
}]);

as mentioned in the code example here

However, if you do not wish to use hashbang URLs but still want to inform Google that your content is rendered with JavaScript, you can use this meta tag:

<meta name="fragment" content="!" />

and then configure Angular to use HTML5-mode URLs:

angular.module('HTML5ModeURLs', []).config(['$locationProvider', function($location) {
  // html5Mode lives on $locationProvider, not $routeProvider
  $location.html5Mode(true);
}]);

and then install whichever module you prefer in your app:

var App = angular.module('App', ['HashBangURLs']);
//or
var App = angular.module('App', ['HTML5ModeURLs']);

Now you will need a headless browser to access the URL. You can use PhantomJS to download the contents of the page, run the JavaScript, and then write the contents to a temporary file.

phantomrunner.js takes any URL as input, downloads and parses the HTML into a DOM, and then checks the data status; a minimal sketch of such a script is shown below.
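This sketch assumes a fixed 2-second delay is enough for Angular to finish rendering; a real script would poll for readiness instead:

// phantomrunner.js -- run with: phantomjs phantomrunner.js <url>
var page = require('webpage').create();
var system = require('system');
var url = system.args[1];

page.open(url, function (status) {
  if (status !== 'success') {
    console.log('Failed to load ' + url);
    phantom.exit(1);
  }
  // Wait briefly so Angular can finish rendering before taking the snapshot
  window.setTimeout(function () {
    console.log(page.content); // the fully rendered HTML snapshot
    phantom.exit(0);
  }, 2000);
});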

Test each page by using the function defined here

A sitemap can also be made, as shown in this example.

The best feature is that you can verify how your site's URLs are fetched using Google Search Console.

Full attribution goes to the website and the author mentioned in this site.


UPDATE 1

The crawler needs to see the pages as:

- com/
- com/category/
- com/category/page/

By default, however, Angular sets your pages up as such:

- com
- com/#/category
- com/#/page

Approach 1

The hashbang lets Angular know which HTML elements to inject with JS, which can be done as mentioned before; but since it has been deprecated, another solution is the following.

Configure the $locationProvider and set up the base for relative links:

  • You can use the $locationProvider as mentioned in these docs and set html5Mode to true:

    $locationProvider.html5Mode(true);

This lets Angular change the routing and URLs of your pages without refreshing the page.

  • Set the base in the head of your document: <base href="/">

The $location service will automatically fall back to the hashbang method for browsers that do not support the HTML5 History API.
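Putting those two pieces together, a minimal sketch (assuming a module named App, with <base href="/"> already in the document head) might be:

App.config(['$locationProvider', function($locationProvider) {
  // Use the HTML5 History API for clean URLs; browsers without it
  // automatically fall back to the #! prefix configured here.
  $locationProvider.html5Mode(true).hashPrefix('!');
}]);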

Full attribution goes to the page and the author

There are also some other measures and tests you can take, as mentioned in this document.

Fennelflower answered 21/10, 2016 at 15:13 Comment(3)
I'm pretty sure the #! URL scheme is deprecated; it even says so in the official documentation link you referenced.Reproachful
Yeah, in your question you mentioned the googlebot tag, so I thought this might be the optimum solution. It is also mentioned in the documentation that crawling through the googlebot has been disallowed. As long as you don't block the googlebot, you can render your web pages like modern browsers. I've also added the Search Console, where you can check whether your URL can be fetched or not.Fennelflower
I've updated my question as to why the Angular app was not able to have its URLs crawled by Google, so maybe you can provide me with some feedback.Fennelflower
