AngularJS SEO for static webpages (S3 CDN)

I've been looking into ways to improve SEO for AngularJS apps that are hosted on a CDN like Amazon S3 (i.e. simple storage with no backend). Most of the solutions out there (PhantomJS, prerender.io, seo.js, etc.) rely on a backend to recognise the ?_escaped_fragment_ URL that the crawler generates and then fetch the relevant page from elsewhere. Even grunt-html-snapshot ultimately needs you to do this, even though you generate the snapshot pages ahead of time.

This solution basically relies on using Cloudflare as a reverse proxy, which seems a bit of a waste given that most of the security apparatus etc. that their service provides is totally redundant for a static site. Setting up a reverse proxy myself, as suggested here, also seems problematic, given that it would require either i) routing all the AngularJS apps I need static HTML for through one proxy server, which would potentially hamper performance, or ii) setting up a separate proxy server for each app, at which point I may as well set up a backend, which isn't affordable at the scale I'm working at.

Is there any way of doing this, or are statically hosted AngularJS apps with great SEO basically impossible until Google updates their crawlers?


Reposted on webmasters following John Conde's comments.

Reathareave answered 13/4, 2014 at 13:30 Comment(11)
I wish I could +2 for this question. You shared a few interesting links that I hadn't heard of. Baking AngularJS to static HTML is the reverse of what I'm doing. My server runs PHP to generate Mustache templates from the same JSON that AngularJS uses. When a URL is loaded the static content is there, but AngularJS erases it and goes dynamic from that point on. Gives me my SEO without any URL fragment issues, but it's a lot of extra work. – Leishmaniasis
This question appears to be off-topic because it is about SEO. – Antacid
@Mathew Foscarini Thanks, yeah, that's actually a really interesting idea - I guess you have the benefit there of supporting non-JS-enabled browsers as well. – Reathareave
@John Conde Would you mind explaining where this should go then? It is tagged with the SO SEO tag, and as far as I can see there is nowhere else on SE for SEO questions. – Reathareave
Pro Webmasters would be a suitable place for it. – Antacid
Done, close this one if you wish - cf. webmasters.stackexchange.com/questions/60601/… – Reathareave
@advert2013 The site is browsable with JS disabled, but it's in a static state. Not really user friendly. My understanding is that Google ranking of pages is measured by what it sees in the static state, so if you need a lot of traffic from Google it's best to get it indexed as well as you can - but it's a pain. What I'm trying to do now is bootstrap as much of AngularJS as I can from the page's static state. This will help make what the server does less redundant. – Leishmaniasis
You may want to have a look at brombone. – Alemanni
@jriberio Yeah, I tried them; unfortunately the situation is the same there too. You still need to rely on a server to recognise the crawler. – Reathareave
Hi @advert2013, I'm wondering what the nature of your site is. The intent of Angular is to build dynamic client-side applications; as such, it'll be hard to make it SEO-compliant without jumping through hoops, as you're finding. Could you have some parts of your site statically served to reap the SEO benefits (manually curated/WordPress etc.) and use Angular for a richer dynamic experience? – Pilotage
@advert2013 Yes, there is one way using pre-rendering, and it can work for all search engines. Check my answer below. – Keverne

Here is a full overview of how to make your app SEO-friendly on a storage service such as S3, with nice URLs (no #), all driven by grunt with a single command to run after your build:

grunt seo

It's still a puzzle of workarounds, but it works and it's the best you can do. Thanks to @ericluwj and his blog post, which inspired me.

Overview

The goal & URL structure

The goal is to create one HTML file per state in your Angular app. The only major assumption is that you remove the '#' from your URLs by using the HTML5 history API (which you should do!) and that all your paths are absolute or use Angular states. There are plenty of posts explaining how to do so.
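To make that assumption concrete, here is a minimal sketch of the relevant configuration (the module name, state name and template path are illustrative, not part of the original setup):

// Enable HTML5 history mode so URLs have no '#'.
// Note: html5Mode(true) expects a <base href="/"> tag in index.html
// (or use html5Mode({enabled: true, requireBase: false})).
angular.module('myApp', ['ui.router'])
  .config(function ($locationProvider, $stateProvider) {
    $locationProvider.html5Mode(true);

    // One state per page; each state will get its own prerendered HTML file.
    $stateProvider.state('page1', {
      url: '/page1/', // trailing slash, matching the prerendered file layout below
      templateUrl: 'views/page1.html'
    });
  });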

URLs end with a trailing slash, like this: http://yourdomain.com/page1/

Personally I made sure that http://yourdomain.com/page1 (no trailing slash) also reaches its destination, but that's off topic here. I also made sure that every language has a different state and a different URL.

The SEO logic

Our goal is that when someone reaches your website through an HTTP request:

  • If it's a search engine crawler: keep it on the page that contains the required HTML. The page also contains the Angular logic (e.g. to start your app), but the crawler cannot execute that, so it is intentionally stuck with the HTML you served and will index that.
  • For normal humans and intelligent machines: make sure Angular gets activated, erase the generated HTML, and start your app normally.

The grunt tasks

Here we go with the grunt tasks:

  //grunt plugins you will need:
  grunt.loadNpmTasks('grunt-prerender');
  grunt.loadNpmTasks('grunt-replace');
  grunt.loadNpmTasks('grunt-wait');
  grunt.loadNpmTasks('grunt-aws-s3');

  //The grunt tasks in the right order
  grunt.registerTask('seo', 'First launch server, then prerender and replace', function (target) {
    grunt.task.run([
      'concurrent:seo' //Step 1: in parallel, launch the server, then perform the so-called seotasks
    ]);
  });

  grunt.registerTask('seotasks', [
    'http', //This is an API call to get all pages on my website. Skipping this step in this tutorial.
    'wait', // wait 1.5 sec to make sure that server is launched
    'prerender', //Step 2: create a snapshot of your website
    'replace', //Step 3: clean the mess
    'sitemap', //Create a sitemap of your production environment
    'aws_s3:dev' //Step 4: upload
  ]);

Step 1: Launch local server with concurrent:seo

We first need to launch a local server (like grunt serve) so that we can take snapshots of our website.

//grunt config
concurrent: {
  seo: [
    'connect:dist:keepalive', //Launching a server and keeping it alive
    'seotasks' //now that we have a running server we can launch the SEO tasks
  ]
}

Step 2: Create a snapshot of your website with grunt prerender

The grunt-prerender plugin allows you to take a snapshot of any website using PhantomJS. In our case we want to take a snapshot of all pages of the localhost website we just launched.

//grunt config
prerender: {
  options: {
    sitePath: 'http://localhost:9001', //points to the url of the server you just launched. You can also make it point to your production website.
    //As you can see the source urls allow for multiple languages provided you have different states for different languages (see note below for that)
    urls: ['/', '/projects/', '/portal/','/en/', '/projects/en/', '/portal/en/','/fr/', '/projects/fr/', '/portal/fr/'],//this var can be dynamically updated, which is done in my case in the callback of the http task
    hashed: true,
    dest: 'dist/SEO/',//where your static html files will be stored
    timeout:5000,
    interval:5000, //take a snapshot of how the page looks after 5 seconds.
    phantomScript:'basic',
    limit:7 //# pages processed simultaneously 
  }
}

Step 3: Clean the mess with grunt replace

If you open the pre-rendered files, they will work for crawlers, but not for humans. For humans using Chrome, your directives will load twice. Therefore you need to redirect real browsers to your home page before Angular gets activated (i.e. with a script placed right after <head>).

//Add the script tag to redirect if we're not a search bot
replace: {
  dist: {
    options: {
      patterns: [
        {
          match: '<head>',
          //redirect to a clean page if not a bot (to your index.html at the root basically).
          replacement: '<head><script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.location = "/#" + window.location.pathname; }</script>'
          //note: your hashbang (#) will still work.
        }
      ],
      usePrefix: false
    },
    files: [
      {expand: true, flatten: false, src: ['dist/SEO/*/**/*.html'], dest: ''} 
    ]
  }
}

Also make sure you have this code in your index.html on your ui-view element; it clears all the generated HTML BEFORE Angular starts.

<div ui-view autoscroll="true" id="ui-view"></div>

<!-- this script is needed to clear ui-view BEFORE angular starts to remove the static html that has been generated for search engines who cannot read angular -->
<script> 
  if(!/bot|googlebot|crawler|spider|robot|crawling/i.test( navigator.userAgent)) { document.getElementById('ui-view').innerHTML = ""; }
</script>

Step 4: Upload to AWS

You first upload your dist folder which contains your build. Then you overwrite it with the files you prerendered and updated.

aws_s3: {
  options: {
    accessKeyId: "<%= aws.accessKeyId %>", // Use the variables
    secretAccessKey: "<%= aws.secret %>", // You can also use env variables
    region: 'eu-west-1',
    uploadConcurrency: 5, // 5 simultaneous uploads
  },
  dev: {
    options: {
      bucket: 'xxxxxxxx'
    },
    files: [
      {expand: true, cwd: 'dist/', src: ['**'], exclude: 'SEO/**', dest: '', differential: true},
      {expand: true, cwd: 'dist/SEO/', src: ['**'], dest: '', differential: true},
    ]
  }
}

That's it, you have your solution! Both humans and bots will be able to read your web app.

Leroylerwick answered 8/5, 2017 at 18:13 Comment(0)

Actually this is a task that is indeed very troublesome, but I have managed to get SEO working nicely for my AngularJS SPA site (hosted on AWS S3) at http://www.jobbies.co/. The main idea is to pre-generate the content and populate it into the HTML. The templates will still be loaded when the page loads, and the pre-rendered content will then be replaced.
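To illustrate the idea with a rough sketch (the markup and names below are illustrative, not taken from the actual site): the page ships with crawlable HTML already inside the view container, and once Angular bootstraps it loads the real template for the current route and replaces that markup.

<div ng-app="myApp">
  <div ng-view>
    <!-- Pre-generated content baked into the HTML at build time, so search
         engines can index it without executing JavaScript. When Angular
         bootstraps, the route's template is rendered here and replaces it. -->
    <h1>Example page title</h1>
    <p>Example page content that a crawler can read as plain HTML.</p>
  </div>
</div>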

You can read more about my solution at http://www.ericluwj.com/2015/11/17/seo-for-angularjs-on-s3.html, but do note that there are a lot of conditions.

Keverne answered 26/11, 2015 at 9:55 Comment(0)

If you use ng-cloak in interesting ways, there could be a good solution.

I haven't tried this myself, but it should work in theory.

The solution is highly dependent on CSS, but it should work perfectly well. For example, say you have three states in your Angular app:

  • index (pathname: #/)
  • about (pathname: #/about)
  • contact (pathname: #/contact)

The base case for index can be added in too, but will be tricky so I'll leave it out for now.

Make your HTML look like this:

<body>
    <div ng-app="myApp" ng-cloak>
        <!-- Your whole angular app goes here... -->
    </div>
    <div class="static">
        <div id="about class="static-other">
            <!-- Your whole about content here... -->
        </div>
        <div id="contact" class="static-other">
            <!-- Your whole contact content here... -->
        </div>
        <div id="index" class="static-main">
            <!-- Your whole index content here... -->
        </div>
    </div>
</body>

(It's important that you put your index case last if you want to make it more awesome.)

Next, make your CSS look something like this:

[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }

Just that will probably work well enough for you anyway. The ng-cloak directive will keep your Angular app hidden while Angular is not yet loaded and will show your static content instead. Google will get your static content in the HTML. As a bonus, end users can also see well-styled static content while Angular loads.

You can then get more creative if you start using :target pseudo-selectors in your CSS. You can use actual links in your static content, but just make them links to various ids. So in the #index div, make sure you have links to #about and #contact, as shown below. Note the missing '/' in the links: HTML ids can't start with a slash.
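For instance, the navigation inside the static #index block could look like this (link text is just illustrative):

<!-- Plain anchors pointing at element ids, not Angular routes -->
<a href="#about">About</a>
<a href="#contact">Contact</a>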

Then make your CSS look like this:

[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }
.static-other {display: none;}
.static-other:target {display: block;}
.static-other:target ~ .static-main {display: none;}

You now have a fully functioning static app WITH ROUTING that works before Angular starts up.

As an additional bonus, when Angular starts up it is smart enough to convert #about to #/about automatically, so the experience shouldn't break at all.

And, of course, the SEO problem is solved. I've not used this technique myself yet, as I've always had a server to configure, but I'm very interested in how this works out for you.

Hope this helps.

Stopoff answered 12/5, 2014 at 4:55 Comment(2)
Clever. If I understand you correctly, what you are doing is basically showing static content only for SEO, so you could, for example, add a sitemap to the list of static pages. But the routable parts of the Angular app are still unsearchable, right? – Kraut
Well, I'm saying have all the routable parts of Angular rendered as static parts, so everything is searchable - WITH hash URLs. You just skip the '/' in the CSS version of the URLs, which Angular smartly reads correctly. – Stopoff

As AWS now offers Lambda@Edge as a service, we can handle this issue without grunt or anything else (at least for basic stuff).

I tried Lambda@Edge and it worked as expected. In my case I just had all the routes rewritten to "/" in Lambda@Edge (except for files that are present in S3, like CSS, images, etc.).

The event I attached the Lambda to is the viewer request event, and the code follows.

'use strict';

exports.handler = (event, context, callback) => {
    console.log("Event received is", JSON.stringify(event));
    console.log("Context received is", context);
    const request = event.Records[0].cf.request;
    if (request.uri.endsWith(".rt")) {
        console.log("URI is matching with .rt, the URI is ", request.uri);
        request.uri = "/";
    } else {
        console.log("URI is not ending with rt so letting it go URI is", request.uri);
    }
    console.log("Final request URI is", request.uri);
    callback(null, request);
};

Logs in CloudWatch are a little difficult to check, as they are written to the CloudWatch region nearest to the edge location that handles the request.

For example, though this Lambda is deployed/written for us-east, I see the logs in the ap-south region, as I am accessing CloudFront from Singapore. I checked it with the 'Fetch as Google' option in Google Webmaster Tools, and the page is rendered and viewed as expected.

Derrik answered 29/8, 2017 at 5:39 Comment(0)

I've been looking for days to find a solution for this. As far as I know, there isn't a nice solution to the problem. I hope Firebase will eventually enable user-agent redirects. If you have the money, you could use MaxCDN Enterprise. They offer Edge Rules, which include redirects by user agent.

https://www.maxcdn.com/features/rules/

Encyst answered 6/1, 2016 at 21:48 Comment(0)
