Here is a full overview of how to make your app SEO-friendly on a storage service such as S3, with nice urls (no #) and everything with grunt with the simple command to be performed after build:
grunt seo
It's still a puzzle of workarounds, but it's working and it's the best you can do. Thank you to @ericluwj and his blogpost who inspired me.
Overview
The goal & url structure
The goal is to create 1 html file per state in your angular app. The only major assumption is that you remove the '#' from your url by using html5history (which you should do !) and that all your paths are absolute or using angular states. There are plenty of posts explaining how to do so.
Urls end with a trailing slash like this
http://yourdomain.com/page1/
Personally I made sure that http://yourdomain.com/page1 (no trailing slash) also reaches its destination, but that's off topic here. I also made sure that every language has a different state and a different url.
The SEO logic
Our goal is that when someone reaches your website through an http request:
- If it's a search engine crawler: keep him on the page which contains the required html. The page also contains angular logic (eg to start your app) but the crawler cannot read that so he is intentionally stuck with the html you served him and will index that.
- For normal humans and intelligent machines : make sure angular gets activated, erase the generated html and start your app normally
The grunt tasks
Here we go with the grunt tasks:
//grunt plugins you will need:
grunt.loadNpmTasks('grunt-prerender');
grunt.loadNpmTasks('grunt-replace');
grunt.loadNpmTasks('grunt-wait');
grunt.loadNpmTasks('grunt-aws-s3');
//The grunt tasks in the right order
grunt.registerTask('seo', 'First launch server, then prerender and replace', function (target) {
grunt.task.run([
'concurrent:seo' //Step 1: in parrallel launch server, then perform so-called seotasks
]);
});
grunt.registerTask('seotasks', [
'http', //This is an API call to get all pages on my website. Skipping this step in this tutorial.
'wait', // wait 1.5 sec to make sure that server is launched
'prerender', //Step 2: create a snapshot of your website
'replace', //Step 3: clean the mess
'sitemap', //Create a sitemap of your production environment
'aws_s3:dev' //Step 4: upload
]);
Step 1: Launch local server with concurrent:seo
We first need to launch a local server (like grunt serve) so that we can take snapshots of our website.
//grunt config
concurrent: {
seo: [
'connect:dist:keepalive', //Launching a server and keeping it alive
'seotasks' //now that we have a running server we can launch the SEO tasks
]
}
Step 2: Create a snapshot of your website with grunt prerender
The grunt-prerender plugins allows you to take a snapshot of any website using PhantomJS. In our case we want to take a snapshot of all pages of the localhost website we just launched.
//grunt config
prerender: {
options: {
sitePath: 'http://localhost:9001', //points to the url of the server you just launched. You can also make it point to your production website.
//As you can see the source urls allow for multiple languages provided you have different states for different languages (see note below for that)
urls: ['/', '/projects/', '/portal/','/en/', '/projects/en/', '/portal/en/','/fr/', '/projects/fr/', '/portal/fr/'],//this var can be dynamically updated, which is done in my case in the callback of the http task
hashed: true,
dest: 'dist/SEO/',//where your static html files will be stored
timeout:5000,
interval:5000, //taking a snapshot of how the page looks like after 5 seconds.
phantomScript:'basic',
limit:7 //# pages processed simultaneously
}
}
Step 3: Clean the mess with grunt replace
If you open the pre-rendered files, they will work for crawlers, but not for humans. For humans using chrome, your directives will load twice. Therefore you need to redirect intelligent browsers to your home page before angular gets activated (i.e., right after head).
//Add the script tag to redirect if we're not a search bot
replace: {
dist: {
options: {
patterns: [
{
match: '<head>',
//redirect to a clean page if not a bot (to your index.html at the root basically).
replacement: '<head><script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.location = "/#" + window.location.pathname; }</script>'
//note: your hashbang (#) will still work.
}
],
usePrefix: false
},
files: [
{expand: true, flatten: false, src: ['dist/SEO/*/**/*.html'], dest: ''}
]
}
Also make sure you have this code in your index.html on your ui-view element, which clears all the generated html directives BEFORE angular starts.
<div ui-view autoscroll="true" id="ui-view"></div>
<!-- this script is needed to clear ui-view BEFORE angular starts to remove the static html that has been generated for search engines who cannot read angular -->
<script>
if(!/bot|googlebot|crawler|spider|robot|crawling/i.test( navigator.userAgent)) { document.getElementById('ui-view').innerHTML = ""; }
</script>
Step 4: Upload to aws
You first upload your dist folder which contains your build. Then you overwrite it with the files you prerendered and updated.
aws_s3: {
options: {
accessKeyId: "<%= aws.accessKeyId %>", // Use the variables
secretAccessKey: "<%= aws.secret %>", // You can also use env variables
region: 'eu-west-1',
uploadConcurrency: 5, // 5 simultaneous uploads
},
dev: {
options: {
bucket: 'xxxxxxxx'
},
files: [
{expand: true, cwd: 'dist/', src: ['**'], exclude: 'SEO/**', dest: '', differential: true},
{expand: true, cwd: 'dist/SEO/', src: ['**'], dest: '', differential: true},
]
}
}
That's it, you have your solution ! Both humans and bots will be able to read your web-app