Single Page App + Amazon S3 + Amazon CloudFront + Prerender.io - how to set up?

  1. I have a single-page app built with Backbone.js.
  2. I host the app (it consists of static files only) on Amazon S3.
  3. I use CloudFront as a CDN in front of the bucket.
  4. App is accessed by https://myapp.com -> https://abcdefgh34545.cloudfront.com -> https://myBucket.s3-eu-west-1.amazonaws.com/index.html

How can I use the Prerender.io service with this stack? I have to somehow detect that a web spider/robot is accessing the page and redirect it to prerender.io...

Sigmatism answered 13/3, 2014 at 15:23 Comment(2)
Couldn't you configure CloudFront to cache the HTML by requested header value(s)? Basically, you'd determine via headers whether the request is coming from a bot, and cache the prerendered version for bot requests and a non-prerendered version for browser requests.Frontispiece
Here is a full answer with only one grunt command: #23043836Patch
K
4

It's hard to use Prerender.io with a static Amazon S3 site.

You could stand up an nginx/Apache server in front of S3: https://myapp.com -> https://mynginx-server.com -> https://myBucket.s3-eu-west-1.amazonaws.com/index.html

This solution is less than ideal because you lose the closest-location benefit of CloudFront.

This is a good article about a custom solution: http://www.dave.cx/post/23/prerendering-angular-s3/

David was able to generate static HTML snapshots and save them in S3, then use CloudFlare to detect _escaped_fragment_ in the URL and redirect the request to the static HTML on S3.
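
As a rough illustration of the rewrite step described above, the sketch below maps a crawler URL carrying `_escaped_fragment_` to the key of a pre-generated snapshot. The function name `snapshotKeyFor` and the `snapshots/` prefix are assumptions made for this example, not details taken from the linked article.

```javascript
'use strict';

// Hypothetical helper: returns the S3 key of a pre-rendered snapshot for a
// crawler URL, or null when the URL carries no _escaped_fragment_ parameter.
// The "snapshots/" prefix is an assumed bucket layout.
function snapshotKeyFor(rawUrl) {
  const url = new URL(rawUrl);
  if (!url.searchParams.has('_escaped_fragment_')) {
    return null; // regular browser request: serve index.html as usual
  }
  // For hashbang URLs the parameter holds the client-side route, e.g.
  // "#!/user/bob" arrives as "?_escaped_fragment_=/user/bob".
  // An empty value means the route is the path itself.
  const fragment = url.searchParams.get('_escaped_fragment_');
  const route = fragment || url.pathname;
  const trimmed = route.replace(/^\/+|\/+$/g, '');
  return 'snapshots/' + (trimmed === '' ? 'index' : trimmed) + '.html';
}
```

A real setup would also want to reject routes containing `..` or other unexpected characters before using them as an object key.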

Kubiak answered 22/3, 2014 at 22:57 Comment(3)
A reverse-proxy setup for S3-backed data is not an ideal solution, but it is the only one that provides fully controlled access to S3 files, with caching facilities and delivery optimisation like SPDY. CloudFront optimises only the geo-location of the files, not delivery speed (TCP congestion window and kernel settings).Cathrinecathryn
I agree. It's not ideal, but it's one of the only solutions for a completely S3 hosted site.Kubiak
Archived blog post can be found at web.archive.org/web/20140405234445/http://www.dave.cx/post/23/…Grating
A
16

You can use Lambda@Edge to configure CloudFront to send crawler HTTP requests directly to prerender.io.

The basic idea is to have a viewer-request handler that sets a custom HTTP header on requests that should be sent to prerender.io. For example, this Lambda@Edge code:

        'use strict';
        /* change the version number below whenever this code is modified */
        exports.handler = (event, context, callback) => {
            const request = event.Records[0].cf.request;
            const headers = request.headers;
            const user_agent = headers['user-agent'];
            const host = headers['host'];
            if (user_agent && host) {
                // The /i flag matters: crawlers send mixed-case names such as "Twitterbot" or "Slackbot".
                if (/baiduspider|Facebot|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator/i.test(user_agent[0].value)) {
                    // '${PrerenderToken}' is a placeholder; substitute your own prerender.io token.
                    headers['x-prerender-token'] = [{ key: 'X-Prerender-Token', value: '${PrerenderToken}' }];
                    // Keep the viewer's host so the origin-request handler can build the prerender URL.
                    headers['x-prerender-host'] = [{ key: 'X-Prerender-Host', value: host[0].value }];
                }
            }
            callback(null, request);
        };

The CloudFront distribution must be configured to pass the X-Prerender-Host and X-Prerender-Token headers through to the origin.

Finally, an origin-request handler changes the origin server if X-Prerender-Token is present:

        'use strict';
        /* change the version number below whenever this code is modified */
        exports.handler = (event, context, callback) => {
            const request = event.Records[0].cf.request;
            // Both headers are set by the viewer-request handler for crawler traffic only.
            if (request.headers['x-prerender-token'] && request.headers['x-prerender-host']) {
                request.origin = {
                    custom: {
                        domainName: 'service.prerender.io',
                        port: 443,
                        protocol: 'https',
                        readTimeout: 20,
                        keepaliveTimeout: 5,
                        customHeaders: {},
                        // TLSv1.2 added here; the original listed only TLSv1 and TLSv1.1,
                        // which many servers no longer accept.
                        sslProtocols: ['TLSv1', 'TLSv1.1', 'TLSv1.2'],
                        // URL-encoded "https://" followed by the viewer's host
                        path: '/https%3A%2F%2F' + request.headers['x-prerender-host'][0].value
                    }
                };
            }
            callback(null, request);
        };

There's a fully worked example at: https://github.com/jinty/prerender-cloudfront
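
Before deploying, the two handlers above can be exercised locally with a hand-built event. The following is only a minimal sketch: `fakeEvent` and `invoke` are helper names invented here, the file names in the usage comment are hypothetical, and the event contains just the fields the handlers read, so it is far from a complete CloudFront event.

```javascript
'use strict';

// Builds a minimal CloudFront request event. Only the fields the handlers
// above read are filled in; real events carry more (config, clientIp, ...).
function fakeEvent(userAgent, host, uri = '/') {
  return {
    Records: [{
      cf: {
        request: {
          uri,
          method: 'GET',
          headers: {
            'user-agent': [{ key: 'User-Agent', value: userAgent }],
            host: [{ key: 'Host', value: host }],
          },
        },
      },
    }],
  };
}

// Wraps a callback-style Lambda handler in a Promise.
function invoke(handler, event) {
  return new Promise((resolve, reject) => {
    handler(event, {}, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Usage (hypothetical file names; each file exports one of the handlers above):
//
//   const viewer = require('./viewer-request');
//   const origin = require('./origin-request');
//   invoke(viewer.handler, fakeEvent('facebookexternalhit/1.1', 'myapp.com'))
//     .then((req) => invoke(origin.handler, { Records: [{ cf: { request: req } }] }))
//     .then((req) => console.log(req.origin));
```

Running the same chain with a browser user agent should leave `req.origin` unset, since neither X-Prerender header is added.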

Aguirre answered 8/2, 2018 at 15:46 Comment(2)
Why are two Lambdas needed? Why can't a single Lambda change the origin and set the Prerender headers?Orphaorphan
If I am not mistaken, it is because the two Lambdas run in different contexts: you do not have the original URL in the origin-request Lambda, but you do in the viewer-request Lambda.Jess
D
10

I managed to do this by not using Prerender at all, but by creating an AWS Lambda function that:

  • Requests the origin page from CloudFront (it is actually always the same index.html)
  • Is mapped via an API Gateway catch-all proxy
  • Studies the path and figures out what resource the page should be about (in my case it is simply /user/{name}, so I only have one use case to handle)
  • Makes a REST API request to get the dynamic data for the user
  • Regex-replaces the existing meta fields with the dynamic ones
  • Returns the new index file with the new meta tags
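
The steps above could be sketched roughly as follows. This is not the author's function: the URLs, the `user.displayName` and `user.bio` fields, and the Open Graph property names are all assumptions, the event shape assumes an API Gateway REST proxy integration (which exposes `event.path`), and the global `fetch` requires a Node.js 18+ runtime.

```javascript
'use strict';

// Assumed locations; replace with your own distribution and API.
const INDEX_URL = 'https://myapp.com/index.html';
const API_BASE = 'https://api.example.com/users/';

// Extracts {name} from /user/{name}; returns null for any other path.
function userFromPath(path) {
  const match = /^\/user\/([^/]+)\/?$/.exec(path);
  return match ? decodeURIComponent(match[1]) : null;
}

// Escapes text for use inside a double-quoted HTML attribute.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replaces the content of existing <meta property="..." content="..."> tags.
function injectMeta(html, meta) {
  let out = html;
  for (const [property, value] of Object.entries(meta)) {
    const safeName = property.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const tag = new RegExp(
      '(<meta\\s+property="' + safeName + '"\\s+content=")[^"]*(")', 'i');
    out = out.replace(tag, (m, open, close) => open + escapeAttr(value) + close);
  }
  return out;
}

// Lambda entry point; assign it to exports.handler when deploying.
async function handler(event) {
  const indexHtml = await (await fetch(INDEX_URL)).text();
  const name = userFromPath(event.path);
  let body = indexHtml;
  if (name !== null) {
    const res = await fetch(API_BASE + encodeURIComponent(name));
    if (res.ok) {
      const user = await res.json();
      body = injectMeta(indexHtml, {
        'og:title': user.displayName,   // field names are assumptions
        'og:description': user.bio,
      });
    }
  }
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
    body,
  };
}
```

Keeping `userFromPath` and `injectMeta` as pure functions makes them easy to check without network access; unknown paths and failed API calls fall back to the unmodified index.html.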

Configure a new origin (the new Lambda function) and a behaviour (map /user/* requests to this new origin). Be sure to use the "HTTPS only" Origin Protocol Policy for the origin: API Gateway serves HTTPS only, so a redirect here will cause the hostname to change.

(If you used the redirect by accident, you will need to invalidate "/*", because due to some CloudFront bug the configuration change alone will not help; I spent multiple hours debugging this last night.)

Demeanor answered 18/10, 2016 at 21:58 Comment(7)
This is a great solution! Looks like this has only been possible as of September 2016. Any chance you can post the Lambda function?Roseline
I'm curious how this works as well. Is there a gist or some other documentation you could point to?Longhorn
Here you go. I tried to remove the code that is our company specific, I hope it still works: jsfiddle.net/g711p2jhDemeanor
Love it. Thank you @Demeanor. Giving this a try. So you set up the API Gateway to just catch all requests at that particular endpoint that CloudFront was hitting?Medici
In our case it's only certain directory /xxx/ that is being handled like that. Root and other directories will get the static index.html file with static metadata-fields.Demeanor
I ran into the issue of not being able to forward the originating host through. We have several domains mapped to the CF endpoint. Have you hit this scenario by chance?Medici
Yeah, a CF distribution doesn't take the domain into consideration. However, you could create a separate distribution for each domain, all pointing to the same origin (if that was your setup), and then create separate rules for each domain. The downside is that if you need to invalidate, you might need to do it multiple times, but Ansible and scripting help in that case. We're invalidating from our gulp deploy task now.Demeanor
P
1

Have a look at the full solution over here, creating snapshots of your website with grunt and serving them to search engines with nothing more than amazon S3:

AngularJS SEO for static webpages (S3 CDN)

Patch answered 12/6, 2017 at 13:28 Comment(0)
U
1

As mentioned, it seems the easiest way to do this is to configure CloudFront/Lambda@Edge to proxy requests to a prerender service. I've found a repo that seems to take care of quite a bit of the aforementioned work for you: https://github.com/sanfrancesco/prerendercloud-lambda-edge

This uses Lambda@Edge to prerender your app via a make deploy command. Unfortunately, this uses prerender.cloud, NOT prerender.io. Hopefully this isn't a blocker.

Unstressed answered 22/11, 2018 at 11:45 Comment(0)
