How to improve SEO for Serverless Websites?

I want to improve SEO (i.e., get my pages correctly indexed by search engines) in a serverless architecture where my website is hosted on AWS S3.

Because I'm using client-side JavaScript routing (something akin to Angular, but simpler) and fetching dynamic content to fill the meta tags, everything is quite troublesome for scrapers without JavaScript support, such as Facebook's.

I already have default meta tags in place, and those are, of course, read just fine, but I need the page-specific ones.

I know most people use pre-rendering on a server or through a service such as Prerender.io, but I really wanted to find an alternative that makes sense in a serverless approach.

I thought I had it figured out, since Open Graph meta tags allow for a "pointer" URL where you can serve a meta-tags-only HTML page if needed. So I was thinking of using a Lambda function to generate the HTML response with the right meta tags on a GET request. The problem is: since the Facebook scraper has no JavaScript support, how can I send the dynamic content with the GET request?

Fleta asked 20/11, 2016 at 22:11

If you are using S3, you must prerender the pages before uploading them. You can't call Lambda functions on the fly because the crawler will not execute JavaScript. You can't even use Prerender.io with S3.

Suggestion:

  1. Host your website locally.
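  2. Use PhantomJS to fetch each page and write out a prerendered version.
  3. Upload each page to S3 under a key that follows the page address*.

* E.g., the address example.com/about/us must map to the object key about/us in your bucket (for instance, upload the prerendered us.html under that key with Content-Type text/html), because S3 looks objects up by the exact path of the request.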
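A minimal sketch of step 3 in Python, assuming S3 static website hosting (which looks objects up by the exact request path and appends an index document name, assumed here to be index.html, to paths ending in "/") and the AWS SDK for Python (boto3). The helper names path_to_s3_key and upload_prerendered, and the bucket name in the usage line, are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def path_to_s3_key(url: str) -> str:
    """Map a page URL (with scheme) to the S3 key its prerendered HTML should use.

    https://example.com/about/us  -> about/us
    https://example.com/about/us/ -> about/us/index.html  (assumed index document)
    https://example.com/          -> index.html
    """
    path = unquote(urlsplit(url).path)  # drop host and query string, decode %XX
    if path == "" or path.endswith("/"):
        path += "index.html"
    return path.lstrip("/")  # S3 keys have no leading slash


def upload_prerendered(bucket: str, url: str, html: str) -> None:
    """Upload one prerendered page so browsers and crawlers receive it as HTML."""
    import boto3  # imported here so path_to_s3_key works without the AWS SDK installed

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=path_to_s3_key(url),
        Body=html.encode("utf-8"),
        # Without an explicit type, S3 stores the object as binary/octet-stream.
        ContentType="text/html; charset=utf-8",
    )
```

Usage would look like upload_prerendered("my-site-bucket", "https://example.com/about/us", html), run once per page after PhantomJS has produced the HTML.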

Now your users and the crawlers will see exactly the same pages, without needing JavaScript to load the initial state. The difference is that with JavaScript enabled, your framework (Angular?) will load its JS dependencies (routes, services, etc.) and take control like a normal SPA. When the user clicks to browse another page, the SPA will replace the inner content without a full page reload.

Pros:

  • Easy to set up.
  • Very fast to serve content. You can also use CloudFront to improve the speed.

Cons:

  • If you have 1000 pages (e.g., 1000 products that you sell in your store), you need to make 1000 prerendered pages.
  • If your page data changes frequently, you need to prerender frequently.
  • Sometimes the crawler will index old content*.

* The crawler will see the old content, but the user will probably see the current content as the SPA framework will take control of the page and load the inner content again.


You said that you are using S3. If you want to prerender on the fly, you can't use S3. You need to use the following:

Route 53 => CloudFront => API Gateway => Lambda

Configure:
- Set the API Gateway endpoint as the CloudFront origin.
- Set "Origin Protocol Policy" to "HTTPS Only" in the CloudFront origin settings (API Gateway endpoints only accept HTTPS).
- Use Lambda proxy integration, so the function receives the full request.

In this case, your Lambda function will know the requested address and will be able to correctly render the requested HTML page.
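A minimal sketch of such a proxy function in Python, assuming a REST API with Lambda proxy integration (which passes the request path in event["path"] and expects a statusCode/headers/body response). The PAGES dictionary stands in for your database and /app.js for your SPA bundle; both are hypothetical.

```python
import html

# Hypothetical stand-in for the database: request path -> page metadata.
PAGES = {
    "/about/us": {"title": "About us", "description": "Who we are."},
}


def render_page(meta: dict) -> str:
    """Render the SPA shell with page-specific meta tags filled in server-side."""
    t = html.escape(meta["title"], quote=True)  # escape so data can't break the markup
    d = html.escape(meta["description"], quote=True)
    return (
        "<!DOCTYPE html><html><head>"
        f"<title>{t}</title>"
        f'<meta name="description" content="{d}">'
        f'<meta property="og:title" content="{t}">'
        f'<meta property="og:description" content="{d}">'
        '</head><body><div id="app"></div>'
        '<script src="/app.js"></script>'  # the SPA takes over once this loads
        "</body></html>"
    )


def handler(event, context):
    """API Gateway (REST, proxy integration) entry point."""
    path = event.get("path") or "/"
    meta = PAGES.get(path)
    if meta is None:
        return {"statusCode": 404, "headers": {"Content-Type": "text/plain"}, "body": "Not found"}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html; charset=utf-8"},
        "body": render_page(meta),
    }
```

Crawlers get the correct meta tags in the first response, while browsers run /app.js and continue as a normal SPA.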

Pros:

  • As Lambda has access to the database, the rendered page will always be up to date.

Cons:

  • Pages are much slower to load, since every request runs the Lambda function.
Spheno answered 21/11, 2016 at 1:46
Glycogenesis commented: Wow, very detailed answer! Thank you very much! If no other answer comes, I'll definitely "give up" and take your suggestion. Thank you so much for your time.

If you are willing to use CloudFront on top of your S3 bucket, there is a new way to solve your problem with on-the-fly prerendering. Lambda@Edge is a new feature that runs your code with low latency at CloudFront edge locations when a page is requested. With it, you can check whether the user agent is a crawler and serve it a prerendered page.

01 Dec 2016 announcement: Lambda@Edge – Preview

Just last week, a comment that I made on Hacker News resulted in an interesting email from an AWS customer!

(...)

Here’s how he explained his problem to me:

In order to properly get indexed by search engines and in order for previews of our content to show up correctly within Facebook and Twitter, we need to serve a prerendered version of each of our pages. In order to do this, every time a normal user hits our site need for them to be served our normal front end from Cloudfront. But if the user agent matches Google / Facebook / Twitter etc., we need to instead redirect them the prerendered version of the site.

Without spilling any beans I let him know that we were very aware of this use case and that we had some interesting solutions in the works. Other customers have also let us know that they want to customize their end user experience by making quick decisions out at the edge.

This feature is currently in preview (Dec 2016), but you can ask AWS for access to try it.
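A minimal sketch of the user-agent check in Python, written as a Lambda@Edge viewer-request function. The viewer-request trigger is assumed because that is where the browser's User-Agent is visible; unless configured to forward that header, CloudFront sends "Amazon CloudFront" as the User-Agent on requests to the origin. It also assumes the prerendered pages live in the same bucket under a hypothetical /prerendered prefix, and the crawler pattern is an illustrative, non-exhaustive list.

```python
import re

# Illustrative, non-exhaustive crawler user-agent substrings (an assumption).
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot|slackbot",
    re.IGNORECASE,
)

# Hypothetical key prefix under which the prerendered copies are stored.
PRERENDER_PREFIX = "/prerendered"


def is_crawler(user_agent: str) -> bool:
    return CRAWLER_PATTERN.search(user_agent) is not None


def handler(event, context):
    """Lambda@Edge viewer-request handler: send crawlers to the prerendered copy."""
    request = event["Records"][0]["cf"]["request"]
    # CloudFront lowercases header names; each value is a list of {"key", "value"} dicts.
    user_agent = request["headers"].get("user-agent", [{"value": ""}])[0]["value"]
    uri = request["uri"]
    if is_crawler(user_agent) and not uri.startswith(PRERENDER_PREFIX + "/"):
        if uri.endswith("/"):
            uri += "index.html"  # assumed index document name
        request["uri"] = PRERENDER_PREFIX + uri
    # Returning the (possibly modified) request lets CloudFront continue as normal.
    return request
```

Because CloudFront uses the rewritten URI for its cache lookup, the crawler and browser versions of a page are cached separately instead of overwriting each other.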

Spheno answered 4/12, 2016 at 2:41

Here's a solution that uses (and is approved by) prerender.cloud: https://github.com/sanfrancesco/prerendercloud-lambda-edge

This uses Lambda@Edge to prerender your app, and the whole setup is deployed with a single make deploy command.

Taken from the repo's README:

Server-side rendering (pre-rendering) via Lambda@Edge for single-page apps hosted on CloudFront with an s3 origin.

This is a serverless project with a make deploy command that:

  1. serverless.yml deploys 3 functions to Lambda (viewerRequest, originRequest, originResponse)
  2. deploy.js associates them with your CloudFront distribution
  3. create-invalidation.js clears/invalidates your CloudFront cache
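Invalidation is also needed with the upload-based approach from the first answer whenever pages are re-prerendered, otherwise CloudFront keeps serving the old copies. A rough Python equivalent using boto3's CloudFront create_invalidation call is sketched here; it is not the code of create-invalidation.js, and the helper names are hypothetical.

```python
import uuid


def build_invalidation_batch(paths):
    """Build the InvalidationBatch structure that CloudFront's CreateInvalidation expects."""
    items = [p if p.startswith("/") else "/" + p for p in paths]  # paths must start with "/"
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # Must be unique for each new invalidation request.
        "CallerReference": str(uuid.uuid4()),
    }


def invalidate(distribution_id, paths):
    """Ask CloudFront to drop cached copies of the given paths; returns the invalidation ID."""
    import boto3  # imported here so build_invalidation_batch works without the AWS SDK

    cloudfront = boto3.client("cloudfront")
    response = cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
    return response["Invalidation"]["Id"]
```

A call such as invalidate("E123EXAMPLE", ["/*"]) (placeholder distribution ID) clears everything; listing only the changed paths keeps invalidations cheaper.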
Variolite answered 22/11, 2018 at 11:40

There are actually a couple of options, and most will require CloudFront and Lambda@Edge. One possible way is to add logic to your Lambda@Edge function that checks the User-Agent header of the request to differentiate between requests from crawlers and from regular users. If the request is from a crawler, you can return a crawler-friendly response with meta tags optimized for that request.

This will definitely require some extra work, and it means a Lambda@Edge execution on almost every request. I hope AWS gives us an option to route based on a header in the future.
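A minimal sketch of that idea in Python, as a Lambda@Edge viewer-request function that answers crawlers directly with a generated meta-tags-only page and passes every other request through to the S3 origin. PAGE_META and the crawler pattern are hypothetical placeholders; a real function would need a metadata source that fits within Lambda@Edge's size and timeout limits.

```python
import html
import re

# Illustrative, non-exhaustive crawler user-agent substrings (an assumption).
CRAWLER_PATTERN = re.compile(r"googlebot|bingbot|facebookexternalhit|twitterbot", re.IGNORECASE)

# Hypothetical metadata lookup: URI -> (title, description).
PAGE_META = {"/about/us": ("About us", "Who we are.")}


def meta_only_page(title: str, description: str) -> str:
    t = html.escape(title, quote=True)
    d = html.escape(description, quote=True)
    return (
        f"<!DOCTYPE html><html><head><title>{t}</title>"
        f'<meta property="og:title" content="{t}">'
        f'<meta property="og:description" content="{d}">'
        "</head><body></body></html>"
    )


def handler(event, context):
    """Lambda@Edge viewer-request handler: generate a response for crawlers only."""
    request = event["Records"][0]["cf"]["request"]
    user_agent = request["headers"].get("user-agent", [{"value": ""}])[0]["value"]
    meta = PAGE_META.get(request["uri"])
    if meta is None or not CRAWLER_PATTERN.search(user_agent):
        return request  # regular users (and unknown pages) go to the S3 origin as usual
    # Generated-response shape for Lambda@Edge: status is a string, headers map
    # lowercase names to lists of {"key", "value"} dicts.
    return {
        "status": "200",
        "statusDescription": "OK",
        "headers": {
            "content-type": [{"key": "Content-Type", "value": "text/html; charset=utf-8"}],
        },
        "body": meta_only_page(*meta),
    }
```

Since a viewer-request trigger runs on every request, cache hits included, this is the per-request cost the answer mentions; the check itself is kept to one regex search to keep that cost low.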

Hurlyburly answered 25/8, 2020 at 23:43
