Configuring any CDN to deliver only one file no matter which URL has been requested
L

9

16

I am currently working on a new project where the entire page should be implemented in HTML5/JS, working against a JSON API. Since the entire application should consist of only one HTML file (index.html) and a JS MVC application (maybe Backbone.js), I am thinking about SEO and user-friendly URLs.

There I came across

window.history.pushState({}, 'title', '/url');

With the help of that HTML5 feature I can define URLs without really leaving or reloading the page. BUT... I want to deploy the application to a CDN like Amazon CloudFront for performance reasons and low cost. I would not need any server infrastructure (besides the one I need for the API, of course).
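
A minimal sketch of how such pushState-based navigation might look. The route table and the function names (`resolveView`, `navigate`, `listen`) are hypothetical and only illustrate the idea; the browser-specific calls are guarded so the lookup logic can also run outside a browser.

```javascript
// Hypothetical route table: path -> view name (illustrative only).
const routes = {
  '/': 'home',
  '/any_subpage': 'subpage',
};

// Pure lookup, so the routing logic can be exercised without a browser.
function resolveView(pathname) {
  return Object.prototype.hasOwnProperty.call(routes, pathname)
    ? routes[pathname]
    : 'not-found';
}

// Change the address bar without reloading, then render the matching view.
function navigate(pathname, render) {
  if (typeof window !== 'undefined') {
    window.history.pushState({ path: pathname }, '', pathname);
  }
  render(resolveView(pathname));
}

// Render the initial URL and re-render on back/forward (browser only).
function listen(render) {
  if (typeof window !== 'undefined') {
    window.addEventListener('popstate', () => {
      render(resolveView(window.location.pathname));
    });
    render(resolveView(window.location.pathname));
  }
}
```

Because the initial render reads `window.location.pathname`, this only works for deep links if the host really does return index.html for every path.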

So can I configure a CDN (really any CDN, such as AWS, Azure, or Akamai) to deliver the same HTML file no matter what URL is requested?

http://www.example.com => delivers index.html

http://www.example.com/any_subpage => delivers index.html

and so on ...

You can find a working example at http://html5.gingerhost.com. But the creator of that page may use an .htaccess file or something similar to map everything to the same file. I want to provide the same functionality in a CDN.

Lightless answered 7/9, 2012 at 18:12 Comment(8)
so basically you want the mod_rewrite functionality as contained in the .htaccess example on AWS CloudFront CDN.Tyrelltyrian
exactly... but I am not tied to the AWS CDN. It may also be Azure or any other. I would just need that functionalityLightless
Don't think CloudFront supports this, but it's a great question.Jointworm
I am also thinking of using CloudFront for a project I'm working on. If I find a solution while digging through the docs I will be sure to update with an answer. :)Tyrelltyrian
So it doesn't matter if search-engines can't index your site and people with javascript disabled can't browse it?Gandzha
I do not care about people with JavaScript disabled. The page should be indexed using the pushState method.Lightless
Is there a reason not to use a redirect? In addition, if you rewrite too early you may lose information in the statistics.Pneumothorax
Redirect where? With this concept only one file exists, so everything would be redirected to the index file. I just want to rewrite internally.Lightless
V
4

Any CDN should have the capability of defining an origin server. This server gets contacted by the CDN to serve the file if the edge location doesn't have it.

The good news is that the origin server can be anything that serves web pages, such as Apache, Nginx, etc. This means that you can apply any kind of rewriting rules you wish.

If you don't wish to set up the origin server yourself, you could look at hosting your (static) site on S3. Recently they have introduced web redirects, which may help you to serve the same file under a different "alias". Failing that, you could look at redefining the standard error document, but I'm not sure whether an error status code will still be sent.
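
If you do run your own origin, the catch-all can be very small. The following is only a sketch, assuming a Node.js-style request/response pair; `makeCatchAllHandler` and its `indexHtml` parameter are names made up for this example.

```javascript
// Build a handler that answers every GET/HEAD request with the same HTML shell.
// `indexHtml` is the content of index.html, loaded however you prefer (assumption).
function makeCatchAllHandler(indexHtml) {
  return function handler(req, res) {
    if (req.method !== 'GET' && req.method !== 'HEAD') {
      // Anything else is not part of serving the static shell.
      res.writeHead(405, { Allow: 'GET, HEAD' });
      res.end();
      return;
    }
    // Always 200, regardless of req.url: the client-side router picks the view.
    res.writeHead(200, {
      'Content-Type': 'text/html; charset=utf-8',
      'Content-Length': Buffer.byteLength(indexHtml),
    });
    res.end(req.method === 'HEAD' ? undefined : indexHtml);
  };
}
```

Wired into Node's built-in http module, this would be something like `http.createServer(makeCatchAllHandler(html)).listen(8080)`; the port is arbitrary. The CDN would then be pointed at this server as its origin.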

Valenciavalenciennes answered 22/11, 2012 at 4:50 Comment(0)
K
3

CDNs are intended to deliver static content by serving the static resource from the closest geographical point possible to the client. CDN technology is not intended to do a redirect or server side processing of the request. You'll need something else involved here to do that part. The question is just whether that is a server side technology or some sort of load balancer/firewall request re-writing (to avoid having a server side technology).

I don't think there is a truly platform agnostic way of doing this. You'll always be tied to either a server side technology or a load balancer/firewall platform. But it also sounds like you may already have this constraint as you need somewhere to host your JSON API? Even if you haven't decided on the platform, pretty much any platform should allow you to do some basic routing. If you can serve JSON Http requests, you should be able to do some page routing too.

As a side note, I don't believe you want to return your "index.html" from absolutely all possible URLs at your domain. You'll want some list of valid URLs and invalid URLs, in which case you'll need to be pinging your back end anyway to validate the request URL. That further suggests to me that a server-side technology will be better suited for this task than a blind "catch-all" redirect at a lower level.
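
As a rough illustration of that validation step, here is a sketch that maps a request path to a status code. The patterns in `validRoutes` and the name `statusForPath` are invented for the example; a real application would derive them from its own URL structure.

```javascript
// Hypothetical list of URL patterns the application actually supports.
const validRoutes = [/^\/$/, /^\/about$/, /^\/products\/\d+$/];

// 200 for known URLs (serve index.html), 404 for everything else.
function statusForPath(pathname) {
  return validRoutes.some((pattern) => pattern.test(pathname)) ? 200 : 404;
}
```

The page body can still be the same index.html in both cases; only the status code tells crawlers which URLs really exist.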

My personal preference would be to use your favorite MVC framework to serve indexable content with your desired URL structure (pretty much all page loads) and then use your JSON api to work with that content after page load (any dynamic stuff you want to be able to do). The whole thing, both page loads and API, being served from the same server platform/environment.

Kreda answered 22/11, 2012 at 4:33 Comment(0)
L
1

Symlink your 404 page to the index page. That way, when a requested URL is not found in your web content (which, in your case, is nearly every link), the 404 page is served, and that page is the index page itself.

# ln -s index.html 404.html

Legislatorial answered 9/9, 2012 at 7:37 Comment(1)
that would be a hack since there will be the 404 status deliveredLightless
D
1

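The Nginx HTTP server can do this, for example:

location / {
    # serve index.html for any path that is not an existing file
    try_files $uri /index.html;
}

Or you can split up your locations like this:

location /my_html {
    # serve the HTML file
}

location /cdn/ {
    # serve the remaining files
}

You can even match URLs with regular expressions:

location ~ /cdn/.*\.js$ {
    # serve JavaScript files under /cdn/
}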
Dowable answered 22/11, 2012 at 0:40 Comment(1)
configuring nginx would force me to set up my own CDN :)Lightless
E
1

We recently contacted edgecast.com (a CDN like CloudFront), and their support told me that this is indeed something they can do, but not through their standard interface. I was told to simply contact them when we needed a wildcard route to a single file.

As for your question: yes, it's possible. Just contact them through their live chat and they'll help you out. Good luck!

Some more (negative) information: a catch-all rule like this means that the stupid favicon.ico request that some browsers (read: IE) make automatically will be caught, and the regular HTML page will be downloaded again. In fact, all automatic requests against the root domain (iframes also request a favicon, for example) will be caught and answered with the regular HTML file. This may or may not be a problem for you, but for me all these hidden requests are making me rethink the solution and consider using a web server behind the CDN to do the actual catch-all. Shame, really.
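
If a web server does sit behind the CDN, one way to keep requests like favicon.ico out of the catch-all is to fall back to the HTML page only for paths without a file extension. This is just a sketch; `looksLikeFile` and `shouldServeIndex` are names invented here, and the heuristic would misclassify application URLs that legitimately end in something like `.doe`.

```javascript
// True when the last path segment ends in ".ext" (e.g. /favicon.ico, /img/logo.png).
// `pathname` is assumed to carry no query string.
function looksLikeFile(pathname) {
  const lastSegment = pathname.split('/').pop();
  return /\.[a-z0-9]+$/i.test(lastSegment);
}

// Only extension-less paths fall back to index.html; file requests are
// served normally or answered with a real 404.
function shouldServeIndex(pathname) {
  return !looksLikeFile(pathname);
}
```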

Encephalogram answered 22/11, 2012 at 9:58 Comment(0)
O
1

If you have your own domain that points at the CDN (I know CloudFront lets you do that), you could use CloudFlare (https://www.cloudflare.com/) as a reverse proxy between your users and the CDN.

Thanks to their free plan, you can create a rule that redirects everything to index.html. I think this is the only way to achieve what you want, given that CDNs are configured to serve only existing static files, as you know.

Overtire answered 22/11, 2012 at 10:34 Comment(2)
Hm, sure, but that would defeat the principle of a CDN, which uses edge servers to transmit data from the point nearest to the customer, and it would force me to use another service that I would have to pay for.Lightless
You're right about the CDN's customer-location advantage. I don't want to promote CloudFlare in particular, but with the free plan, all static files are cached by CloudFlare, which has its own CDN. I know that would reduce the other CDN to serving each file once to CloudFlare. But if I understand your question correctly, you wouldn't have to pay for CloudFlare at all, because your needs seem entirely covered by their free plan.Overtire
V
1

If you’re considering SEO and friendly URLs, you can accomplish some of that using pushState, sure. Just remember that:

  • When routing all URLs to index.html, you will also serve the exact same HTML content to the search engines no matter what URL they march in on. Then it won't matter how "SEO-friendly" your URL is.

  • If you’re thinking about IE support: older versions of IE (IE 9 and earlier) don’t support the History API, so you’ll need a higher-level history framework or some other workaround for them. That will most likely involve #-based URLs, so you will basically have two different URLs for each view. That’s something to consider when people share URLs, or when figuring out how search robots pick up links to your site.
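
To make the "two URLs per view" point concrete, here is a small sketch of the translation a hash-based fallback has to do. The helper names are made up for the example, and real history libraries handle many more edge cases.

```javascript
// Feature check: true when the given window-like object offers pushState.
function supportsPushState(win) {
  return Boolean(win && win.history && typeof win.history.pushState === 'function');
}

// Path URL -> hash URL for browsers without the History API.
function toHashUrl(pathname) {
  return '/#' + pathname;
}

// Hash fragment -> path, so both URL styles resolve to the same view.
function fromHash(hash) {
  const path = hash.replace(/^#/, '');
  return path.startsWith('/') ? path : '/' + path;
}
```

With these helpers, `/products/42` and `/#/products/42` would both lead to the same view, which is exactly the duplication to keep in mind when links get shared.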

I would suggest considering the following two options before you go too far in finding a suitable cloud host:

  1. Off-load some of the data logic to the backend and serve at least some digestible content for each view. You can still remove or maybe parse that content in your app and do your pushState/JSON API thing for better UX, but you will have "true", scannable URLs for the search engines, Opera Mini users and some other unfortunate browsers. These static pages do not have to offer the same functionality or even styles; just think of them as the last fallback.

  2. Forget about the CDN for the app, and just use the CDN for static files, images, scripts etc. You can have a couple of fallbacks for the app itself, but it’s the media that really loads the server. Doing so will put you in control of the app and the server behind it, but you can still use the CDN for what it was meant for: serving static content.

Valerlan answered 24/11, 2012 at 22:55 Comment(0)
M
1

I'm in the same boat as you are, and it seems that CDNs do not support URL rewriting. The following solution does not solve our "problem" exactly, but it comes very close to saving $ on hosting if you're using a "pull" CDN provider.

The initial load of the default page (index.html) will provide just a tiny piece of HTML, basically the bare-bones HTML structure, like so:

<!doctype html>
<html lang="en">
<head>
    <title>something</title>
    <!-- Load the script "js/main.js" as our entry point -->
    <script data-main="js/main" src="http://mycdn.com/js/libs/require/require.js"></script>
</head>
<body>

</body>
</html>

The rest of the code would be loaded via some (async) module loader like require.js -- and all of that code would come from your CDN, including require.js.

However, if you're using a pull CDN, even this tiny bit of HTML will very soon come from the CDN as well. The "pull" CDN provider will request this page from your server whenever it does not find a file for an HTML5 pushState URL in its cache.

On your server you need some kind of routing that sends every request from the CDN whose path has no file extension to this one file.

Yes, the CDN will hit your site every time a new URL is encountered (if you're using a pull CDN), but once it has the response, it will distribute it to all users from its cache and will not hit your site for the same URL again. Also, the load on your site from the CDN provider will be insignificant, since you're serving a tiny bit of static HTML. And if you set the headers on this HTML file to never expire (this file should really never change), the file can be kept by the CDN provider for a very long time (depending on the provider), so the hits on your server would pretty much come down to a one-time event per unique URL.
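
As a sketch of what such "never expire" headers could look like: HTTP has no literal "never", so a long `max-age` is the usual stand-in. The one-year value and the name `shellHeaders` are assumptions for this example, and how long a given CDN actually honours the header depends on the provider.

```javascript
// One year in seconds; a common stand-in for "never expires" (assumed value).
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Response headers for the bare-bones HTML shell served to the pull CDN.
function shellHeaders() {
  return {
    'Content-Type': 'text/html; charset=utf-8',
    'Cache-Control': `public, max-age=${ONE_YEAR_SECONDS}`,
  };
}
```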

Matthaus answered 11/12, 2012 at 3:18 Comment(0)
R
0

This guy had a similar problem and used S3 / CloudFront. All his URLs are answered with index.html and a status code of 200.

This is a workaround as it involves setting index.html as an error page.

See details: https://kkob.us/2015/11/24/hosting-a-single-page-app-on-s3-with-proper-urls/
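
For orientation, the relevant part of a CloudFront distribution config might look roughly like the object below. This is a sketch of the `CustomErrorResponses` section, not a copy of the linked article's setup; the caching TTL is an arbitrary assumption, and depending on how the S3 bucket is exposed, the origin may answer 403 instead of 404 for missing objects, in which case that code would need an entry as well.

```javascript
// Sketch of the CustomErrorResponses section of a CloudFront distribution config.
const customErrorResponses = {
  Quantity: 1,
  Items: [
    {
      ErrorCode: 404,                   // error returned by the origin for unknown paths
      ResponsePagePath: '/index.html',  // page CloudFront serves instead
      ResponseCode: '200',              // status code sent to the viewer
      ErrorCachingMinTTL: 300,          // seconds; arbitrary value for this sketch
    },
  ],
};
```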

Reachmedown answered 25/10, 2016 at 16:59 Comment(0)
