Can CloudFlare perform automatic failover to a different backend?

3

19

I am looking for an easy way to fail over to a different DC quickly. Does CloudFlare offer anything special in this regard, such as health checks, or is it just like a standard DNS service?

Esteresterase answered 5/5, 2014 at 17:6 Comment(0)
14

We don't have automatic failover at this time (something we're looking at). We can support the additional DNS entries in your zone file, of course, but you would currently have to manually make the change in that circumstance.

Scaife answered 5/5, 2014 at 17:53 Comment(6)
Cloudflare has been looking at this for about 6 months already and still cannot do automatic failover. Come on, aren't you just nginx? - Parmer
We have a lot of other things in the product pipeline that may take precedence over this. In addition, I don't believe we have figured out which plans would have the feature. In the interim, you could probably still write a script that takes care of this via the API. - Scaife
@Scaife After your recent downtime I think it's fair to say that automatic failover should become the number 1 priority at Cloudflare. cloudflarestatus.com/incidents/m8hf8q8vs81d - Bircher
@Kelseydh How exactly would automatic failover for your site have helped with their downtime in this case? Most of their customers likely don't need or use global DNS failover, or already have local load balancing in place. - Nelle
@ManiGandham My understanding of DNS failover is limited, but what I know is that my app was down for a number of hours when we used Cloudflare as a DNS, presumably because they didn't have automatic failover to resolve the DNS from another server. Despite what you may think, I don't believe most web apps actually have automatic failover protection -- and all clients always want uptime regardless of whether they need it. Many developers just stick an app on Heroku, set up Cloudflare for speed, and assume they're covered. Cloudflare's business is irrevocably tied to making sure they are... - Bircher
The incident you link to wasn't CloudFlare's fault; it was another DNS provider (who only does DNS). If the DNS itself isn't working with CloudFlare (which hasn't happened yet in our history with them), then you would have to change your nameservers, and there is no easy failover setup for that, for anyone. Outside of DNS-related issues, you can handle server failures by using another third-party service or by using CF's API to switch out records based on a health check. - Nelle
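For anyone wanting to try the "script via the API" approach mentioned in these comments, here is a minimal sketch rather than a tested production tool. It assumes the current Cloudflare v4 API with a bearer token; the zone ID, record ID, health URLs and IP addresses are placeholders, and the function names are made up for illustration. The request is only built here, not sent, so you can inspect it first.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def is_healthy(url, timeout=5.0):
    """Return True if the health URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (raised for 4xx/5xx) and socket timeouts
        # are all OSError subclasses.
        return False


def choose_backend(backends, check=is_healthy):
    """Return the IP of the first healthy backend in priority order, or None.

    `backends` is a list of (ip, health_url) pairs.
    """
    for ip, health_url in backends:
        if check(health_url):
            return ip
    return None


def build_update_request(zone_id, record_id, name, ip, token):
    """Build (but do not send) a PATCH request that repoints an A record."""
    url = f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}"
    body = json.dumps(
        {"type": "A", "name": name, "content": ip, "proxied": True}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To actually apply the change you would pass the returned request to `urllib.request.urlopen`, typically from a cron job or a small loop, and only when the chosen IP differs from the one currently in the record.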
16

Update: CloudFlare has started a closed beta of its Traffic Manager feature, which allows exactly this kind of failover:

https://www.cloudflare.com/traffic-manager/


AWS Failover:

The following solution seems to work well when you are hosting your backend system on AWS:

  1. I set up an AWS Route 53 zone with a separate domain (e.g. failover-example.com). Route 53 lets you set up health checks on the backend servers (e.g. the load balancers) with DNS failover. AWS will remove an unhealthy backend system from the DNS record list.
  2. In CloudFlare I set up a CNAME record for example.com pointing to failover-example.com and activated the CloudFlare proxy on example.com.
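Step 1 can be scripted as well. The sketch below only builds the change batch as a plain dictionary; it assumes Route 53's failover routing policy (the answer does not say which policy was used), and the IPs, set identifiers and health check ID are placeholders. With boto3 installed, the result could be passed to `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`.

```python
def failover_record(name, ip, role, set_id, health_check_id=None, ttl=60):
    """Build one Route 53 failover record set as a plain dict."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be 'PRIMARY' or 'SECONDARY'")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,  # must be unique per record set
        "Failover": role,
        "TTL": ttl,               # keep this low so the switch is fast
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        # Route 53 stops answering with this record while the check fails.
        record["HealthCheckId"] = health_check_id
    return record


def build_change_batch(name, primary_ip, secondary_ip, health_check_id):
    """Build a ChangeBatch that upserts a primary/secondary record pair."""
    records = [
        failover_record(name, primary_ip, "PRIMARY", "backend-a", health_check_id),
        failover_record(name, secondary_ip, "SECONDARY", "backend-b"),
    ]
    return {
        "Comment": "Failover pair behind a proxied CNAME",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": r} for r in records
        ],
    }


batch = build_change_batch(
    "failover-example.com.", "192.0.2.10", "192.0.2.20", "hc-placeholder-id"
)
```

The health check itself has to exist already; its ID here is a stand-in for whatever Route 53 returns when you create it.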

The result is that the browser resolves example.com to a CloudFlare IP address. CloudFlare queries the AWS DNS server to look up failover-example.com, fetches the content from the resolved IP address, and returns it to the browser.

In my tests the switch to the other backend system occurred after about 20 seconds.
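If you want to measure the switch time for your own setup, a small polling loop is enough. This is a sketch under assumptions: it supposes each backend identifies itself in a response header, here a hypothetical `X-Backend`, which you would have to add yourself. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
import urllib.request


def http_probe(url, header="X-Backend", timeout=5.0):
    """Return the backend identifier from a response header.

    The header name is hypothetical; your backends must set it themselves.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.headers.get(header)


def measure_switch(probe, old_backend, interval=1.0, timeout=120.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it reports a backend other than old_backend.

    Returns the elapsed seconds, or None if no switch is seen in time.
    """
    start = clock()
    while True:
        if clock() - start > timeout:
            return None
        try:
            current = probe()
        except OSError:
            # The old backend may be unreachable mid-switch; keep polling.
            current = None
        if current is not None and current != old_backend:
            return clock() - start
        sleep(interval)
```

Typical use would be to stop backend A and then run `measure_switch(lambda: http_probe("https://example.com/"), "backend-a")`. The result is only as precise as the polling interval.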

The separate domain is required because CloudFlare does not route the traffic through the proxy when the CNAME target is a subdomain of example.com.

I have tried to visualize the failover. In theory the failover works with any DNS failover capable service and not only with Route53:

[Image: diagram of the failover setup]

The browser always connects to CloudFlare, so a DNS failover of the backend system never affects the user's browser.

Scrounge answered 3/7, 2015 at 9:19 Comment(8)
How does caching of DNS records within the browsers themselves come into play with this suggestion? Even with a very low TTL on all DNS records, a browser like Chrome can take 15 minutes or more to fail over. (Reference: joshwright.com/tips/intro-to-automated-failovers) - Naturopathy
I have updated the post with a visualization to make it clearer. DNS failover can generally be problematic. However, in this case it works because CloudFlare seems to respect the TTL. - Scrounge
I think that since the IP of your domain name never changes from the CloudFlare IP, you don't need to worry about the browser caching DNS. I am going to implement this now, along with some extra scripting using the AWS API to manipulate the weight of each record based on the load of the servers in the cluster. Thanks for pointing me in the right direction! - Naturopathy
There was a recent update that allows manipulating the Route 53 entries based on CloudWatch metrics: aws.amazon.com/about-aws/whats-new/2015/09/… You may not need any custom scripting. It depends on your architecture. - Scrounge
@ThomasHunziker What happens if Route 53 is down in this scheme? Does CF still get the content from Backend System A? Doesn't it need an A/AAAA record for it? - Yvette
@Francesc I think you have raised a good question. The chances that the DNS service fails are small; however, I once had a DNS provider fail, and that can be problematic. I do not know what CloudFlare does in that case. Possibly they cache the result and try to reach A even without a DNS record, but that's not clear. Btw, here is the service level agreement of AWS Route 53: aws.amazon.com/route53/sla - Scrounge
Have you tried this with servers on multiple continents, using Route 53 geo routing? I wonder whether CloudFlare would resolve to the right continent, or whether it would be latency based. - Envision
I have not tried that. However, I would assume that CloudFlare holds a cache of the DNS records in each edge node. As such, the geo routing feature of Route 53 should also work. - Scrounge
14

To add -- in the meantime, I'd recommend looking at https://runbook.io

Several other DIY options:

You'd want to decide if these are the right options for you, of course.

Inconvertible answered 5/5, 2014 at 20:48 Comment(1)
I went with a DIY solution based on the blog.booru.org article. Updating the DNS through the CF API only takes a minute and you're back in business. - Litta
