Web App: High Availability / How to prevent a single point of failure?

Asked 30/10, 2011 at 3:52 Answered 3/2, 2021 at 18:49

Solved load-balancing high-availability cluster-computing uptime

Can someone explain to me how high availability ("HA") works for a web application? I assume HA means that there is no single point of failure.

However, even if a load balancer is used, isn't that the single point of failure?

Freehold answered 30/10, 2011 at 3:52 Comment(11)

Not when you have two load balancers set up to failover. – Metre 30/10, 2011 at 4:5

@Dave Newton, but how do 2 load balancers answer the single request coming in? I'm trying to imagine it, so let's say I want to visit example.com: my browser resolves the IP address and then sends a single request to the IP of example.com. How is it possible that multiple servers (load balancers) can "answer" the web request coming in from my browser? At some point, isn't there a single piece of hardware that is the point of failure? – Freehold 30/10, 2011 at 19:26

They don't; one does. If one starts to fail, the other takes over. There are a variety of mechanisms to handle this, all beyond the scope of an SO question, really. Desmond already pretty much said all that. – Metre 30/10, 2011 at 19:30

Argh. I feel your frustration, nickb. It's very clear that just changing your IP address to point at a load-balancer (or a load-balancer-balancer, or a load-balancer-balancer-balancer) doesn't achieve high-availability, because then that load balancer can fail. Yet answers to this question all over the net seem to consist of either "Just add another layer of load balancing!" (which plainly doesn't help) or "This is a very complicated topic that you are too noob to understand". @DaveNewton has managed to provide both unhelpful dismissals, here. – Mccloud 5/4, 2018 at 16:9

@MarkAmery Fault-tolerance is well beyond the scope of an SO answer, even if it was on-topic. Nonetheless, despite your cries of "oh that doesn’t help" that’s the answer: scaling out balancers/servers/infra is the solution. – Metre 5/4, 2018 at 16:54

@DaveNewton No, it's really obviously not the solution. Making your IP resolve to a single entry-point load balancer is just as much of a single point of failure as having it resolve to a single web server, whether that load balancer has one or 100 more layers of load balancers behind it. What exactly is hard to understand here? The real solution clearly involves something other than just scaling out layers of load balancers. (I think it involves doing clever things with BGP, though that's way outside my area of expertise.) – Mccloud 6/4, 2018 at 12:58

@MarkAmery Which is why I said multiple balancers? I’m not sure what’s hard to understand here: to eliminate single points of failure you implement failovers. They can fail too—the point is to have redundancy and hope failures can be resolved. How do you think large websites work? Multiple points of entry, app servers, DBs. Switchable fabric to re-route requests, internal or external, when failures are detected. I don’t know of any mid- to large-scale site that has single anything. Shrug—it’s been working for every site I’ve been involved with, from 10sK to 10sM. – Metre 6/4, 2018 at 13:9

@DaveNewton "Which is why I said multiple balancers?" - co-ordinated how, if not by another load balancer in front of them? The entire question here is what mechanism there is by which it's possible to let one server (or load balancer) take over when another fails besides just sticking another SPOF in front of them. I have no idea what that mechanism is, which is why I ended up here; throwing more layers at the problem clearly doesn't solve it. Maybe it's the "switchable fabric" you allude to, although I don't know what "fabric" or "sK" or "sM" are and none of them yield to Googling. – Mccloud 6/4, 2018 at 13:25

@MarkAmery Those are numbers of users. I think we're talking past each other, but there are many resources you could scan to understand the basics of HA infrastructure. – Metre 6/4, 2018 at 14:58

@MarkAmery Agree with you, which is why I'm reading all through the end of the chat – Puck 17/8, 2018 at 13:23

Clearly it all comes down to ensuring the DNS-resolved first load balancer is HA. There must be a system to monitor its availability (like sentinel in redis), which -- e.g. by a quorum decision -- can decide the load balancer went down, and issue commands to a hot-standby replacement to take over (e.g. assume the IP DNS is resolving to). – Scala 26/3, 2020 at 16:24

I have found this article on the subject: http://www.tenereillo.com/GSLBPageOfShame.htm

Basically, if you do not require long-lasting sticky sessions, you can configure your DNS servers to return multiple A records (IP addresses) for your website.

Web browsers are smart enough to try all the addresses until they find one that works.
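
As a rough illustration of what "trying all the addresses" means on the client side, here is a minimal Python sketch. It assumes a short per-address timeout and a simple in-order retry; it does not claim to reproduce what any particular browser does, and the function names `resolve_all` and `connect_first_working` are made up for this example.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address that DNS reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_first_working(addresses, port, timeout=2.0,
                          connect=socket.create_connection):
    """Try each address in order and return (address, connection)
    for the first one that accepts the connection.

    `connect` is injectable so the retry logic can be exercised
    without touching the network.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect((address, port), timeout=timeout)
        except OSError as exc:
            # This address is unreachable; remember why and move on.
            errors.append((address, exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

A caller would do something like `connect_first_working(resolve_all("example.com", 80), 80)`. Note that the total delay before reaching a working address grows with the timeout multiplied by the number of dead addresses tried first, so the timeout value matters a great deal.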

Klug answered 7/3, 2013 at 5:31 Comment(2)

-1; this contradicts multiple sources I've seen (example: serverfault.com/a/328321/147556) that claim that returning multiple A records (AKA "round robin DNS") does not result in browsers (which are the main kind of HTTP clients we care about when talking about websites) rapidly cycling through the IPs to find one that works in the event of a failure, but instead incurs long timeouts, and that as such having multiple IPs in an A record is not a solution to "high-availability". Maybe everyone else is wrong, or maybe things have changed since 2010, but I cautiously assume not. – Mccloud 5/4, 2018 at 16:31

We can't even trust browsers to consistently run the same line of JavaScript. Not sure I'd be comfortable relying on them to round-robin a list of IPs. – Hirohito 23/4, 2018 at 22:17

In simple terms, high availability means running a system 24/7 without downtime, even when hardware or software fails. In other words, it is a fault-tolerant application. This helps ensure uninterrupted use of the application for its intended users.

Timoteo answered 17/8, 2015 at 13:13 Comment(0)

It works like this: you set up two HAProxy servers with heartbeat, so that when one fails (stops responding to queries), it is removed from the cluster. HAProxy forwards requests to the web servers in round-robin fashion, and if one web server fails, the HAProxy servers do not try to contact it until it is alive again. The web servers store all dynamic information in a database, which is replicated across two MySQL instances. As you can see, HAProxy, clustered MySQL (or simply MySQL replication), and IP clustering are the key here.

example high-availability cluster
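
The round-robin forwarding with failed servers skipped can be sketched in a few lines of Python. This is only a toy model of the idea, not HAProxy's implementation; the class name `RoundRobinBalancer` and the backend names are hypothetical, and in a real setup the `mark_down`/`mark_up` calls would be driven by periodic health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # Called when a health check fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a previously failed backend passes a health check.
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
first_three = [lb.next_backend() for _ in range(3)]
lb.mark_down("web2")
after_failure = [lb.next_backend() for _ in range(3)]
```

After `web2` is marked down, requests alternate between the two remaining servers until `mark_up("web2")` puts it back into rotation.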

Aleksandrovsk answered 25/2, 2012 at 23:26 Comment(5)

But in your diagram, what I don't understand is, how does HAProxy work? When the client resolves DNS, it can only resolve to a single machine. So are the HAProxy servers somehow sharing the same IP address? – Freehold 2/5, 2013 at 20:35

@Freehold as Dave Newton responded above, the DNS can be configured to return multiple IP addresses for one external hostname. The client can then make multiple attempts to contact the service. See 'A RECORDS' and 'CNAME RECORDS' with respect to DNS configuration. – Goat 24/11, 2014 at 14:23

@Freehold You are right, the HA service can enable the HA Proxies to share a single virtual IP that the Client will connect to. The HA service for unix can be (u)carp and keepalived, RedHat Cluster Suite or Pacemaker, etc. See also: serverfault.com/questions/686878/… – Acre 11/3, 2017 at 18:13

But how is it possible that 2 load balancers share the same virtual IP if they are in different datacenters with different networks? If I am not mistaken, keepalived uses the VRRP protocol, which works only if all the nodes are in the same network. – Serinaserine 26/5, 2022 at 8:31

@ArtashesKhachatryan there are a lot of ways. Keepalived uses VRRP, which requires multicast and therefore should ideally be confined to a single network. You could also do this with routing protocols like OSPF. – Orland 8/10, 2022 at 12:11

Sure it is, when operated alone. A usual highly available setup includes two or more load balancers running in a cluster in either an active/active or active/passive configuration. To further increase availability, you can have two different Internet service providers (or geo-distributed datacenters), each running a pair of clustered load balancers. Then you configure a DNS A record that resolves to two distinct public IP addresses, so that round-robin DNS splits requests roughly evenly between them (CloudFlare is very fast and reliable at this). It is also possible to return the IP address of the datacenter closest to the client's location by using something like PowerDNS dnsdist. This is what big players do to make their services highly available.

Please read https://docs.oracle.com/cd/E23824_01/html/821-1453/gkkky.html for more clarity. In fact, both load balancers use the same VIP (virtual IP address, see https://techterms.com/definition/vip).
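
To make the shared-VIP idea concrete, here is a toy Python model of an active/passive pair: whichever preferred node has sent a heartbeat recently enough is treated as the owner of the virtual IP. This is a simplified sketch under assumed names (`VipPair`, `dead_after`, the node labels, and the documentation-range address are all made up for illustration) and is not how VRRP or any specific clustering product is implemented.

```python
class VipPair:
    """Toy model of two load balancers sharing one virtual IP."""

    def __init__(self, vip, primary, standby, dead_after=3.0):
        self.vip = vip
        self.nodes = [primary, standby]  # in order of preference
        self.dead_after = dead_after     # seconds without a heartbeat
        self.last_heartbeat = {primary: 0.0, standby: 0.0}

    def heartbeat(self, node, now):
        # Record that `node` reported itself alive at time `now`.
        self.last_heartbeat[node] = now

    def owner(self, now):
        """Return the node that should hold the VIP at time `now`,
        or None if every node has missed its heartbeats."""
        for node in self.nodes:
            if now - self.last_heartbeat[node] <= self.dead_after:
                return node
        return None


pair = VipPair("203.0.113.10", "lb1", "lb2", dead_after=3.0)
pair.heartbeat("lb1", 1.0)
pair.heartbeat("lb2", 1.0)
owner_while_healthy = pair.owner(2.0)

# lb1 goes silent; only lb2 keeps sending heartbeats.
pair.heartbeat("lb2", 5.0)
owner_after_failure = pair.owner(6.0)
```

Clients keep connecting to the same address throughout; only the machine answering for that address changes. In this sketch the primary takes the VIP back as soon as its heartbeats resume, which is one possible policy among several.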

Fullfaced answered 3/2, 2021 at 18:49 Comment(0)

HA architecture is an entire field, and multiple books have been written on it, so it is hard to answer in a short paragraph.

To sum up the ideal situation, you would be using multiple servers, interconnected to a layer of multiple load balancers. The nodes and LBs will be located in a few different data centers and connected to different network backbones. Ideally the data centers will be located all over the world.

In short, every component will have redundancy, including the load balancers.

For a starting point, see Wikipedia for High Availability Cluster

Jarv answered 30/10, 2011 at 4:3 Comment(3)

But at some point, the single request from the users web browser will have to be split to multiple load balancers. At this point, wouldn't it be a single point of failure? Meaning, how is it possible for a single request to come into multiple load balancers? – Freehold 30/10, 2011 at 4:13

Yes, the user's request will end up at ONE of the load balancers that is online, and it is possible that the LB goes down at precisely the moment it is processing the request, losing it. The important thing HA addresses is that if the user immediately retries, he will end up at another LB that is online and be successful, as will the other users of the system. HA is concerned with the whole system being available (all failures transient), rather than any particular request being successful. – Jarv 30/10, 2011 at 4:22

How do you do that? DNS round robin? – Freehold 31/10, 2011 at 4:36
