Load Balancing in Amazon EC2?
E

5

47

We've been fighting with HAProxy for a few days now in Amazon EC2; the experience has so far been great, but we're stuck on squeezing more performance out of the software load balancer. We're not exactly Linux networking whizzes (we're a .NET shop, normally), but we've so far held our own, attempting to set proper ulimits and inspecting kernel messages and tcpdumps for any irregularities. So far, though, we've reached a plateau of about 1,700 requests/sec, at which point client timeouts abound (we've been using and tweaking httperf for this purpose).

A coworker and I were listening to the most recent Stack Overflow podcast, in which the Reddit founders note that their entire site runs off one HAProxy node, and that it so far hasn't become a bottleneck. Ack! Either they're somehow not seeing that many concurrent requests, we're doing something horribly wrong, or the shared nature of EC2 is limiting the network stack of the EC2 instance (we're using a large instance type). Considering that both Joel and the Reddit founders agree that the network will likely be the limiting factor, is it possible that's the limitation we're seeing?

Any thoughts are greatly appreciated!

Edit: It looks like the actual issue was not, in fact, with the load balancer node! The culprit was actually the nodes running httperf. As httperf builds and tears down a socket for each request, it spends a good amount of CPU time in the kernel. As we bumped the request rate higher, the TCP FIN timeout (tcp_fin_timeout, 60s by default) was keeping sockets around too long, and the default ip_local_port_range was too narrow for this usage scenario. Basically, after a few minutes of the client (httperf) node constantly creating and destroying new sockets, it ran out of unused ports, and subsequent 'requests' errored out at this stage, yielding low requests/sec numbers and a large number of errors.
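To make the port arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model in which every closed client socket keeps its local port unavailable for a fixed hold time, and it uses an ephemeral range of 32768 to 61000 purely as an illustrative assumption (check the real values on your own client nodes). The function names are hypothetical, not part of any tool mentioned here.

```python
def usable_ports(port_low: int, port_high: int) -> int:
    """Number of local ports in an inclusive ephemeral port range."""
    return port_high - port_low + 1


def max_sustained_rate(port_low: int, port_high: int, hold_seconds: float) -> float:
    """Highest new-connection rate (per second) one client can sustain
    to a single destination if each closed socket holds its port for
    hold_seconds (simplified model, an assumption)."""
    return usable_ports(port_low, port_high) / hold_seconds


def seconds_until_exhaustion(port_low: int, port_high: int,
                             hold_seconds: float, rate: float):
    """Time until the port pool is empty at a given connection rate,
    or None if the rate is sustainable under this model."""
    ports = usable_ports(port_low, port_high)
    if rate * hold_seconds <= ports:
        return None  # ports are released as fast as they are consumed
    return ports / rate


if __name__ == "__main__":
    low, high = 32768, 61000   # assumed range, not taken from the question
    hold = 60                  # seconds, the 60s default mentioned above
    print("sustainable conn/s:", round(max_sustained_rate(low, high, hold), 1))
    print("seconds to exhaustion at 1700 conn/s:",
          seconds_until_exhaustion(low, high, hold, 1700))
```

Under this model, widening the port range or shortening the hold time raises the ceiling proportionally, which matches the two settings that were tuned. Spreading the load over several client nodes has the same effect, since each node has its own port pool.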

We had also looked at nginx, but we've been working with RightScale, and they've got drop-in scripts for HAProxy. Oh, and we've got too tight a deadline [of course] to switch out components unless it proves absolutely necessary. Mercifully, being on AWS allows us to test out another setup using nginx in parallel (if warranted), and make the switch overnight later on.

This page describes each of the sysctl variables fairly well (ip_local_port_range and tcp_fin_timeout were tuned, in this case).
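For anyone wanting to check the two settings on their own load-generating nodes, a minimal sketch follows. It only reads the current values from the standard Linux `/proc/sys/net/ipv4` files; it does not change anything, and the helper names are hypothetical. The values we actually tuned to are not shown here.

```python
from pathlib import Path

# Standard location of IPv4 sysctl values on Linux.
SYSCTL_ROOT = Path("/proc/sys/net/ipv4")


def parse_port_range(text: str) -> tuple:
    """Parse the contents of ip_local_port_range ("low<whitespace>high")."""
    low, high = (int(part) for part in text.split())
    return low, high


def read_setting(name: str, root: Path = SYSCTL_ROOT):
    """Return the raw text of a sysctl file, or None if it cannot be read
    (for example on a non-Linux machine)."""
    try:
        return (root / name).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    raw_range = read_setting("ip_local_port_range")
    raw_fin = read_setting("tcp_fin_timeout")
    if raw_range is None or raw_fin is None:
        print("sysctl files not readable on this machine")
    else:
        low, high = parse_port_range(raw_range)
        print(f"ip_local_port_range: {low}-{high} ({high - low + 1} ports)")
        print(f"tcp_fin_timeout: {int(raw_fin)} seconds")
```

Running this on the httperf nodes before and after tuning is a quick way to confirm the new values actually took effect.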

Estrellaestrellita answered 4/11, 2008 at 0:10 Comment(3)
Marc, you should write up your experiences with configuring this stuff, and post them somewhere (does your company have a blog?). Sounds like it could be useful to a lot of people. Upvoted your question.Neddra
Your link is broken.Carrillo
@Carrillo thanks! Just updated it. I went digging around for a newer, more up-to-date source, but it looks like the original site still has a pretty high PageRank, and the content's still decent, so I'm just correcting it to reflect the new URL.Estrellaestrellita
S
9

Not really an answer to your question, but nginx and pound both have good reputations as load balancers. WordPress just switched to nginx with good results.

More specifically, to debug your problem: if you aren't seeing 100% CPU usage (including I/O wait), then yes, you are network bound. EC2 internally uses a gigabit network. Try using an XL instance, so you have the underlying hardware to yourself and don't have to share that gigabit network port.

Saporific answered 4/11, 2008 at 15:29 Comment(0)
C
20

Not answering the question directly, but EC2 now supports load balancing through Elastic Load Balancing, so you no longer have to run your own load balancer in an EC2 instance.

EDIT: Amazon's Route 53 DNS service now offers a way to point a top-level domain at an ELB with an "alias" record. Since Amazon knows the current IP address of the ELB, it can return an A record for that current IP rather than having to use a CNAME record, while still being free to change the IP from time to time.
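As a rough illustration of the alias record described above, the sketch below builds the change batch that Route 53's ChangeResourceRecordSets operation accepts (for example via boto3's `change_resource_record_sets`). It only constructs the dictionary and makes no AWS call. The ELB DNS name and the hosted zone ID are made-up placeholders, and the function name is hypothetical.

```python
def alias_change_batch(apex_name: str, elb_dns_name: str,
                       elb_hosted_zone_id: str) -> dict:
    """Build a Route 53 change batch that points a zone apex at an ELB
    using an alias A record (alias records carry no TTL of their own)."""
    return {
        "Comment": "Point the zone apex at the load balancer",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": apex_name,
                    "Type": "A",
                    "AliasTarget": {
                        # Hosted zone ID of the ELB itself, not of your domain.
                        "HostedZoneId": elb_hosted_zone_id,
                        "DNSName": elb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = alias_change_batch(
        "mydomain.com.",
        "my-elb-1234567890.us-east-1.elb.amazonaws.com.",  # placeholder
        "ZEXAMPLE123",                                      # placeholder
    )
    print(batch["Changes"][0]["ResourceRecordSet"])
```

The resulting dictionary would be passed as the `ChangeBatch` argument, alongside the `HostedZoneId` of your own domain's zone.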

Chary answered 18/5, 2009 at 12:32 Comment(4)
Thanks for the heads up, I'm actually evaluating this at the moment. Cool stuff abounds (though the command-line tools leave a bit to be desired)!Estrellaestrellita
Unfortunately, the AWS load balancing (ELB) solution has a major flaw. It is designed to use CNAMEs, which prevents users from pointing a top-level domain directly at the load balancer. In other words, you can point www.mydomain.com to ELB but not mydomain.com. For many that's a showstopper.Blastoff
Couldn't you redirect all calls to your website so that the www. would be typed in?Lucid
Yes, you could point mydomain.com at a server which just issues redirects to www.mydomain.com, which is then load balanced. It's not quite the same as being able to point mydomain.com straight at the load balancer though.Chary
S
3

Yes, you could use an off-site load balancer, and on bare metal LVS is a great choice, but your latency will be awful! Rumour has it that Amazon is going to fix the CNAME issue. However, they are unlikely to add HTTPS, in-depth or custom health checks, feedback agents, URL matching, or cookie insertion (and some people with good architecture would say quite right too). That's why Scalr, RightScale and others are using HAProxy, usually two of them behind a round-robin DNS entry. Here at Loadbalancer.org we are just about to launch our own EC2 load balancing appliance: http://blog.loadbalancer.org/ec2-load-balancer-appliance-rocks-and-its-free-for-now-anyway/ We are planning on using SSH scripts to integrate with autoscaling in the same way RightScale does; any comments are appreciated on the blog. Thanks

Snakebird answered 2/10, 2010 at 21:44 Comment(0)
C
1

I would look at switching to an off-site load balancer, not in the cloud, and running something like IPVS on top of it. [The reason it would be off of Amazon's cloud is kernel stuff.] If Amazon doesn't limit the source IP of packets coming out of the cloud, you could go with a unidirectional load balancing mechanism. We do something like this, and it gets us about 800,000 simultaneous requests [though we don't deal with latency]. I would also say use "ab2" (apache bench), as it is a little more user friendly and easier to use, in my humble opinion.

Carline answered 5/11, 2008 at 17:26 Comment(1)
You know you wrote your entire message in bold? It's quite hard to read.Willodeanwilloughby
L
0

Even though your issue is solved, KEMP Technologies now has a full-blown load balancer for AWS. Might save you some hassle.

Logbook answered 17/11, 2014 at 21:28 Comment(0)
