Terminating a high volume of SSL connections cost-effectively

I have recently set up a Node.js-based WebSocket server that has been tested to handle around 2,000 new connection requests per second on a small EC2 instance (m1.small). Considering the cost of an m1.small instance, and the ability to put multiple instances behind a WebSocket-capable proxy server such as HAProxy, we are very happy with the results.

However, we realised we had not done any testing using SSL yet, so we looked into a number of SSL options. It became apparent that terminating SSL connections at the proxy server is ideal, because the proxy server can then inspect the traffic and insert headers such as X-Forwarded-For so that the server knows which IP the request came from.

So I looked into a number of solutions such as Pound, stunnel and stud, all of which allowed incoming connections on port 443 to be terminated and then passed on to HAProxy on port 80, which in turn passes the connection on to the web servers. Unfortunately, however, I found that sending traffic to the SSL-terminating proxy server on a c1.medium (High CPU) instance very quickly consumed all CPU resources, at a rate of only 50 or so requests per second. I tried all three of the solutions listed above, and all of them performed roughly the same, as I assume that under the hood they all rely on OpenSSL anyway. I tried a 64-bit extra-large High CPU instance (c1.xlarge) and found that performance only scaled linearly with cost. So based on EC2 pricing, I'd need to pay roughly $600 per month for 200 SSL requests per second, as opposed to $60 per month for 2,000 non-SSL requests per second. The former price becomes economically unviable very quickly when we start planning to accept 1,000s or 10,000s of requests per second.
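To make the cost gap explicit, the figures above can be normalised to dollars per month per request/second of capacity. This is a small illustrative sketch (the function names are hypothetical) that uses only the numbers quoted in the question and assumes, as observed there, that throughput scales linearly with instance cost.

```python
def monthly_cost_per_rps(monthly_cost: float, requests_per_second: float) -> float:
    """Dollars per month for each request/second of sustained capacity."""
    return monthly_cost / requests_per_second


def projected_monthly_cost(cost_per_rps: float, target_rps: float) -> float:
    """Linear extrapolation, matching the observation that throughput
    only scaled linearly with instance cost."""
    return cost_per_rps * target_rps


ssl_unit = monthly_cost_per_rps(600, 200)       # SSL through the terminating proxy
plain_unit = monthly_cost_per_rps(60, 2_000)    # plain (non-SSL) WebSocket traffic

print(f"SSL:   ${ssl_unit:.2f} per req/s per month")
print(f"Plain: ${plain_unit:.2f} per req/s per month")
print(f"Ratio: {ssl_unit / plain_unit:.0f}x")
for target in (1_000, 10_000):
    print(f"{target:>6,} req/s over SSL: ${projected_monthly_cost(ssl_unit, target):,.0f} per month")
```

With these inputs, SSL capacity costs 100 times more per request/second than plain traffic, so 10,000 SSL requests per second would extrapolate to roughly $30,000 a month.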

I also tried terminating the SSL using Node.js' https server, and the performance was very similar to Pound, stunnel and stud, so no clear advantage to that approach.

So what I am hoping someone can help with is advice on how to get around this ridiculous cost we have to absorb to provide SSL connections. I have heard that SSL hardware accelerators provide much better performance, as the hardware is designed for SSL encryption and decryption, but as we are currently using Amazon EC2 for all of our servers, SSL hardware accelerators are not an option unless we have a separate data centre with physical servers. I am just struggling to see how the likes of Amazon, Google and Facebook can serve all their traffic over SSL when the cost is so high. There must be a better solution out there.

Any advice or ideas would be greatly appreciated.

Thanks, Matt

Dafodil answered 30/1, 2012 at 16:43 Comment(14)
The word "terminating" is at least confusing in your context. I spent a minute or so trying to understand why you want to terminate SSL connections rather than just close a socket. – Debbi
Too bad ELB doesn't do WebSockets! Have you tried restricting the set of ciphers that can be used to the ones that are computationally cheap? – Undeviating
Have you tried using Amazon ELB with SSL to handle it? I use that for a couple of SaaS products that I run. Works fine. I don't have the 2,000 conn/sec requirement, so I don't know if it will cope. – Protolanguage
Can you provide the command you are using for testing/benchmarking the server? Where are your test clients, also in the Amazon cloud? Have you tried testing from multiple clients at the same time? – Tigges
I can't answer everything in one comment, so one comment per question... I used "terminating" because that is the terminology most load balancers use when talking about SSL; see the following links: aws.amazon.com/elasticloadbalancing, rackspace.com/cloud/cloud_hosting_products/loadbalancers/…, www.snapt-ui.com/haproxy/snapt-haproxy-ssl-termination-released/. Sorry if it was confusing. – Gasparo
Amazon ELB to handle SSL: unfortunately ELB does not support WebSockets. I have read this in various forums (forums.aws.amazon.com/thread.jspa?threadID=84606) and have also tested it myself, and it definitely did not work as of a few days ago, anyway. – Gasparo
@JamesLittle, I have tried two ways of testing/benchmarking the server. One simply uses Apache Bench without upgrading the HTTP connection to a WebSocket connection; the overhead is almost identical to a WebSocket, so it is a good test. I have also developed a simple WebSocket client that opens 1,000 connections and times how long this takes. I have run this from the same machine and from other machines, with similar results. – Gasparo
Does ELB work if you configure it as a plain SSL load balancer (i.e. not as an HTTP one)? – Undeviating
@FrederickCheung unfortunately that option does not exist. ELB only asks you which port you wish to use. – Gasparo
You might want to try this over on ServerFault.com; it's a lot better suited to that site than this one. The ostensible answer to your question is Elastic Load Balancer... that's supposed to be the mechanism that handles the sort of SSL termination you're looking for. But I see it's not working right with sockets at the moment. – Loosejointed
@Marcus_33, good idea, I will post on ServerFault.com now. – Gasparo
@MatthewO'Riordan where did this question land? What did you do? – Siffre
@Jonesome, the solution we ended up with is ELB, the most cost-effective solution for us, as a) we didn't have to manage our own auto-scaling and monitoring infrastructure, and b) it did 90% of what we needed out of the box, leaving us to focus on other important areas of our system. – Gasparo
Nothing confusing about using the term SSL termination. It's synonymous with SSL offloading. – Oira
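The comments above mention a home-grown client that opens 1,000 connections and times how long they take. A minimal sketch of that kind of measurement with Python's asyncio, assuming that timing plain TCP connects (no WebSocket upgrade and no TLS handshake) is enough; `time_connections` and `benchmark_local` are hypothetical names, and the local server exists only so the sketch runs on its own.

```python
import asyncio
import time


async def time_connections(host: str, port: int, count: int, concurrency: int = 100) -> float:
    """Open `count` TCP connections to host:port, at most `concurrency` at a
    time, and return the elapsed wall-clock time in seconds."""
    limit = asyncio.Semaphore(concurrency)

    async def connect_once() -> None:
        async with limit:
            _reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()

    start = time.perf_counter()
    await asyncio.gather(*(connect_once() for _ in range(count)))
    return time.perf_counter() - start


async def benchmark_local(count: int = 1_000, concurrency: int = 100) -> float:
    """Run time_connections against a throwaway local server; when
    benchmarking for real, point time_connections at the actual endpoint."""

    async def handle(_reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        writer.close()
        await writer.wait_closed()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await time_connections("127.0.0.1", port, count, concurrency)


if __name__ == "__main__":
    elapsed = asyncio.run(benchmark_local())
    print(f"1,000 connections in {elapsed:.3f}s ({1_000 / elapsed:.0f} conn/s)")
```

Passing an `ssl.SSLContext` via the `ssl=` argument of `asyncio.open_connection` would make the same loop time full TLS handshakes, which is where the CPU cost discussed in the question shows up.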

I do not know much about the CPU power available on different EC2 instances, but I assume your problem lies not with your choice of TLS-terminating proxy software but with its configuration. Without any configuration, I'd assume all of them offer every cipher suite they support, including (very) slow ones, and probably let the client pick the one it likes best, too.

Not all TLS cipher suites are born equal: some have higher CPU costs than others, whether from the key exchange or the cipher itself. Depending on the software used, there should be a way to specify a string of ciphers the server accepts (and also a way to make the server insist on its own preference order). For OpenSSL, the cipher-string syntax is described here: http://www.openssl.org/docs/apps/ciphers.html#CIPHER_STRINGS
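As a concrete illustration of restricting the cipher list and making the server insist on its own order, here is a minimal sketch using Python's standard `ssl` module (a wrapper around OpenSSL) as a stand-in. Pound, stunnel and stud have their own configuration syntax, so this is not their API. `make_server_context` is a hypothetical helper, and loading a real certificate is left to the caller because no certificate files are assumed. With OpenSSL 1.1.1 and later, `set_ciphers` only governs TLS 1.2 and earlier; TLS 1.3 suites are configured separately.

```python
import ssl


def make_server_context(cipher_string: str) -> ssl.SSLContext:
    """Build a server-side TLS context that only offers the suites selected
    by `cipher_string` (OpenSSL cipher-list syntax), in the server's order."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Raises ssl.SSLError if the string selects no cipher at all.
    ctx.set_ciphers(cipher_string)
    # Choose suites by the server's preference rather than the client's.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    return ctx


ctx = make_server_context("HIGH:!aNULL")
# ctx.load_cert_chain("server.pem", "server.key")  # hypothetical file names
print(len(ctx.get_ciphers()), "suites enabled")
```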

If you're going for speed, at least make sure you're not using ciphers that employ Diffie-Hellman key exchange (the non-elliptic-curve kind). To disable cipher suites using DH key exchange, make sure the string includes !DH at some point. You can test which ciphers a given string makes available with, for example, openssl ciphers -v 'HIGH:!aNULL:!DH:!ECDH'.

This string disables both normal Diffie-Hellman as well as Elliptic Curve Diffie-Hellman key exchanges. This probably only leaves RSA key exchange, depending on your OpenSSL version.

Regarding ciphers, you should probably test on your intended EC2 hardware. Without hardware acceleration, you should probably prefer RC4 over AES128 over AES256 over anything else, at least according to this benchmark.

I also suggest reading this wonderful post, especially the enlightening first diagram showing the impact of DH on TLS handshake performance.

Lastly, make sure you're using TLS session caching. That saves some CPU, too.
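Resumed sessions skip the expensive key exchange, so it is worth checking that resumption actually happens. OpenSSL keeps a server-side session cache per context by default, and Python's `ssl` module exposes its counters through `SSLContext.session_stats()`. The sketch below turns those counters into a hit rate; `cache_hit_rate` is a hypothetical helper, and the proxies discussed here expose their own cache settings and statistics rather than this API.

```python
import ssl


def cache_hit_rate(stats: dict[str, int]) -> float:
    """Fraction of session-resumption attempts that hit the cache
    (0.0 when no client has tried to resume yet)."""
    attempts = stats["hits"] + stats["misses"]
    return stats["hits"] / attempts if attempts else 0.0


server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# ... load the certificate and serve connections with server_ctx ...
stats = server_ctx.session_stats()
print(f"accepted={stats['accept']} resumed={stats['hits']} "
      f"hit rate={cache_hit_rate(stats):.1%}")
```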

Spae answered 20/6, 2012 at 23:25 Comment(0)

I just realized Amazon's Elastic Load Balancer is super slow for SSL termination... I did a simple test on www.blitz.io (no relation, just a customer) with 1 to 250 concurrent connections over 1 minute. It failed horribly... But if I use TCP 443 on the front end of the ELB and TCP 443 on the backend with no certificate, it wipes out a small instance's CPU when running IIS with an SSL cert on that instance. I only need handshakes: it's a simple web service serving clients from all over the place, with new connection setup and teardown every time.

How can I design a high traffic SSL web service, preferably with SSL all the way to the backend for strict security compliance?

Swartz answered 19/5, 2013 at 22:55 Comment(0)

The performance of Node.js' https server is very similar to Pound, stunnel and stud, and there is no clear advantage to that approach.

Negatron answered 6/2, 2012 at 9:59 Comment(1)
Well, arguably there is, then. If Node's HTTPS performance is similar, you could question why you would use Pound/stunnel/stud in front of Node.js at all, as it simply adds another bottleneck and component to the system. – Gasparo

I'm also wondering how to do this effectively. AWS SSL termination is dreadfully slow, but perhaps there is some way to improve its performance. Stud seemed promising but, as you mentioned, also has a large CPU cost.

Rabb answered 27/2, 2012 at 18:13 Comment(0)
