OpenSSL SSL_write from multiple buffers / SSL_writev

I've written a networking server that uses OpenSSL for SSL/TLS. The server sends and receives large blocks of data and performs various transformations in between. For performance reasons, the transformations work mainly on I/O vectors (see struct iovec in POSIX), which avoids expensive memory moves (memcpy() etc.). When data are ready to be sent, I use the POSIX writev() function, which gathers the data from memory using these vectors and usually sends it as one network packet.
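For reference, the gather write described above looks roughly like the following sketch (plain POSIX, no TLS involved). The helper name writev_all() is hypothetical. It loops because writev() may write fewer bytes than requested, and it advances the caller's iovec array in place to resume after a partial write.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Write every byte described by iov[0..iovcnt) to fd using writev(),
 * retrying after partial writes and EINTR. Modifies iov in place.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t writev_all(int fd, struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        total += n;
        /* Skip the entries that were written completely... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...and trim the one that was written only partially. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return total;
}
```

Each iteration hands the whole remaining list to the kernel in a single call. The sketch does not enforce the IOV_MAX limit on iovcnt, which real code should.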

With OpenSSL this is not directly possible, because as far as I know OpenSSL offers only the SSL_write() function. That means I have to call it for every vector entry I want to send. Unfortunately, this causes every vectored chunk of data to be transmitted in its own SSL record, which introduces unwanted and unnecessary network overhead.

My question is: is there an SSL_writev() equivalent of writev()? Or, more generally, is there a technique to tell OpenSSL to stash SSL_write() data into one SSL application data record (content type 23) without sending it (and then, of course, some kind of flush() function)?

Edit: As discussed below, a viable approach is to consolidate the vectored data into one big chunk prior to a final single SSL_write() call. However, this carries the overhead of two copies (the first during consolidation, the second when SSL_write() performs the AES encryption). A hypothetical SSL_writev() call would avoid this overhead.
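A minimal sketch of that consolidation step, assuming the payload is already described by a struct iovec array as in the question (the name pullup_iov() is made up for illustration). The resulting buffer would then go to one SSL_write(ssl, buf, (int)len) call; as long as len stays at or below 16384 bytes, the TLS maximum plaintext record size, OpenSSL typically emits it as a single application data record. The memcpy() loop here is the first of the two copies mentioned above.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Gather all iovec entries into one contiguous heap buffer ("pull up").
 * Stores the total length in *out_len and returns the buffer, which the
 * caller must free(); returns NULL if the allocation fails. */
unsigned char *pullup_iov(const struct iovec *iov, int iovcnt, size_t *out_len)
{
    size_t len = 0;
    for (int i = 0; i < iovcnt; i++)
        len += iov[i].iov_len;

    /* malloc(0) may return NULL, so always ask for at least one byte. */
    unsigned char *buf = malloc(len > 0 ? len : 1);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue; /* iov_base may be NULL for empty entries */
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *out_len = len;
    return buf;
}
```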

Arborvitae answered 5/7, 2016 at 8:28 Comment(12)
I think you will need a "pull up" function, i.e. one that combines multiple buffers into one. - Longshore
That's exactly what I do today. But it is quite expensive because it moves large chunks of data around in memory. - Arborvitae
It seems like the large chunks amortize the cost of TCP overhead. Maybe you should allow separate writes for large data, and only "pull up" smaller ones. Also, if you compile with -march=native and -O3, then you should get the SSE4 and AVX versions of memcpy and memmove on modern hardware. They are lightning fast because they move 16, 32 and 64 bytes at a time. - Longshore
@ateska "But it is quite expensive b/c it moves a large chunks of data in the memory." On Linux, writev() is actually implemented as just a wrapper around write() that allocates a temporary buffer, copies the writev() buffers into the temp buffer, then calls write(). If you're running on Linux and writev() is working for you without SSL, just write your own SSL_writev() wrapper. - Vocalist
@AndrewHenle - good point about the writev() implementation. I was hoping that it uses the kernel's scatter/gather feature. - Arborvitae
@Longshore - agreed, efficient copying is important. Yet it still means that the SSL version will do 2 copies: 1st is "pull up", 2nd is AES (or similar) encryption during SSL write. Both can indeed be 'accelerated' by SSE4/AVX and AES-NI, respectively. I'm looking to consolidate that into 1 copy; that is the original idea behind SSL_writev(). - Arborvitae
@ateska - the "SSL version will do 2 copies: 1st is "pull up", 2nd is AES (or similar) encryption..." is a slightly different requirement. You should edit your question and add that information. It's a good requirement and question, but I did not pay it any mind due to the wording of the current question (which I thought was closer to the scatter/gather you mentioned). - Longshore
@AndrewHenle fortunately this is not true, or the performance would suffer. Here is Linux's writev() implementation; it solely uses iovecs and low-level iterators: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/… - Pound
@WillyTarreau It doesn't matter what the actual kernel system call may or may not do if the user-space implementation translates the program's writev() function call into the write() system call. And how do you know the "performance would suffer"? The writev() kernel code must make multiple copies from user space to kernel space - one for each of the memory areas copied. The write() kernel code only has to copy from user space to kernel space once - and if the user is using direct IO, even that can be skipped. It's not clear performance would suffer at all. - Vocalist
@AndrewHenle please read the code I pointed to; you will see that it does not make multiple copies but iterates over vectors. - Pound
One, I looked at the kernel code you linked. Again, that's irrelevant if the user-space program translates the writev() function to the write() system call - just like glibc does. Not only that, "iterating over a vector" is implemented by doing multiple copies of data. You need to dig deeper into the Linux kernel. Stopping at do_writev() isn't enough. You need to go look at vfs_writev() and then do_iter_write(), and see that the code does multiple copies. - Vocalist
@AndrewHenle great to read this. I'm not actually sure how relevant the OP's copies even are in the first place. You have to decrypt, process and re-encrypt all that data. I imagine this one move is likely negligible, especially if you take advantage of SSE instructions (and that's without assuming he's breaking into the kernel often for other stuff like memory allocation). Is this something you'd agree with? I know x86 has AES instructions now, but even still, we're talking about a 5% performance increase at best here, no? Plus, even if he's sending jumbo frames, we're talking ~9 KB max; it's not much. - Pfeifer

My strategy would be to use SSL for key exchange, close the SSL session, and encrypt and send the data myself on the regular socket. You could still do everything you had been doing before, including SSL, but you would also get to choose exactly how your data is encrypted.

If you have the benefit of reasonably recent hardware, you have x86's AES instructions. So I personally would try my hand at some inlining and manual loop folding to see how crazy fast I could make it.

Security warning: While OpenSSL and AES (or any other industry-standard cryptography libraries/algorithms you may use) are by themselves pretty damn secure, doing this requires extreme attention and a good amount of knowledge. Using AES has many pitfalls, and cryptography is generally a hard subject with lots of hidden dangers. Many people would recommend you don't do this at all; I personally recommend at least trying to assess where you are on the Dunning-Kruger curve and gaining a full understanding of how OpenSSL does it and why.


Implementation Notes

These are pointers; they only go as far as what I know. The above warning should be taken very seriously: even if you have a cryptographer with a 20-year career by your side, consider that 1. plenty of people do the same thing the wrong way their entire career, 2. even very smart people fall prey to considering only the scope they operate in (e.g. it's hard to consider side-channel attacks if you're just the guy working the math and aren't particularly well versed in computer architecture), and 3. even if they're Einstein-level smart and well versed in all adjacent subjects, remember that Einstein never accepted quantum mechanics as a complete theory. We're all human and no human is perfect.

Key generation is quite possibly the most important thing here. If you do it wrong and someone reverse engineers your client, they could gain the ability to guess the keys of any communication session.
Obviously this means you shouldn't hardcode or reuse keys, but it also means you should be very careful about how you generate them in the first place.
If you use a pseudo-random number generator and you fail to seed it properly - either seeding it with a constant or seeding it with "current time" (as is often done in example code) - an attacker will be able to listen in on everything.
I would strongly advise using OpenSSL's key generation (and making damn sure you use it properly).
If you want to try your hand at DIYing it for fun, just want to learn a bit about it, or are in an environment where there is no existing implementation, here is OpenSSL's "Random Numbers" wiki page. Often there is a CPU instruction or peripheral that derives random numbers from hardware entropy sources. True randomness is not a trivial matter: here is Tom Scott's video on Cloudflare's physical random number generator, which begins to illustrate the difficulty and importance of the problem.

Now let's say you properly implemented your key generation, and you transferred the key through the SSL channel. And let's say there are no inherent security risks in closing that channel and maintaining communication (I can't tell you with certainty that there aren't).

Now you're doing AES stream encryption. The stream part is very important: if you simply encrypt every packet with the same key in the same way, any data that's predictable across packets can be used to help break your encryption.
This is why OpenSSL uses specific modes of operation for AES that are considered secure. A mode of operation is an added layer on top of the block cipher itself; counter-based modes mitigate the aforementioned vulnerability by combining the key with a nonce that must never repeat under the same key.
In the case of TLS 1.3, the ciphersuites use either the Galois/Counter Mode (GCM) or the Counter with CBC-MAC (CCM) variants of AES, which means that's likely what you should be going with as well (there's also ChaCha20-Poly1305, but it doesn't benefit from the dedicated AES hardware instructions).
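One concrete piece of this is making sure no two records are ever encrypted under the same key and nonce. TLS 1.3 does it by deriving each record's nonce from a static per-direction IV and the 64-bit record sequence number (RFC 8446, section 5.3). A sketch of that construction, assuming the 12-byte IV used by the AES-GCM suites; record_nonce() is a made-up name:

```c
#include <stdint.h>
#include <string.h>

#define NONCE_LEN 12 /* AES-GCM nonce size used by TLS 1.3 */

/* Build the per-record nonce the way TLS 1.3 does: left-pad the 64-bit
 * sequence number with zeros to NONCE_LEN bytes in network byte order
 * and XOR it into the static write IV. */
void record_nonce(const uint8_t static_iv[NONCE_LEN], uint64_t seq,
                  uint8_t nonce[NONCE_LEN])
{
    memcpy(nonce, static_iv, NONCE_LEN);
    /* The sequence number occupies the last 8 bytes, most significant first. */
    for (int i = 0; i < 8; i++)
        nonce[NONCE_LEN - 1 - i] ^= (uint8_t)(seq >> (8 * i));
}
```

Both peers keep their own counter and increment it for every record, so the sequence number never has to be sent on the wire.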

Additionally, you have to make sure that things like packet size don't inherently give away important information.
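A simple mitigation is to pad messages up to fixed-size buckets, which TLS 1.3 also allows through record padding (RFC 8446, section 5.4). The sketch below only computes the padded length; the bucket policy and the name padded_length() are assumptions. Padding hides the exact size, but an observer still learns the bucket and the traffic timing.

```c
#include <stddef.h>
#include <stdint.h>

/* Round `len` up to the next multiple of `bucket` and store it in *out.
 * Returns 1 on success, 0 if bucket is 0 or the result would overflow. */
int padded_length(size_t len, size_t bucket, size_t *out)
{
    if (bucket == 0)
        return 0;
    size_t rem = len % bucket;
    size_t pad = rem == 0 ? 0 : bucket - rem;
    if (len > SIZE_MAX - pad)
        return 0;
    *out = len + pad;
    return 1;
}
```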

Lastly, the implementation code itself has to be bulletproof. This goes for every level of your application, but remember that you're implementing a protocol. If your logic is funky or you mismanage a buffer, it could compromise everything from your private keys to giving the attacker root access to your entire system and network - and them using it to ruin your nuke-grade uranium enrichment centrifuges (see Stuxnet). Something like this even happened with OpenSSL itself (see Heartbleed).
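Heartbleed (CVE-2014-0160) shows how small such a mistake can be: the heartbeat code trusted a length field sent by the peer and copied that many bytes back, reading past the end of the message it had actually received. A sketch of the missing check, using a hypothetical message format of a 2-byte big-endian length followed by the payload (copy_payload() is a made-up name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the payload of a length-prefixed message into dst. Returns the
 * payload length, or -1 if the message is truncated, claims more bytes
 * than it contains, or dst is too small. */
long copy_payload(const uint8_t *msg, size_t msg_len,
                  uint8_t *dst, size_t dst_cap)
{
    if (msg_len < 2)
        return -1;
    size_t claimed = ((size_t)msg[0] << 8) | msg[1];
    /* The check Heartbleed lacked: the claimed length must fit inside the
     * bytes actually received, not just what the peer says it sent. */
    if (claimed > msg_len - 2 || claimed > dst_cap)
        return -1;
    memcpy(dst, msg + 2, claimed);
    return (long)claimed;
}
```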

Pfeifer answered 30/12, 2023 at 5:28 Comment(0)

You can use a BIO_f_buffer() to achieve this. Wrap your network layer BIO in a BIO_f_buffer() filter BIO and set that as the write BIO for your SSL object. This will cause all data written out to stay in the buffer until you issue a flush on it (or until the buffer fills up).

Demmer answered 5/7, 2016 at 19:18 Comment(7)
BTW, you may want to wait to do this until after the handshake has completed - otherwise you will have to manually issue "flush" commands for each flight of handshake messages that are exchanged. - Demmer
Agreed, but it will still create an SSL application record (type 23) for each call of SSL_write(). That means unwanted network overhead that I want to avoid. - Arborvitae
Yes it will - although from your description it sounded like your main concern was to avoid multiple network packets. That is different from a TLS record. Multiple records can be contained within a single TCP packet, or split up across many. Buffering and flushing in the way I propose gives the network layer the best opportunity to transmit the data across the network in the most efficient way possible. The overhead is then limited to the additional record header bytes (5 bytes) plus the MAC size (depends on the ciphersuite). If your aim is to reduce the number of records then... - Demmer
...you can do the same thing in reverse, i.e. put a BIO_f_buffer() in front of an SSL BIO and flush it through to the SSL layer when you are ready. - Demmer
The general problem I have with this answer is that there is an unnecessary sacrifice in either network overhead or a memory copy. My goal is to avoid both (because I already have a working solution at this level). From an architectural point of view, it is quite straightforward: ... - Arborvitae
I have an unencrypted buffer in scatter/gather form because of the app-specific data transformations. I can use a symmetric encryption function (e.g. AES) to 'go around' that scatter/gather buffer (see the writev function), do its job and put the output into another buffer, in one go, as one SSL record (type 23). No memcpy, no network overhead. To my current knowledge, no existing SSL implementation offers this feature. - Arborvitae
There's kernel TLS support now, starting with Linux 4.13, which lets you literally use writev() and have outgoing traffic encrypted seamlessly, with certain limitations. Details here: lwn.net/Articles/666509 - Salmi
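A sketch of handing the transmit side to the kernel, based on the interface described in the kernel's TLS documentation: the socket is switched to the "tls" upper-layer protocol and then given the TLS 1.2 AES-128-GCM key material negotiated by a user-space handshake. After that, plain write()/writev() calls on the socket produce encrypted records. enable_ktls_tx() is a hypothetical name, extracting the key, salt, IV and sequence number from OpenSSL is left out, and the fallback #defines assume the Linux values for older libc headers. OpenSSL 3.0 and later, when built with enable-ktls, can also perform this setup itself if SSL_OP_ENABLE_KTLS is set.

```c
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that predate kTLS (Linux values). */
#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

/* Switch the transmit side of a connected TCP socket to kernel TLS for a
 * TLS 1.2 AES-128-GCM session. All key material must come from a handshake
 * completed in user space. Returns 0, or -1 with errno set. */
int enable_ktls_tx(int fd,
                   const unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE],
                   const unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE],
                   const unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE],
                   const unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE])
{
    struct tls12_crypto_info_aes_gcm_128 ci;

    memset(&ci, 0, sizeof ci);
    ci.info.version = TLS_1_2_VERSION;
    ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci.key, key, sizeof ci.key);
    memcpy(ci.salt, salt, sizeof ci.salt);
    memcpy(ci.iv, iv, sizeof ci.iv);
    memcpy(ci.rec_seq, rec_seq, sizeof ci.rec_seq);

    /* Attach the "tls" upper-layer protocol, then install the TX keys. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof "tls") < 0)
        return -1;
    if (setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof ci) < 0)
        return -1;
    /* From here on, write()/writev() on fd emit encrypted TLS records. */
    return 0;
}
```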
