Can Haskell or OCaml handle sensitive data without leaking it via garbage collection?

I would like to do something like this (pseudocode):

1. load sensitive encrypted data from file
2. decrypt the data
3. do something with the unencrypted data
4. overwrite the data safely / securely (for example with random data)

The time that the sensitive data lies plain (unencrypted) in memory should be as short as possible.

The plain data must not be leaked in any way.

A. Can such a program be written in Haskell or OCaml?

B. Can the data be prevented from leaking, e.g. by being silently copied in the background by the garbage collector?

C. Can the plain data be properly overwritten in memory?

As far as I know garbage collectors (GCs) can make copies of data silently in the background. I guess that is done by generational GC algorithms, but I don't know for sure.

I know that an attacker could still get the plain data by capturing the program's memory at the right time / state. I am only considering this to raise the level of security, because I do not have the context (i.e. OS, swapping etc.) under control.

Significs answered 2/6, 2020 at 9:17 Comment(5)
I find this an interesting question, but at least as far as GHC is concerned I'm pretty sure you can't get any guarantees of that kind. Generally, the tendency for memory leaks is Haskell's single biggest weakness as of today, in my opinion. - Jibber
I believe GHC supports "pinned" data that the garbage collector isn't allowed to move around. (It's for interop with external C libraries and the like.) There's a lot of manual memory management involved, but it seems like it might do what you're after. - Mirandamire
There is already a data type like this called ScrubbedBytes, which is implemented in the memory package and is used precisely for this purpose by the cryptonite library: stackage.org/haddock/nightly-2020-06-01/memory-0.15.0/… It is allocated as pinned, so it doesn't move, and the memory is cleaned before being garbage collected. - Forewoman
@Forewoman very intriguing; why don't you make this an answer? - Jibber
@Jibber In the process of writing it up ;) - Forewoman

I already mentioned this in a comment, but I think it is a really good question and deserves an answer.

There is already a data type ScrubbedBytes that has the following properties:

  • It is allocated with newAlignedPinnedByteArray#, which means that as long as the newly allocated MutableByteArray# is referenced anywhere in your code it will not be GCed, and it will also not be moved or copied around.
  • Upon allocation a weak pointer is created with mkWeak# and a finalizer gets added to the newly allocated memory. This means that once the scrubbed bytes are no longer referenced in your code, and before the GC deallocates the memory, a scrubber will be invoked that writes zeros into the memory.
  • Equality will not short circuit, thus guarding against timing attacks.
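
To make these properties a bit more concrete, here is a minimal sketch of using ScrubbedBytes through Data.ByteArray from the memory package. The names mkSecret and demo are hypothetical helpers made up for illustration, and the literal byte lists are placeholder data: a real program would decrypt straight into the pointer handed to BA.alloc, since a Haskell list of secret bytes lives on the ordinary, unscrubbed heap.

```haskell
import qualified Data.ByteArray as BA
import Data.ByteArray (ScrubbedBytes)
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- Hypothetical helper: allocate a ScrubbedBytes buffer and fill it in place.
-- The list argument is demo data only; real secrets should be written
-- directly through the pointer (e.g. by a decryption routine).
mkSecret :: [Word8] -> IO ScrubbedBytes
mkSecret ws =
  BA.alloc (length ws) $ \p ->
    mapM_ (\(i, w) -> pokeByteOff (p :: Ptr Word8) i w) (zip [0 ..] ws)

-- Hypothetical demo: compare secrets with the Eq instance and with constEq,
-- both of which inspect the whole buffer instead of stopping at the
-- first mismatch.
demo :: IO (Bool, Bool, Int)
demo = do
  a <- mkSecret [1, 2, 3, 4]
  b <- mkSecret [1, 2, 3, 4]
  c <- mkSecret [1, 2, 3, 5]
  pure (a == b, BA.constEq a c, BA.length a)
```

Converting an existing ByteString with BA.convert also works, but then the original ByteString still holds an unscrubbed copy of the secret, which is why this sketch writes into the scrubbed buffer directly.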

There is one small gotcha with this scrubber: there is a small chance that it will not get executed, in particular if the program exits right before the GC would clean up the memory. (See more info on weak pointers.) Therefore, I would recommend implementing it using the bracket pattern. Here is how you can get it done with the primitive package:

import Control.Exception (finally)
import Control.Monad.Primitive (RealWorld)
import qualified Data.Primitive.ByteArray as BA
import Data.Word (Word8)

-- | Allocate a pinned buffer, run the action on it, and always zero it afterwards.
withScrubbedMutableByteArray ::
     Int -- ^ Number of bytes
  -> (BA.MutableByteArray RealWorld -> IO a)
  -- ^ Action to execute
  -> IO a
withScrubbedMutableByteArray c f = do
  mba <- BA.newPinnedByteArray c
  -- Zero all c bytes whether f returns normally or throws.
  f mba `finally` BA.setByteArray mba 0 c (0 :: Word8)

Using finally is safer because it gives you stronger guarantees that the memory will be zeroed out. For example, a user hitting Ctrl-C in a correct setup will not prevent the scrubber from running.
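
As a rough usage sketch (not part of any library), the helper can be exercised as below. The helper is repeated so the snippet stands alone, demo is a hypothetical name, and the bytes 1 to 4 stand in for decrypted data. The buffer is deliberately returned out of the scope here, purely to observe that it has been zeroed; real code should never let the buffer, or anything derived from the secret that must stay private, escape the action.

```haskell
import Control.Exception (finally)
import Control.Monad.Primitive (RealWorld)
import qualified Data.Primitive.ByteArray as BA
import Data.Word (Word8)

-- Same helper as above, repeated so this sketch is self-contained.
withScrubbedMutableByteArray ::
     Int -> (BA.MutableByteArray RealWorld -> IO a) -> IO a
withScrubbedMutableByteArray n f = do
  mba <- BA.newPinnedByteArray n
  f mba `finally` BA.setByteArray mba 0 n (0 :: Word8)

-- Hypothetical demo: returns a value computed inside the scope and the
-- buffer contents as seen after the scope has ended.
demo :: IO (Word8, [Word8])
demo = do
  (inside, leaked) <- withScrubbedMutableByteArray 4 $ \mba -> do
    -- Placeholder for "decrypt into the buffer": write the bytes 1..4.
    mapM_ (\i -> BA.writeByteArray mba i (fromIntegral (i + 1) :: Word8)) [0 .. 3]
    -- Placeholder for "do something with the plain data": sum the bytes.
    s <- sum <$> mapM (BA.readByteArray mba) [0 .. 3]
    -- Returning mba is for demonstration only.
    pure (s :: Word8, mba)
  -- By now the finally handler has overwritten the buffer with zeros.
  after <- mapM (BA.readByteArray leaked) [0 .. 3]
  pure (inside, after)
```

Note that only the pinned buffer is scrubbed: any intermediate values the action computes from the secret (such as the sum here) are ordinary heap or register values and get no such treatment.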

Forewoman answered 2/6, 2020 at 16:12 Comment(6)
Can't you use finalize in your finally? That will allow the memory to be scrubbed and freed early, but ensure that it happens before you exit the block. - Renata
@Renata In case of ScrubbedBytes I don't think it is possible because there is no way to access the created Weak pointer (it is simply discarded). In general though it would be possible, but there is no point, because finally will ensure that scrubbing will happen and there is no longer a need to rely on a weak pointer. - Forewoman
The advantage of the weak pointer (as long as you can grab hold of it; sounds like that API could use a tweak) is that it can scrub and deallocate early if the code drops the reference before the finally block completes. - Renata
That link for weak references is broken. Looks like a multiple-paste incident. - Erethism
@Erethism thanks, fixed. @Renata yes early scrubbing could be an advantage. I am not an avid user of memory, so maybe someone can propose an improvement to the API ;) - Forewoman
I just opened an issue for that. - Renata

In OCaml this can be done easily using Bigarrays, whose contents are not governed by the GC: they are never copied and never examined by it. You can use Unix.map_file to load the data and ocaml-cstruct to handle the loaded data nicely (if it is structured). OCaml is used extensively for writing low-level security-related code; see the Mirage project (it has many cryptography-related libraries), ocaml-tls, a pure implementation of the TLS protocol in OCaml, and Project Everest, which uses OCaml as the target language.

When decrypting, encrypting, or otherwise processing the secret data, you should be careful not to put it in a boxed type, including strings and int64 integers. If you take a look at mirage-crypto, you will find that all algorithms are implemented using integers only, which are represented as immediates and are never touched by the GC.

Povertystricken answered 4/6, 2020 at 21:30 Comment(0)
