How do I ensure Python "zeros" memory when it is garbage collected?

I'm running into some trouble with memory management related to bytes in Python 3.2. In some cases, the ob_sval buffer seems to contain memory that I cannot account for.

For a particular secure application I need to be able to ensure that memory is "zeroed" and returned to the OS as soon as possible after it is no longer being used. Since re-compiling Python isn't really an option, I'm writing a module that can be used with LD_PRELOAD to:

  • Disable memory pooling by replacing PyObject_Malloc with PyMem_Malloc, PyObject_Realloc with PyMem_Realloc, and PyObject_Free with PyMem_Free (e.g.: what you would get if you compiled without WITH_PYMALLOC). I don't really care whether the memory is pooled or not, but this seems to be the easiest approach (a sketch of this shim follows the list).
  • Wrap malloc, realloc, and free so as to track how much memory is requested and to memset everything to 0 when it is released.
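
To make the first bullet concrete, a minimal sketch of that shim might look like the following (illustrative only, not my actual module; the file name and build line are made up, and in 3.2 PyMem_Malloc boils down to malloc, so the overrides simply forward to the raw allocator, which the wrappers from the second bullet then intercept):

/* zeropy.c -- when loaded with LD_PRELOAD, these definitions shadow the
 * allocator symbols exported by libpython, so object allocations bypass
 * pymalloc's pools entirely.
 * Build (example): gcc -shared -fPIC -O0 -o zeropy.so zeropy.c
 */
#include <stdlib.h>

void *PyObject_Malloc(size_t n)           { return malloc(n); }
void *PyObject_Realloc(void *p, size_t n) { return realloc(p, n); }
void  PyObject_Free(void *p)              { free(p); }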

At a cursory glance, this approach seems to work great:

>>> from ctypes import string_at
>>> from sys import getsizeof
>>> from binascii import hexlify
>>> a = b"Hello, World!"; addr = id(a); size = getsizeof(a)
>>> print(string_at(addr, size))
b'\x01\x00\x00\x00\xd4j\xb2x\r\x00\x00\x00<J\xf6\x0eHello, World!\x00'
>>> del a
>>> print(string_at(addr, size))
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x13\x00'

The errant \x13 at the end is odd but doesn't come from my original value so at first I assumed it was okay. I quickly found examples where things were not so good though:

>>> a = b'Superkaliphragilisticexpialidocious'; addr = id(a); size = getsizeof(a)
>>> print(string_at(addr, size))
b'\x01\x00\x00\x00\xd4j\xb2x#\x00\x00\x00\x9cb;\xc2Superkaliphragilisticexpialidocious\x00'
>>> del a
>>> print(string_at(addr, size))
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00))\n\x00\x00ous\x00'

Here the last three bytes, ous, survived.

So, my question:

What's going on with the leftover bytes for bytes objects, and why don't they get deleted when del is called on them?

I'm guessing that my approach is missing something similar to a realloc, but I can't see what that would be in bytesobject.c.

I've attempted to quantify the number of 'leftover' bytes that remain after garbage collection and it appears to be predictable to some extent.

from collections import defaultdict
from ctypes import string_at
import gc
import os
from sys import getsizeof

def get_random_bytes(length=16):
    return os.urandom(length)

def test_different_bytes_lengths():
    rc = defaultdict(list)
    for ii in range(1, 101):
        while True:
            value = get_random_bytes(ii)
            if b'\x00' not in value:
                break
        check = list(value)   # the byte values, for comparison after deletion
        addr = id(value)      # CPython: id() is the object's address
        size = getsizeof(value)
        del value
        gc.collect()
        # Skip the object header (16 bytes on this build) and the trailing NUL;
        # keep only the ob_sval buffer
        garbage = string_at(addr, size)[16:-1]
        for jj in range(ii, 0, -1):
            if garbage.endswith(bytes(check[-jj:])):
                # for bytes of length ii, tail of length jj found
                rc[jj].append(ii)
                break
    return {k: len(v) for k, v in rc.items()}, dict(rc)

# The runs all look something like this (there is some variation):
# ({1: 2, 2: 2, 3: 81}, {1: [1, 13], 2: [2, 14], 3: [3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 83, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]})
# That is:
#  - One byte left over twice (always when the original bytes object was of length 1 or 13; the first is likely because of the internal 'characters' cache of one-byte bytes objects kept by Python)
#  - Two bytes left over twice (always when the original bytes object was of length 2 or 14)
#  - Three bytes left over in most other cases (the exact set varies between runs but never includes 12)
# For added fun, if I replace the get_random_bytes call with one that returns an encoded string or random alphanumerics then the results change slightly: lengths of 13 and 14 are now fully cleared too. My original test string was 13 bytes of encoded alphanumerics, of course!

Edit 1

I had originally expressed concern about the fact that if a bytes literal is used in a function, it doesn't get cleaned up at all:

>>> def hello_forever():
...     a = b"Hello, World!"; addr = id(a); size = getsizeof(a)
...     print(string_at(addr, size))
...     del a
...     print(string_at(addr, size))
...     gc.collect()
...     print(string_at(addr, size))
...     return addr, size
...
>>> addr, size = hello_forever()
b'\x02\x00\x00\x00\xd4J0x\r\x00\x00\x00<J\xf6\x0eHello, World!\x00'
b'\x01\x00\x00\x00\xd4J0x\r\x00\x00\x00<J\xf6\x0eHello, World!\x00'
b'\x01\x00\x00\x00\xd4J0x\r\x00\x00\x00<J\xf6\x0eHello, World!\x00'
>>> print(string_at(addr, size))
b'\x01\x00\x00\x00\xd4J0x\r\x00\x00\x00<J\xf6\x0eHello, World!\x00'

It turns out that this is an artificial concern that isn't covered by my requirements. You can see the comments on this question for details, but the problem comes from the way the hello_forever.__code__.co_consts tuple will contain a reference to b"Hello, World!" even after a is deleted from the locals.

In the real code, the "secure" values would be coming from an external source and would never be hard-coded and later deleted like this.

Edit 2

I had also expressed confusion over the behaviour of strings. It has been pointed out that they likely also suffer the same problem as bytes with respect to hard-coding them in functions (i.e.: an artifact of my test code). There are two other risks with them that I have not been able to demonstrate as being a problem, but I will continue to investigate:

  • String interning is done by Python at various points to speed up access. This shouldn't be a problem, since interned strings are supposed to be removed when the last reference is lost. If it proves to be a concern, it should be possible to replace PyUnicode_InternInPlace so that it doesn't do anything (a no-op stand-in is sketched after this list).
  • Strings and other 'primitive' object types in Python often keep a 'free list' to make it faster to get memory for new objects. If this proves to be a problem, the *_dealloc methods in Objects/*.c can be replaced.
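
For reference, a no-op stand-in for the interning hook could be as small as this (the signature matches CPython's PyUnicode_InternInPlace; the stub itself is an untested sketch for the same LD_PRELOAD library):

/* Shadow CPython's interning hook so that interning never keeps an
 * extra reference to a "secure" string. */
typedef struct _object PyObject;  /* avoid pulling in Python.h for the sketch */

void PyUnicode_InternInPlace(PyObject **p)
{
    (void)p;  /* do nothing: *p is left un-interned and unchanged */
}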

I had also believed that I was seeing a problem with class instances not getting zeroed correctly, but I now believe that was an error on my part.

Thanks

Much thanks to @Dunes and @Kevin for pointing out the issues that obfuscated my original question. Those issues have been left in the "edit" sections above for reference.

Eliason answered 23/2, 2015 at 14:36 Comment(9)
Python is probably interning the strings.Fahland
Python is definitely interning the strings here, they are held in the function's list of constants -- hello_forever.__code__.co_consts.Cremona
Have you considered altering the _Py_Dealloc or Py_DECREF macros to zero the memory after deallocation? As opposed to messing around with memory allocation.Cremona
@Dunes: I wasn't familiar with automatic interning; I'll take another look at those macros to see if I can make them work. At first glance it doesn't look promising since my earlier notes indicate Py_DECREF -> _Py_Dealloc -> tp_dealloc -> object_dealloc -> tp_free -> PyObject_Del -> PyObject_Free -> PyMem_FREE -> free (e.g.: if Py_DECREF were called then the memory should have been zeroed). I may well have missed something along the chain though.Eliason
I kind of missed the point where you said recompiling wasn't an option. In addition, even objects of the same type can have varying sizes, and it seems the easiest and cleanest way to find the size of the object is to intercept calls to malloc. That is, I think your current approach is best. Though I think this would be a lot easier if you could recompile Python.Cremona
Turns out that for this specific requirement I'm not concerned about the case where a function has a hard-coded string and deletes it; this was an artificial problem that turned up while debugging the other issues. I'll update the question to reflect this. I am still concerned about the only partially zeroed bytes and the fact that strings are often not being zeroed at all. @Kevin, are you referring to the same thing as @Dunes, or is there another kind of automatic interning happening? In real life, the actual strings would be coming from an external source (file or TCP), not hard-coded.Eliason
To clarify, recompiling Python is an option, it is just one I'd generally prefer to avoid since it might lead to a hellish experience if someone else ever upgrades us to Python 3.4 and misses or misunderstands my patch. That being said, we are version controlled, so it isn't something that I must avoid at all costs.Eliason
@TrevorWiley: CPython automatically interns any string literal, which will make testing harder for you but probably won't affect real-world usage. It also interns on calling sys.intern(). Other Python implementations could intern any string at any time, unless you've specifically examined them and confirmed they don't.Fahland
@Kevin, this is CPython. Sounds like adding a replacement for PyUnicode_InternInPlace to my LD_PRELOAD library might be worth exploring.Eliason

It turns out that the problem was an absolutely stupid mistake in my own code that did the memset. I'm going to reach out to @Calyth, who generously added a bounty to this question, before 'accepting' this answer.

In short and simplified, the malloc/free wrapper functions work like this:

  • Code calls malloc asking for N bytes of memory.
    • The wrapper calls the real function but asks for N+sizeof(size_t) bytes.
    • It writes N to the beginning of the range and returns an offset pointer.
  • Code uses the offset pointer, oblivious to the fact that it is attached to a slightly larger chunk of memory than was requested.
  • Code calls free asking to return the memory and passing in that offset pointer.
    • The wrapper looks before the offset pointer to get the originally requested size of memory.
    • It calls memset to ensure everything is set to zero (the library is compiled without optimization to prevent the compiler from ignoring the memset).
    • Only then does it call the real function.

My mistake was calling the equivalent of memset(actual_pointer, 0, requested_size) instead of memset(actual_pointer, 0, actual_size).
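
For anyone trying the same thing, a simplified sketch of the corrected wrapper follows (the dlsym bootstrapping is illustrative and skips the usual guard against dlsym itself allocating; realloc is omitted for brevity):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

static void *(*real_malloc)(size_t);
static void  (*real_free)(void *);

static void bootstrap(void)
{
    if (real_malloc == NULL) {
        real_malloc = dlsym(RTLD_NEXT, "malloc");
        real_free   = dlsym(RTLD_NEXT, "free");
    }
}

void *malloc(size_t requested_size)
{
    bootstrap();
    /* Over-allocate so the requested size can be stashed in a header. */
    size_t *p = real_malloc(sizeof(size_t) + requested_size);
    if (p == NULL)
        return NULL;
    *p = requested_size;
    return p + 1;              /* the caller sees only the offset pointer */
}

void free(void *ptr)
{
    bootstrap();
    if (ptr == NULL)
        return;
    size_t *p = (size_t *)ptr - 1;
    size_t actual_size = sizeof(size_t) + *p;
    /* The fix: zero the WHOLE allocation, header included -- not just
     * the requested size. */
    memset(p, 0, actual_size);
    real_free(p);
}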

I'm now facing the mind-boggling question of why there weren't always three leftover bytes (my unit tests verify that none of my randomly generated bytes objects contain any nulls) and why strings would not also have this problem (does Python over-allocate the string buffer, perhaps?). Those, however, are problems for another day.

The upshot of all of this is that it turns out to be relatively easy to ensure that bytes and strings are set to zero once they are garbage collected! (There are a bunch of caveats about hard-coded strings, free lists, and so forth, so anyone else who is trying to do this should read the original question, the comments on the question, and this 'answer' carefully.)

Eliason answered 2/3, 2015 at 22:12 Comment(5)
Out of curiosity, why in the world do you need to zero out your memory before it goes to garbage collection? Is this a security concern?Tav
@KronosS, yes, this is a security precaution for two types of attacks. 1) By ensuring that memory is set to zero before it is returned to the OS, we protect against an application that allocates a bunch of memory and looks through it for things like SSL keys. 2) A common hacking approach is to try to exploit bugs in an application and get it to give a dump of the memory it has held; by avoiding Python's memory pools there is no memory held for longer than absolutely required so the vulnerability to this type of attack is reduced. (Note that gc.collect must be manually called at key points.)Eliason
Probably a stupid comment. But I think most gc-based languages actually never guarantee that calling gc.collect() ensures that the garbage collector will run. Many runtime environments leave it open because they foresee that, in the future, smart gc-scheduling strategies may be found that outperform programmer intervention. Perhaps it's not a good idea to leave security to the gc? You could implement an interface that sets bits to null etc.Fob
@CommuSoft, in Python, explicitly calling gc.collect is guaranteed to collect except in some limited scenarios (uncollectable garbage) which we are being careful to avoid.Eliason
@CommuSoft in Python the gc is rarely needed unless you have circular references. The kind of memory people typically care about (bytes, strings, large integers) is usually del'ed immediately on last ref. So del can be sufficient to protect certain things.Kersey

In general you have no such guarantees that memory will be zeroed or even garbage collected in a timely manner. There are heuristics, but if you're worried about security to this extent, it's probably not enough.

What you could do instead is work directly on mutable types such as bytearray and explicitly zero each element:

# Allocate a pre-sized buffer (hopefully without copies); readinto
# fills at most len(bytestring) bytes, so an empty bytearray would read nothing
bytestring = bytearray(4096)
bytes_read = unbuffered_file.readinto(bytestring)

# Do stuff
function(bytestring)

# Zero memory
for i in range(len(bytestring)):
    bytestring[i] = 0

Safely using this will require you to only use methods you know won't make temporary copies, which possibly means rolling your own. This doesn't prevent certain caches from messing things up, though.

zdan gives a good suggestion in another question: use a subprocess to do the work and kill it with fire once it's done.

Contraoctave answered 23/2, 2015 at 19:55 Comment(9)
In our case, they are happy to call gc.collect() as soon as they are done with the objects (which will be bytes and strings, but not bytearrays). I've asked about providing a subclass of these types that would use, for instance, ctypes and memset to clear the memory when they are deleted, but that won't work because the objects get passed into third-party Python code that might be making temporary copies.Eliason
@Trevor: If the objects are immutable, copying should just return a reference to the original object.Fahland
@Fahland This is probably partially paranoia, but I believe they are also worried about substrings 'leaking' in this way.Eliason
@Veedrac, the problem with the subprocess approach is that the memory does get returned to the OS unzeroed. However, they may be able to combine the malloc/realloc/free wrappers with a subprocess. I'll look into that.Eliason
Is there any way to hook into the OS and have it zero freed pages automatically under certain circumstances? Combine that with the subprocess option and you should be 99% of the way there.Fahland
If you're happy to use subclasses then you could always provide the subclass via an extension module so you can provide a tp_dealloc function that zeros the memory.Cremona
@Eliason The good thing about the subprocess idea is that it gives a hard limit on how late the data can get freed, even in the face of unexpected copies, caching, cycles and probably even paging to disk. Guaranteeing these without killing a process is unlikely. At that point you only have to worry about the zeroing step.Contraoctave
@Eliason And if you've got a separate process you could just replace free with something that zeroes for little extra cost. Heck, you could replace malloc with something that allocates from a fixed-size memory pool and zero it all on exit.Contraoctave
I'm going to pursue the standalone process idea further, but due to time constraints I suspect that it will not be accepted unless I can show it is impossible to fix what is happening with the bytes objects in less time than it would take to re-architect.Eliason