I'm reading the documentation for Memory Management in Python C extensions, and as far as I can tell, there doesn't really seem to be much reason to use malloc rather than PyMem_Malloc. Say I want to allocate an array that isn't to be exposed to Python source code and will be stored in an object that will be garbage collected. Is there any reason to use malloc?
EDIT: Corrected places where PyMem_Malloc and PyObject_Malloc were mixed up; they are two different calls.
Without the PYMALLOC_DEBUG macro activated, PyMem_Malloc is an alias of libc's malloc(), with one special case: calling PyMem_Malloc to allocate zero bytes returns a distinct non-NULL pointer, whereas malloc(0) may return either NULL or a valid pointer, depending on the implementation (source code reference):
/* malloc. Note that nbytes==0 tries to return a non-NULL pointer, distinct
 * from all other currently live pointers. This may not be possible. */
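As a small sketch of what that guarantee means in practice (the variable name is illustrative):

void *p = PyMem_Malloc(0);   /* distinct non-NULL pointer on success */
PyMem_Free(p);               /* the zero-byte block must still be released */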
Also, there is an advisory note in the pymem.h header file:
Never mix calls to PyMem_ with calls to the platform malloc/realloc/ calloc/free. For example, on Windows different DLLs may end up using different heaps, and if you use PyMem_Malloc you'll get the memory from the heap used by the Python DLL; it could be a disaster if you free()'ed that directly in your own extension. Using PyMem_Free instead ensures Python can return the memory to the proper heap. As another example, in PYMALLOC_DEBUG mode, Python wraps all calls to all PyMem_ and PyObject_ memory functions in special debugging wrappers that add additional debugging info to dynamic memory blocks. The system routines have no idea what to do with that stuff, and the Python wrappers have no idea what to do with raw blocks obtained directly by the system routines then.
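To make that advice concrete, here is a minimal sketch of an extension function (the function name and workload are made up for illustration) that pairs PyMem_Malloc with the matching PyMem_Free and never mixes in the platform free():

#include <Python.h>

static PyObject *
sum_squares(PyObject *self, PyObject *args)
{
    Py_ssize_t n;
    if (!PyArg_ParseTuple(args, "n", &n))
        return NULL;
    if (n < 0) {
        PyErr_SetString(PyExc_ValueError, "n must be non-negative");
        return NULL;
    }

    /* Scratch buffer obtained through the Python allocator... */
    double *buf = PyMem_Malloc(n * sizeof(double));
    if (buf == NULL)
        return PyErr_NoMemory();

    double total = 0.0;
    for (Py_ssize_t i = 0; i < n; i++) {
        buf[i] = (double)i * (double)i;
        total += buf[i];
    }

    /* ...and released with the matching PyMem_Free, never free(). */
    PyMem_Free(buf);
    return PyFloat_FromDouble(total);
}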
Then, there are some Python-specific tunings inside PyObject_Malloc, a function used not only for C extensions but for all the dynamic allocations made while running a Python program, like 100*234, str(100) or 10 + 4j:
>>> id(10 + 4j)
139721697591440
>>> id(10 + 4j)
139721697591504
>>> id(10 + 4j)
139721697591440
The previous complex() instances are small objects allocated from a dedicated pool.
Small-object (<256 bytes) allocation with PyObject_Malloc is quite efficient, since it is served from pools of 8-byte-aligned blocks, with one pool for each block size. There are also Pages and Arenas for bigger allocations.
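As an illustrative sketch (the sizes are chosen arbitrarily), a small request is served from one of those pools, while a big one falls through to the system allocator:

#include <Python.h>

void demo(void)
{
    /* 64 bytes is under the small-object threshold: served from a
     * pymalloc pool of fixed-size, 8-byte-aligned blocks. */
    void *small = PyObject_Malloc(64);

    /* 1 MiB is far above the threshold: pymalloc delegates this
     * request to the underlying system allocator. */
    void *large = PyObject_Malloc(1024 * 1024);

    /* Both are released through the matching PyObject_Free. */
    PyObject_Free(small);
    PyObject_Free(large);
}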
This comment in the source code explains how the PyObject_Malloc call is optimized:
/*
* The basic blocks are ordered by decreasing execution frequency,
* which minimizes the number of jumps in the most common cases,
* improves branching prediction and instruction scheduling (small
* block allocations typically result in a couple of instructions).
* Unless the optimizer reorders everything, being too smart...
*/
Pools, Pages and Arenas are optimizations intended to reduce external memory fragmentation in long-running Python programs.
Check out the source code for the full detailed documentation on Python's memory internals.
It's perfectly OK for extensions to allocate memory with malloc, or other system allocators. That's normal and inevitable for many types of modules: most modules that wrap other libraries, which themselves know nothing about Python, will cause native allocations whenever allocation happens inside the wrapped library. (Some libraries allow you to control allocation enough to prevent this; most do not.)
There's a serious drawback to using PyMem_Malloc: you need to hold the GIL when using it. Native libraries often want to release the GIL when doing CPU-intensive calculations or making any calls that might block, like I/O. Needing to lock the GIL before allocations can be somewhere between very inconvenient and a performance problem.
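For example, a sketch of that pattern (the function name and workload are hypothetical): scratch memory inside the released-GIL region has to come from malloc, because the PyMem_ functions may only be called while the GIL is held.

#include <Python.h>
#include <stdlib.h>

static PyObject *
heavy_compute(PyObject *self, PyObject *args)
{
    double result = 0.0;
    int failed = 0;

    Py_BEGIN_ALLOW_THREADS          /* GIL released from here... */
    double *scratch = malloc(1000000 * sizeof(double));
    if (scratch == NULL) {
        failed = 1;
    }
    else {
        for (long i = 0; i < 1000000; i++) {
            scratch[i] = (double)i;
            result += scratch[i];
        }
        free(scratch);              /* malloc pairs with free */
    }
    Py_END_ALLOW_THREADS            /* ...to here; GIL re-acquired */

    if (failed)
        return PyErr_NoMemory();    /* exceptions only while holding the GIL */
    return PyFloat_FromDouble(result);
}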
Using Python's wrappers for memory allocation allows Python's memory debugging code to be used. With tools like Valgrind available, however, I doubt the real-world value of that.
You'll need to use these functions if an API requires it; for example, if an API is passed a pointer that must be allocated with these functions, so it can be freed with them. Barring an explicit reason like that for using them, I stick with normal allocation.
From my experience writing MATLAB .mex functions, I think the biggest determining factor in whether you use malloc or not is portability. Say you have a header file that provides a load of useful functions using internal C data types only (no necessary Python object interaction, so no problem using malloc), and you suddenly realise you want to port that header file to a different codebase that has nothing to do with Python whatsoever (maybe it's a project written purely in C). Using malloc would obviously be a much more portable solution.
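For instance, a hypothetical portability shim (the macro and guard names are made up) lets the same header compile both inside and outside a Python extension:

/* Select the allocator at compile time: PyMem_* inside a Python
 * extension, plain malloc/free everywhere else. */
#ifdef BUILDING_PYTHON_EXTENSION
#include <Python.h>
#define XALLOC(n)  PyMem_Malloc(n)
#define XFREE(p)   PyMem_Free(p)
#else
#include <stdlib.h>
#define XALLOC(n)  malloc(n)
#define XFREE(p)   free(p)
#endif

/* Pure-C utility code then allocates with XALLOC/XFREE and stays
 * portable across both codebases. */
static double *make_buffer(size_t n)
{
    return (double *)XALLOC(n * sizeof(double));
}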
But for code that is purely a Python extension, my initial reaction would be to expect the native C function to perform faster. I have no evidence to back this up :)