Full Page Malloc
Asked Answered
A

5

6

I am trying to optimize the memory allocation of my program by using entire pages at a time.

I am grabbing the page size like this: sysconf(_SC_PAGESIZE); then calculating the total number of elements that will fit in a page like this: elements=pageSize/sizeof(Node);

I was thinking that when I actually go to malloc my memory I would use malloc(elements*sizeof(Node)); It seems like the multiplication and division of sifeof(Node) would cancel out, but with integer division, I do not believe that that is the case.

Is this the best way to malloc an entire page at a time?

Thanks

Ailssa answered 21/2, 2013 at 15:45 Comment(11)
Just use mmap/VirtualAlloc...Ar
why don't you just malloc the page size in one go and then fit as many nodes as you can on to it?Retort
You won't like the answer to your elements-per-page equation when/if your node size ever exceeds the page size, btw.Yvonneyvonner
Why do you want to malloc exactly the page size? This is not efficient.Colophon
@R.: What is your basis for the assertion that this is not efficient? Perhaps you are thinking that this is extra work in the program that is not necessary. However, if the program is already allocating space from time to time and is flexible regarding the amount allocated, then there is little or no extra cost within the program for choosing one size versus another. The question then becomes whether malloc is more efficient at some sizes. Is there reason to believe various malloc implementations do not have preferred sizes?Rodmur
There's no underlying reason to expect exact page-size allocations to be more efficient than other sizes, and they're likely to be less-efficient for various reasons. If malloc performs a new mmap to service the request, it will need to allocate at least 2 pages to give the caller one page, since without adjacent bookkeeping information, there's no way to make free efficient (O(1) time). That excess would not be available to the application and would be wasted. In practice, malloc will serve such small allocations from the heap...Colophon
...in which case the memory won't be wasted, but the malloc will touch 2 pages updating bookkeeping information even though you're just going to be using one, which will possibly cause an extra page fault if the second one is not yet backed or if it's been swapped out. In any case, like I said, I see no reason exact page-size allocation would be "better", and lots of reasons it could be worse. I would choose a side based on your program's needs, not based on wrong assumptions about hardware or library implementation behavior.Colophon
I want to do something similar to this. Why? Because I'm writing an allocator for a collection (e.g. - a dictionary) and I want the items in my collection to have locality with one another, rather than being spread all over memory like they would be if I just malloc'ed on demand in response to user requests. To @R..'s point, I'd like to know the size to call malloc with such that it all fits into a single page of memory, including malloc's bookkeeping overhead, without waste. It looks like a combination of an aligned alloc and figuring out malloc's overhead might give me what I want.Vaughnvaught
sysconf( _SC_PAGESIZE ) is insufficient. There can be more than one page size. Both Solaris and Linux support something like int getpagesizes(size_t pagesize[ ], int nelem);. NB the plural.Darreldarrell
@AndrewHenle Why do you say it is insufficient? Are you saying that sysconf( _SC_PAGESIZE ) will return an incorrect answer (i.e. - it won't be returned as a result from your unspecified syscall)?Vaughnvaught
@Vaughnvaught Just about everything posted here assumes "THE system page size". There is no singular page size. Systems support multiple page sizes. sysconf( _SC_PAGESIZE ) will only return one of those values.Darreldarrell
M
9

The malloc function doesn't have any concept of pagesize. Unless you are allocating pages that are ALSO aligned to a page-boundary, you will not get ANY benefit from calling malloc in this way. Just malloc as many elements as you need, and stop worrying about micro-optimising something that almost certainly won't give you any benefit at all.

Yes, the Linux kernel does things like this all the time. There are two reasons for that:

  1. You don't want to allocate blocks LARGER than a page, since that significantly increases the risk of allocation failure.
  2. The kernel allocation is made on a per-page basis, rather than like the C library, which allocates a large amount of memory in one go, and then splits it into small components.

If you really want to allocate page-size amount of memory, then use the result from sysconf(_SC_PAGESIZE) as your size argument. But it is almost certain that your allocation straddles two pages.

Magnetron answered 21/2, 2013 at 16:37 Comment(5)
Wrong, it is guaranteed to straddle two pages (or even three): malloc(3) has to add some space for bookkeeping to the space handed to your program.Cobham
Yes, I was intentionally ignoring the overhead. It is of course possible that this section lies outside of the 4KB page...Magnetron
@vonbrand: a malloc implementation could store the metadata somewhere else.Peloquin
@Peloquin It could, so his comment is too strong, but the vast majority of malloc implementations store their overhead just before the returned memory, so they can access it very quickly when needed.Vaughnvaught
what a terrible answer. Sometimes you WANT page aligned memory because you want then set that memory page to executableHendley
G
6

Your computation elements=pageSize/sizeof(Node); doesn't take account of the malloc() metadata that are added to any block/chunk of memory returned by malloc(). In many cases, malloc() will return a memory block likely aligned at least on min(sizeof(double),2 * sizeof(void *)) boundary (32 bytes is becoming quite common btw ...). If malloc() gets a memory block aligned on a page, adds its chunk (with padding), and you write a full page size of data, the last bytes are off the first page: so you're ending up using 2 pages.

Want a whole page, just for you, without concerns about wasting memory, without using mmap() / VirtualAlloc() as suggested in the comments ? Here you are:

int ret;
void *ptr = NULL;
size_t page_size = sysconf(_SC_PAGESIZE);

ret = posix_memalign(&ptr, page_size, page_size);
if (ret != 0 || ptr == NULL) {
    fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
}

By the way, this is probably about micro-optimization. You probably still haven't checked your Node have a size multiple of a cache-line, nor how to improve cache-locality, nor found a way to reduce memory fragmentation. So you're probably going in the wrong way: make it works first, profil, optimize your algorithms, profil, micro-optimize at the last option.

Grannia answered 21/2, 2013 at 17:1 Comment(2)
Won't this straddle two pages too? At least the malloc bookkeeping will be on the end of the prior page, right? I guess if you think accessing that will be rare (i.e. - only on free), and therefore unimportant, then this might do the trick as desired.Vaughnvaught
Yes, it's very likely that a call to posix_memalign(&ptr, page_size, page_size) translate to an allocation of 2 memory pages, but the surplus can be used by malloc() to fulfill smaller allocation requests.Grannia
V
3

The C11 standard added the aligned_alloc call, so you can do something like:

#include <stdlib.h>
#include <unistd.h>

void *alloc_page( void )
{
    long page_size = sysconf( _SC_PAGESIZE );  /* arguably could be a constant, #define, etc. */

    return ( page_size > 0 ? aligned_alloc( page_size, page_size ) : NULL );
}

The problem with this approach, as others pointed out, is that typically the implementation of the standard alloc calls add some bookkeeping overhead that is stored just before the allocated memory. So, this allocation will usually straddle two pages: the returned page for you to use, and the very end of another page used by the allocator's bookkeeping.

That means when you free or realloc this memory, it may need to touch two pages rather than just the one. Also, if you allocate all or most of your memory this way, then you can "waste" a lot of virtual memory as roughly half of the pages allocated to your process at the OS level will only be used a tiny bit for the allocator's bookkeeping.

How important these issues are is hard to say generally, but preferably they would be avoided somehow. Unfortunately, I haven't figured out a clean, easy, and portable way to do that yet.

==============================

Addendum: If you could dynamically figure out malloc's memory overhead and assume it is always constant, then would asking for that much less usually give us what we want?

#include <stdlib.h>
#include <unistd.h>

/* decent default guesses (e.g. - Linux x64) */

static size_t Page_Size       = 4096;
static size_t Malloc_Overhead = 32;

/* call once at beginning of program (i.e. - single thread, no allocs yet) */

int alloc_page_init( void )  
{
    int     ret       = -1;
    long    page_size = sysconf( _SC_PAGESIZE );
    char   *p1        = malloc( 1 );
    char   *p2        = malloc( 1 );
    size_t  malloc_overhead;

    if ( page_size <= 0 || p1 == NULL || p2 == NULL )
        goto FAIL;

    malloc_overhead = ( size_t ) ( p2 > p1 ? p2 - p1 : p1 - p2 );  /* non-standard pointer math */

    if ( malloc_overhead > 64 || malloc_overhead >= page_size )
        goto FAIL;

    Page_Size       = page_size;
    Malloc_Overhead = malloc_overhead;
    ret             = 0;

FAIL:
    if ( p1 )
        free( p1 );

    if ( p2 )
        free( p2 );

    return ret;
}

void *alloc_page( void )
{
    return aligned_alloc( Page_Size - Malloc_Overhead, Page_Size - Malloc_Overhead );
}

Answer: probably not, because, for example, "As an example of the "supported by the implementation" requirement, POSIX function posix_memalign accepts any alignment that is a power of two and a multiple of sizeof(void *), and POSIX-based implementations of aligned_alloc inherit these requirements."

The above code would likely not request an alignment that is a power of 2 and will therefore likely fail on most platforms.

It seems this is an unavoidable problem with the typical implementations of standard allocation functions. So, it is probably best to just align and alloc based on the page size and likely pay the penalty of the allocator's bookkeeping residing on another page, or use an OS specific call like mmap to avoid this issue.

Vaughnvaught answered 21/3, 2018 at 16:35 Comment(2)
What one page size are you referring to? x86 and most other architectures support multiple page sizes. See https://mcmap.net/q/13918/-how-does-x86-paging-work for x86 details.Darreldarrell
@AndrewHenle Both bits of code dynamically interrogate the current page size. However, the second basically assumes the page size doesn't change during a run.Vaughnvaught
A
1

The standards provide no guarantee that malloc even has a concept of page size. However, it's not uncommon for malloc implementations to dole out entire pages when the allocation size requested is on the order of the page size (or larger).

There's certainly no harm in asking for an allocation that happens to be equal to the page size (or a multiple of the page size) and subdividing it yourself, though it is a little extra work. You might indeed get the behavior you desire, at least on some machines/compiler/library combinations. But you might not either. If you absolutely require page-sized allocations and/or page-aligned memory, you'll have to call an OS-specific API to get it.

Artificial answered 21/2, 2013 at 18:44 Comment(2)
C11 added aligned_alloc as part of the standard. You still probably need an OS specific way of knowing the page size, assuming it has one. en.cppreference.com/w/c/memory/aligned_allocVaughnvaught
@jschultz410: That's good to know. You should probably post that as an answer.Artificial
N
1

If your question is about how to alloc whole memory pages: Use mmap(), not malloc().
Reason:
malloc() must always add some metadata to every allocation, so if you do malloc(4096) it will definitely allocate more than a single page. mmap(), on the other hand, is the kernel's API to map pages into your address space. It's what malloc() uses under the hood.

If your question is about correct rounding: The usual trick to round a up to a multiple of N is to say rounded = (a + N-1)/N*N;. By adding N-1 first, you ensure that the division will round up in all cases. In the case that a is already a multiple of N, the added N-1 will be without effect; in all other cases, you get one more than with rounded = a/N*N;.

Nifty answered 21/3, 2018 at 17:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.