How to find holes in the address space?

I have a set of files whose lengths are all multiples of the page size of my operating system (FreeBSD 10). I would like to mmap() these files onto consecutive pages of the address space, giving me the ability to treat a collection of files as one large array of data.

Preferably using portable functions, how can I find a sufficiently large region of unmapped address space so I can be sure that a series of mmap() calls to this region will succeed?

Charmer answered 1/1, 2016 at 21:22 Comment(5)
May I know what you're trying to do? – Leastways
@cad See the first paragraph. Basically, I have a dataset that is split into multiple files, and I want to map it into a contiguous memory region to treat it as one. – Charmer
Can you mmap() the first file, letting the OS choose the address for you, and then try to map the other files contiguously with that? I'd expect that to work reasonably well, but I've not tested it on any system, least of all FreeBSD 10. – Danish
@JonathanLeffler The dataset is some 500 GB in size and each chunk is 50 MB. It's very likely that the OS places the first mapping somewhere in the low address range without 500 GB of free range above it. – Charmer
It would be sensible to include such size information in the question. To even contemplate 500 GiB in memory, you must be on a large 64-bit machine. That means there are large gaps (even larger than 500 GiB) in the memory map: the 64-bit address space is a million times bigger than that (with some space left over). You could probably choose almost any well-aligned address and get away with it. You might need to look at where your shared libraries, stack, and heap are, just to make sure you stay clear of those. The 'try asking' approach in the answer is similar. – Danish
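The suggestion in these comments (map the first file, then try to place the rest right after it) can be sketched in C roughly as follows. It is untested on FreeBSD 10, and `map_with_hints` is a hypothetical name. It passes the end of the previous mapping as a plain address hint, without MAP_FIXED, so it can never replace an existing mapping; the price is that the kernel may ignore the hint, in which case the sketch unmaps everything and returns NULL.

```c
#define _DEFAULT_SOURCE /* glibc: enable POSIX and BSD declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map paths[0..n-1] read-only, asking for each file to
 * start where the previous one ended. The address is only a hint (no
 * MAP_FIXED), so existing mappings are never replaced, but the kernel may
 * put the file elsewhere; in that case this gives up and returns NULL.
 */
void *map_with_hints(const char *const *paths, size_t n, size_t *total_out)
{
    char *base = NULL;   /* start of the first mapping */
    size_t total = 0;    /* bytes mapped so far, all contiguous */

    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDONLY);
        if (fd == -1)
            goto fail;

        struct stat st;
        if (fstat(fd, &st) == -1) {
            close(fd);
            goto fail;
        }
        size_t len = (size_t)st.st_size;
        if (len == 0) {          /* mmap() rejects zero-length mappings */
            close(fd);
            continue;
        }

        /* First file: let the kernel choose. Later files: hint at the end. */
        char *want = (base != NULL) ? base + total : NULL;
        char *got = mmap(want, len, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);               /* the mapping outlives the descriptor */
        if (got == MAP_FAILED)
            goto fail;

        if (base == NULL) {
            base = got;
        } else if (got != want) {
            munmap(got, len);    /* hint not honoured: range already in use */
            goto fail;
        }
        total += len;
    }

    *total_out = total;
    return base;

fail:
    if (base != NULL)
        munmap(base, total);
    return NULL;
}
```

Whether the hint is honoured depends on the kernel's placement policy. Linux, for instance, hands out mappings top-down by default, so the range just above a fresh mapping is often already occupied, which is the failure mode anticipated in the 500 GB comment.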

Follow these steps:

  1. First compute the total size needed by enumerating your files and summing their sizes.
  2. Map a single area of anonymous memory of this size with mmap. If this fails, you lose.
  3. Save the pointer and unmap the area (actually, unmapping may not be necessary if your system's mmap with MAP_FIXED implicitly replaces any previous overlapping mapping).
  4. Map the first file at this address with the appropriate MAP_FIXED flag.
  5. Increment the address by the file size.
  6. Loop back to step 4 until all files have been mapped.
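The steps above can be sketched in C roughly as follows. This is an illustration under assumptions, not tested on FreeBSD 10: `map_files_contiguously` is a hypothetical name, the files are mapped read-only with MAP_SHARED (the answer does not say which flags to use), and the munmap() of step 3 is skipped because POSIX specifies that a successful MAP_FIXED mapping replaces whatever was mapped in that range.

```c
#define _DEFAULT_SOURCE /* glibc: enable POSIX and BSD names such as MAP_ANON */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper following the steps: map paths[0..n-1] read-only and
 * back to back. Returns the base address and stores the total length in
 * *total_out, or returns NULL on failure. Assumes every file size is a
 * multiple of the page size.
 */
void *map_files_contiguously(const char *const *paths, size_t n,
                             size_t *total_out)
{
    struct stat st;
    size_t total = 0;

    /* Step 1: add up the file sizes. */
    for (size_t i = 0; i < n; i++) {
        if (stat(paths[i], &st) == -1)
            return NULL;
        total += (size_t)st.st_size;
    }
    if (total == 0)
        return NULL;

    /* Step 2: let the kernel pick a free range of that length. */
    char *base = mmap(NULL, total, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Step 3: keep the pointer but skip munmap(); a MAP_FIXED mapping
     * replaces whatever occupies its pages, and holding the range means no
     * other thread can be handed it in the meantime. */

    /* Steps 4-6: map each file directly after the previous one. */
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDONLY);
        if (fd == -1)
            goto fail;
        if (fstat(fd, &st) == -1 || (size_t)st.st_size > total - offset) {
            close(fd);           /* error, or the file grew since step 1 */
            goto fail;
        }
        size_t len = (size_t)st.st_size;
        void *p = base + offset;
        if (len > 0)             /* mmap() rejects zero-length mappings */
            p = mmap(base + offset, len, PROT_READ,
                     MAP_SHARED | MAP_FIXED, fd, 0);
        close(fd);               /* the mapping outlives the descriptor */
        if (p == MAP_FAILED)
            goto fail;
        offset += len;
    }

    *total_out = total;
    return base;

fail:
    munmap(base, total);         /* drops the probe and any mapped files */
    return NULL;
}
```

Whether the kernel charges the anonymous probe from step 2 against RAM or swap is OS-specific.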

This should be fully portable to any POSIX system, but some OSes might have quirks that prevent this method. Try it.

Desmund answered 1/1, 2016 at 21:38 Comment(4)
Great idea! To get the initial probe mapping, I can create a sparse file of the desired length and map that, as my operating system won't let me map more anonymous memory than I have RAM (as far as I can tell). – Charmer
Oh yeah, you don't even need to unmap the area; mmap will happily map right over it with MAP_FIXED, as far as I can tell. – Charmer
This also has the side effect of avoiding the situation where another thread races with you for the address range you just cleared for use. – Charmer
@FUZxxl - I missed that. Comment deleted. – Lemuelah
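Charmer's sparse-file variant of the probe could look like this. Assumptions: `reserve_with_sparse_file` and the scratch path are hypothetical, and the file system is assumed to support holes, so that ftruncate() extends the file without allocating blocks. The probe is mapped PROT_NONE, and the data files are then mapped over it with MAP_FIXED as in steps 4-6 of the answer, without unmapping it first.

```c
#define _DEFAULT_SOURCE /* glibc: enable POSIX and BSD declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve `total` bytes of address space by mapping a
 * sparse scratch file instead of anonymous memory. Returns the base address
 * or NULL. The data files are meant to be mapped over it with MAP_FIXED.
 */
void *reserve_with_sparse_file(const char *scratch_path, size_t total)
{
    int fd = open(scratch_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return NULL;
    /* The name is not needed; the open descriptor, and later the
     * mapping, keep the file alive. */
    unlink(scratch_path);

    /* Extending with ftruncate() writes no data, so on file systems that
     * support holes the file occupies (almost) no disk blocks. */
    if (ftruncate(fd, (off_t)total) == -1) {
        close(fd);
        return NULL;
    }

    /* PROT_NONE: the probe itself is never meant to be read or written. */
    void *base = mmap(NULL, total, PROT_NONE, MAP_SHARED, fd, 0);
    close(fd);
    return base == MAP_FAILED ? NULL : base;
}
```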

You could mmap a large region whose size is the sum of the sizes of all files, using MAP_PRIVATE | MAP_ANON and protection PROT_NONE, which should keep the OS from needlessly committing memory for it.

This will reserve but not commit memory.

You could then map filename1 at [baseAddr, baseAddr + size1), filename2 at [baseAddr + size1, baseAddr + size1 + size2), and so on.

I believe the flags for this are MAP_FIXED | MAP_PRIVATE (use MAP_SHARED instead of MAP_PRIVATE if writes should reach the files).
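A sketch of this answer for the two-file case, assuming the caller passes each file's exact length as a nonzero page multiple (accessing pages that lie beyond the end of a file raises SIGBUS). `map_two_files` is a hypothetical name; MAP_PRIVATE with PROT_READ is kept from the answer.

```c
#define _DEFAULT_SOURCE /* glibc: enable POSIX and BSD names such as MAP_ANON */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper for the two-file case: reserve size1 + size2 bytes
 * with an inaccessible anonymous mapping, then place filename1 at
 * [base, base + size1) and filename2 at [base + size1, base + size1 + size2).
 * size1 and size2 are assumed to be the files' lengths, nonzero multiples of
 * the page size.
 */
void *map_two_files(const char *filename1, size_t size1,
                    const char *filename2, size_t size2)
{
    size_t total = size1 + size2;
    char *base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    const char *names[2] = { filename1, filename2 };
    const size_t sizes[2] = { size1, size2 };
    size_t offset = 0;

    for (int i = 0; i < 2; i++) {
        int fd = open(names[i], O_RDONLY);
        if (fd == -1)
            goto fail;
        /* MAP_FIXED replaces the reserved pages at exactly this address. */
        void *p = mmap(base + offset, sizes[i], PROT_READ,
                       MAP_PRIVATE | MAP_FIXED, fd, 0);
        close(fd);
        if (p == MAP_FAILED)
            goto fail;
        offset += sizes[i];
    }
    return base;

fail:
    munmap(base, total);  /* also removes a file mapped before the failure */
    return NULL;
}
```

Since the reservation is replaced in place rather than unmapped first, no other thread can be handed the range between reserving it and mapping the files.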

Cloudy answered 1/1, 2016 at 21:43 Comment(4)
FreeBSD has no MAP_NORESERVE. – Charmer
I believe this should still work without MAP_NORESERVE. Without it, swap space might also be reserved, which is not often an issue. – Cloudy
It's likely an issue here, as the dataset is much larger than my RAM, but I need to check FreeBSD's behaviour for this case first. – Charmer
The alternative solution would cause the same behaviour in this case. – Cloudy

© 2022 - 2024 — McMap. All rights reserved.