We're experimenting with changing SQLite, an embedded database system, to use mmap() instead of the usual read() and write() calls to access the database file on disk, using a single large mapping for the entire file. Assume that the file is small enough that we have no trouble finding space for it in virtual memory.
So far so good. In many cases using mmap() seems to be a little faster than read() and write(), and in some cases much faster.
Resizing the mapping in order to commit a write-transaction that extends the database file seems to be a problem. In order to extend the database file, the code could do something like this:
    ftruncate();   // extend the database file on disk
    munmap();      // unmap the current mapping (it's now too small)
    mmap();        // create a new, larger, mapping
then copy the new data into the end of the new memory mapping. However, the munmap()/mmap() pair is undesirable: the next time each page of the database file is accessed, a minor page fault occurs and the system has to search the OS page cache for the correct frame to associate with the virtual memory address. In other words, it slows down subsequent database reads.
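To make the sequence above concrete, here is a minimal sketch of the ftruncate()/munmap()/mmap() approach. The helper name remap_by_unmapping, the demo function, the temporary file path and the 4096-byte sizes are all made up for illustration; this is not SQLite's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not SQLite code): grow the file to new_size, then
 * replace the old mapping with a fresh one covering the whole file.
 * Pass old_map == NULL to create the first mapping.
 * Returns the new base address, or MAP_FAILED on error. */
void *remap_by_unmapping(int fd, void *old_map, size_t old_size, size_t new_size)
{
    if (ftruncate(fd, (off_t)new_size) != 0)        /* extend the file on disk */
        return MAP_FAILED;
    if (old_map != NULL && munmap(old_map, old_size) != 0)
        return MAP_FAILED;                          /* old mapping is too small */
    /* The new mapping may land at a different address than the old one. */
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Usage sketch: returns 0 if data written before the resize is still
 * visible afterwards and the appended region is writable. */
int remap_by_unmapping_demo(void)
{
    char path[] = "/tmp/mmap_resize_XXXXXX";        /* hypothetical temp file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int rc = -1;
    char *map = remap_by_unmapping(fd, NULL, 0, 4096);
    if (map != MAP_FAILED) {
        memcpy(map, "page1", 6);
        char *grown = remap_by_unmapping(fd, map, 4096, 8192);
        if (grown != MAP_FAILED) {
            memcpy(grown + 4096, "page2", 6);       /* new data goes at the end */
            if (memcmp(grown, "page1", 6) == 0 &&
                memcmp(grown + 4096, "page2", 6) == 0)
                rc = 0;
            munmap(grown, 8192);
        }
    }
    close(fd);
    return rc;
}
```

Every pointer into the old mapping is invalid after the call, so the caller has to recompute them from the returned base address.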
On Linux, we can use the non-standard mremap() system call instead of munmap()/mmap() to resize the mapping. This seems to avoid the minor page faults.
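A minimal sketch of the Linux-only variant, assuming MREMAP_MAYMOVE is acceptable (the mapping may be relocated, so the returned address must be used). The helper and demo names, the temporary file path and the sizes are hypothetical.

```c
#define _GNU_SOURCE                 /* mremap() is a Linux extension */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend the file, then resize the existing mapping
 * instead of unmapping and remapping it.
 * Returns the (possibly moved) base address, or MAP_FAILED on error. */
void *grow_with_mremap(int fd, void *old_map, size_t old_size, size_t new_size)
{
    if (ftruncate(fd, (off_t)new_size) != 0)
        return MAP_FAILED;
    /* MREMAP_MAYMOVE lets the kernel relocate the mapping when it cannot
     * be grown in place. */
    return mremap(old_map, old_size, new_size, MREMAP_MAYMOVE);
}

/* Usage sketch: returns 0 if the old contents survive the resize and the
 * appended region is writable. */
int grow_with_mremap_demo(void)
{
    char path[] = "/tmp/mremap_demo_XXXXXX";        /* hypothetical temp file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int rc = -1;
    if (ftruncate(fd, 4096) == 0) {
        char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, "page1", 6);
            char *grown = grow_with_mremap(fd, map, 4096, 8192);
            if (grown != MAP_FAILED) {
                memcpy(grown + 4096, "page2", 6);
                if (memcmp(grown, "page1", 6) == 0 &&
                    memcmp(grown + 4096, "page2", 6) == 0)
                    rc = 0;
                munmap(grown, 8192);
            } else {
                munmap(map, 4096);                  /* resize failed; old mapping still valid */
            }
        }
    }
    close(fd);
    return rc;
}
```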
QUESTION: How should this be dealt with on other systems, like OSX, that do not have mremap()?
We have two ideas at present, and a question regarding each:
1) Create mappings larger than the database file. Then, when extending the database file, simply call ftruncate() to extend the file on disk and continue using the same mapping.
This would be ideal, and seems to work in practice. However, we're worried about this warning in the man page:
"The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified."
QUESTION: Is this something we should be worried about? Or is it an anachronism at this point?
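Idea 1 could be sketched roughly as follows. This relies on exactly the behaviour the man page calls unspecified, so treat it as an experiment rather than a portable technique. The helper names, the 16-page reserve and the temporary file path are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `reserve` bytes of the file, even if the file
 * is currently shorter than that. */
void *map_with_headroom(int fd, size_t reserve)
{
    return mmap(NULL, reserve, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Hypothetical helper: extend the file while leaving the mapping alone.
 * Only offsets below new_size may be touched afterwards; touching pages
 * that lie wholly beyond end-of-file typically raises SIGBUS. */
int extend_under_mapping(int fd, size_t new_size, size_t reserve)
{
    if (new_size > reserve)
        return -1;                  /* headroom exhausted: caller must remap */
    return ftruncate(fd, (off_t)new_size);
}

/* Usage sketch: returns 0 if the region added by ftruncate() is usable
 * through the original mapping. */
int headroom_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t reserve = 16 * page;                     /* arbitrary headroom */
    char path[] = "/tmp/mmap_headroom_XXXXXX";      /* hypothetical temp file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int rc = -1;
    if (ftruncate(fd, (off_t)page) == 0) {
        char *map = map_with_headroom(fd, reserve);
        if (map != MAP_FAILED) {
            memcpy(map, "page1", 6);
            if (extend_under_mapping(fd, 2 * page, reserve) == 0) {
                memcpy(map + page, "page2", 6);     /* newly valid region */
                if (memcmp(map, "page1", 6) == 0 &&
                    memcmp(map + page, "page2", 6) == 0)
                    rc = 0;
            }
            munmap(map, reserve);
        }
    }
    close(fd);
    return rc;
}
```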
2) When extending the database file, use the first argument to mmap() to request a mapping corresponding to the new pages of the database file located immediately after the current mapping in virtual memory, effectively extending the initial mapping. If the system can't honour the request to place the new mapping immediately after the first, fall back to munmap()/mmap().
In practice, we've found that OSX is pretty good about positioning mappings in this way, so this trick works there.
QUESTION: If the system does allocate the second mapping immediately following the first in virtual memory, is it then safe to eventually unmap them both using a single big call to munmap()?
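Idea 2 might look something like the sketch below. It passes the desired address as a hint (no MAP_FIXED, so an existing mapping is never clobbered) and falls back to munmap()/mmap() when the hint is not honoured. The function names and temporary file path are hypothetical, and old_size is assumed to be a multiple of the page size.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend the file and try to map the new pages
 * directly after the existing mapping. Returns the base address of a
 * mapping covering new_size bytes, or MAP_FAILED on error. If the
 * fallback path fails, the old mapping has already been removed. */
void *extend_mapping_adjacent(int fd, void *base, size_t old_size, size_t new_size)
{
    size_t extra = new_size - old_size;
    char *want = (char *)base + old_size;

    if (ftruncate(fd, (off_t)new_size) != 0)
        return MAP_FAILED;

    /* Without MAP_FIXED the address is only a hint. */
    void *got = mmap(want, extra, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, (off_t)old_size);
    if (got == (void *)want)
        return base;                /* contiguous: old pointers stay valid */
    if (got != MAP_FAILED)
        munmap(got, extra);         /* placed elsewhere: discard it */

    /* Fallback: replace the whole mapping. */
    if (munmap(base, old_size) != 0)
        return MAP_FAILED;
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Usage sketch: returns 0 if the extended range is usable and can be
 * released afterwards with a single munmap() call. */
int extend_mapping_adjacent_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/mmap_adjacent_XXXXXX";      /* hypothetical temp file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int rc = -1;
    if (ftruncate(fd, (off_t)page) == 0) {
        char *map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, "page1", 6);
            char *grown = extend_mapping_adjacent(fd, map, page, 2 * page);
            if (grown != MAP_FAILED) {
                memcpy(grown + page, "page2", 6);
                int ok = memcmp(grown, "page1", 6) == 0 &&
                         memcmp(grown + page, "page2", 6) == 0;
                /* One munmap() over the whole range, whether it is backed
                 * by one mapping or by two adjacent ones. */
                if (munmap(grown, 2 * page) == 0 && ok)
                    rc = 0;
            }
        }
    }
    close(fd);
    return rc;
}
```

The demo releases the range with one munmap() call in both cases. POSIX describes munmap() as removing any mappings for the whole pages in the given address range, which suggests the single call is fine, but it is worth confirming on each target platform.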
munmap does a synchronous msync if I remember correctly. In fact msync was always synchronous on Solaris 10 even when MS_ASYNC was specified. These were a couple of the last nails in Solaris coffin. – Nagel
ftruncate() won't update the mapping. – Barque