Does madvise(___, ___, MADV_DONTNEED) instruct the OS to lazily write to disk?

Hypothetically, suppose I want to perform sequential writing to a potentially very large file.

If I mmap() a gigantic region and madvise(MADV_SEQUENTIAL) on that entire region, then I can write to the memory in a relatively efficient manner. This I have gotten to work just fine.

Now, in order to free up various OS resources as I am writing, I occasionally perform a munmap() on small chunks of memory that have already been written to. My concern is that munmap() and msync() will block my thread, waiting for the data to be physically committed to disk. I cannot slow down my writer at all, so I need to find another way.

Would it be better to use madvise(MADV_DONTNEED) on the small, already-written chunk of memory? I want to tell the OS to write that memory to disk lazily, and not to block my calling thread.

The manpage on madvise() has this to say, which is rather ambiguous:

MADV_DONTNEED
Do not expect access in the near future. (For the time being, the
application is finished with the given range, so the kernel can free
resources associated with it.) Subsequent accesses of pages in this
range will succeed, but will result either in re-loading of the memory
contents from the underlying mapped file (see mmap(2)) or
zero-fill-on-demand pages for mappings without an underlying file.
Karolinekaroly answered 19/2, 2013 at 21:59 Comment(3)
I wouldn't try this; MADV_DONTNEED on a file mapping may be interpreted as meaning that you want the OS to throw away changes to the file. - Superdreadnought
@Zack, do you have a reference for MADV_DONTNEED discarding changes to a file? - Ashling
@antonm man7.org/tlpi/code/online/dist/vmem/madvise_dontneed.c.html has a program that demonstrates it (not self-contained, unfortunately, but easy enough to modify). See also gnu.org/software/libc/manual/html_node/… ("MADV_DONTNEED: The region is no longer needed. The kernel may free these pages, causing any changes to the pages to be lost" (emphasis mine)) and this LKML thread from 2005: lkml.org/lkml/2005/6/28/188. - Superdreadnought
S
23

No!

For your own good, stay away from MADV_DONTNEED. Linux will not take this as a hint to throw pages away after writing them back, but to throw them away immediately. This is not considered a bug, but a deliberate decision.

Ironically, the reasoning is that the functionality of a non-destructive MADV_DONTNEED is already provided by msync(MS_INVALIDATE|MS_ASYNC). However, MS_ASYNC does not start I/O (in fact, it does nothing at all, on the reasoning that dirty page writeback works fine anyway), fsync always blocks, and sync_file_range may block if you exceed some obscure limit and is called "extremely dangerous" by its own documentation, whatever that means.

Either way, you must msync(MS_SYNC), or fsync (both blocking), or sync_file_range (possibly blocking) followed by fsync, or you will lose data with MADV_DONTNEED. If you cannot afford to possibly block, you have no choice, sadly, but to do this in another thread.
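
A minimal sketch of the "do this in another thread" idea, assuming a MAP_SHARED file mapping and POSIX threads. The names flush_job, flush_worker and write_page_with_background_flush are hypothetical, and a real writer would queue many chunks to a long-lived helper instead of joining right away:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical job descriptor: one page-aligned chunk of a MAP_SHARED mapping. */
struct flush_job {
    void  *addr;
    size_t len;
    int    result;  /* 0 on success, -1 if msync() or munmap() failed */
};

/* Helper thread body: the blocking calls happen here, not in the writer. */
static void *flush_worker(void *arg)
{
    struct flush_job *job = arg;
    job->result = 0;
    if (msync(job->addr, job->len, MS_SYNC) != 0) {
        perror("msync");
        job->result = -1;
    }
    if (munmap(job->addr, job->len) != 0) {
        perror("munmap");
        job->result = -1;
    }
    return NULL;
}

/* Map the first page of `path`, fill it with `fill`, and flush it from a
 * helper thread. Returns 0 on success, -1 on any failure. */
int write_page_with_background_flush(const char *path, char fill)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after close() */
    if (addr == MAP_FAILED)
        return -1;

    memset(addr, fill, len);  /* stands in for the sequential writer */

    struct flush_job job = { addr, len, 0 };
    pthread_t helper;
    if (pthread_create(&helper, NULL, flush_worker, &job) != 0) {
        munmap(addr, len);
        return -1;
    }
    /* A real writer would keep filling later chunks here. */
    pthread_join(helper, NULL);
    return job.result;
}
```

The writer thread never calls msync() itself, so only the helper waits for the disk. Older glibc versions need -pthread when linking.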

Stoichiometry answered 29/10, 2013 at 14:34 Comment(3)
I think you mean msync(MS_INVALIDATE... rather than madvise() - Inshore
@Stoichiometry your assertive answer was referenced in Bryan Cantrill's 2015 Surge rant about Linux's MADV_DONTNEED behaviour. - Hannan
One of the most off-putting, shrill, rude, potty-mouthed, disgusting rants passing itself off as a talk that is actually trying to teach the audience something. Couldn't sit through a second without wincing. Is this the format for such talks? - Euridice
S
4

On recent Linux kernels (just tested on Linux 5.4), MADV_DONTNEED works as expected when the mapping is NOT private, e.g. mmap with MAP_SHARED instead of MAP_PRIVATE. I'm not sure what the behavior is on earlier kernel versions.

From release 4.15 of the Linux man-pages project's madvise manpage:

After a successful MADV_DONTNEED operation, the semantics of memory access in the specified region are changed: subsequent accesses of pages in the range will succeed, but will result in either repopulating the memory contents from the up-to-date contents of the underlying mapped file (for shared file mappings, shared anonymous mappings, and shmem-based techniques such as System V shared memory segments) or zero-fill-on-demand pages for anonymous private mappings.
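
The zero-fill case from that excerpt can be observed with a small Linux-specific experiment. This is only a sketch: first_byte_after_dontneed is a hypothetical helper name, and other systems may treat MADV_DONTNEED as a pure hint and leave the data in place.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the first byte of a private anonymous page after it has been
 * filled with 0x5A and then passed to MADV_DONTNEED, or -1 on failure. */
int first_byte_after_dontneed(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0x5A, len);  /* dirty the page */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int value = p[0];  /* touching the page again faults in a fresh one */
    munmap(p, len);
    return value;
}
```

On Linux the function is expected to return 0 rather than 0x5A, because the anonymous private page is dropped and repopulated as a zero page on the next access.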

Linux 4.5 added a new flag, MADV_FREE, with the same behavior as on BSD systems: it just marks pages as available to be freed if needed but doesn't free them immediately, making it possible to reuse the memory range without incurring the cost of faulting the pages in again. (On Linux, MADV_FREE applies only to private anonymous pages.)

For why MADV_DONTNEED on a private mapping may result in zero-filled pages on later access, watch Bryan Cantrill's rant mentioned in the comments on @Damon's answer. Spoiler: it comes from Tru64 UNIX.

Sarawak answered 14/5, 2021 at 15:0 Comment(0)
A
2

As already mentioned, MADV_DONTNEED is not your friend. Since Linux 5.4, you can use MADV_COLD to tell the kernel it should page out that memory when there is memory pressure. This seems to be exactly what is wanted in this situation.

Read more here: https://lwn.net/Articles/793462/
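
A hedged sketch of issuing that hint on an already-written chunk. hint_chunk_cold is a hypothetical helper name, and the fallback definition of MADV_COLD (20, the generic Linux UAPI value) is only there for libc headers that predate the flag:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLD
#define MADV_COLD 20  /* generic Linux UAPI value; fallback for older libc headers */
#endif

/* Tell the kernel that [addr, addr + len) is unlikely to be touched soon.
 * Returns 0 if the hint was accepted, 1 if the running kernel does not
 * support it (for example, older than 5.4), and -1 on any other error. */
int hint_chunk_cold(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_COLD) == 0)
        return 0;
    if (errno == EINVAL || errno == ENOSYS)
        return 1;
    return -1;
}
```

Unlike MADV_DONTNEED, this hint is non-destructive: the page contents are unchanged whether or not the kernel acts on it, so a return value of 1 can simply be ignored.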

Asoka answered 9/7, 2021 at 14:7 Comment(0)
T
0

First, MADV_SEQUENTIAL enables aggressive readahead, which does not help a writer, so you don't need it. Second, the OS will lazily write dirty file-backed memory to disk anyway, even if you do nothing; MADV_DONTNEED, however, instructs it to free the memory immediately (what you call "various OS resources"). Third, it is not clear that mmapping files for sequential writing has any advantage. You will probably be better served by plain write(2) (but use buffers, either manual or stdio).
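
For comparison, the buffered alternative could look roughly like this with stdio. It is a sketch only: write_records_buffered is a hypothetical name, and the 1 MiB buffer size is an arbitrary illustrative choice, not a tuned value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write `count` copies of the string `record` to `path` sequentially,
 * going through a large user-space stdio buffer.
 * Returns 0 on success, -1 on failure. */
int write_records_buffered(const char *path, const char *record, size_t count)
{
    const size_t buf_size = (size_t)1 << 20;  /* 1 MiB, illustrative only */
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    char *buf = malloc(buf_size);
    if (buf == NULL || setvbuf(f, buf, _IOFBF, buf_size) != 0) {
        fclose(f);
        free(buf);
        return -1;
    }

    size_t rec_len = strlen(record);
    int status = 0;
    for (size_t i = 0; i < count; i++) {
        if (fwrite(record, 1, rec_len, f) != rec_len) {
            status = -1;
            break;
        }
    }

    if (fclose(f) != 0)  /* flushes whatever is still buffered */
        status = -1;
    free(buf);           /* the buffer must outlive the stream */
    return status;
}
```

Each fwrite() only copies into the buffer; the underlying write(2) calls happen when the buffer fills or the stream is closed, and the kernel then writes the dirty page cache back on its own schedule.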

Thirza answered 20/2, 2013 at 2:26 Comment(1)
This answer is just wrong, see above answer for why. - Ecru
