How to disable copy-on-write and zero-fill-on-demand for mmap()

I am implementing the cp (file copy) command using mmap(). I map the source file with MAP_PRIVATE (since I only want to read it) and the destination file with MAP_SHARED (since the changed contents of the destination file must be written back).

While doing this I observed a performance penalty caused by a large number of minor page faults, which occur for two reasons: 1) zero-fill-on-demand when calling mmap(MAP_PRIVATE) for the source file, and 2) copy-on-write when calling mmap(MAP_SHARED) for the destination file.

Is there any way to disable zero-fill-on-demand and copy-on-write?

Thanks, Harish

Ferrule answered 21/6, 2012 at 6:18 Comment(5)
I am surprised that you see a performance penalty for zero fill; how are you measuring it? You don't want to disable COW: it is fundamental to the way virtual memory works, and it improves performance. Have you considered that using write(2) might be more efficient for the copy? Specify the private map as the buffer to write. It also avoids the step of expanding the new file, since write(2) will do it for you.Collarbone
I am measuring the minor page faults with getrusage(). It shows nearly 50000 minor page faults to copy a 1 GB file with mmap() (nearly 25000 for the read mapping, mmap(MAP_PRIVATE), and the same for the write mapping, mmap(MAP_SHARED)). Yes, I have checked that write(2) is more efficient than mmap() for copying, but I think mmap() can be efficient if we disable zero-fill-on-demand and copy-on-write.Ferrule
Harish, check the madvise() and mlock() syscalls. They may affect the number of page faults. And for fast file copy, check the sendfile() syscall.Cruciferous
@osgx, I have control over the major page faults, but the problem is with the minor page faults.Ferrule
1) does not happen except for the last partial page (if there is one), and I don't understand 2). Why use copy-on-write on the destination? Also, improving cp performance under Linux is probably the best case in point for splice: it saves the round trip to user space altogether.Kail
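
For reference, the in-kernel copy suggested in the comments could look roughly like the sketch below. It uses Linux `sendfile(2)` rather than `splice(2)` because it needs no intermediate pipe; the helper name `copy_sendfile` is hypothetical, and the sketch assumes a kernel where the output descriptor may be a regular file (Linux 2.6.33 or later).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `src` to `dst` without mapping either file
 * and without bouncing the data through a user-space buffer.
 * Returns 0 on success, -1 on error. */
static int copy_sendfile(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) {
        close(in);
        return -1;
    }

    off_t left = st.st_size;
    int rc = 0;
    while (left > 0) {
        /* A NULL offset means "use and advance the file offset of `in`".
         * sendfile() may transfer fewer bytes than requested, so loop. */
        ssize_t n = sendfile(out, in, NULL, (size_t)left);
        if (n < 0) {
            rc = -1;
            break;
        }
        if (n == 0)
            break;          /* source ended early (e.g. it was truncated) */
        left -= n;
    }

    close(in);
    if (close(out) < 0)
        rc = -1;
    return rc;
}
```

Calling `copy_sendfile("a.bin", "b.bin")` copies the file in one pass; since neither file is mapped, the per-page minor faults discussed above should not arise on this path.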

There is the MAP_POPULATE flag of mmap(2):

http://linux.die.net/man/2/mmap

MAP_POPULATE (since Linux 2.5.46) Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults. MAP_POPULATE is only supported for private mappings since Linux 2.6.23.

It should pre-fault all pages in the mmapped region. It should work for question (1), but may not work for question (2) (the shared mapping).
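
A minimal sketch of mapping the source file this way, assuming Linux and glibc (MAP_POPULATE is not portable). The helper name `map_populated` is made up for illustration. Whether the flag also helps for a shared destination mapping depends on the kernel version, as the quoted man page text indicates.

```c
#define _GNU_SOURCE          /* exposes the Linux-specific MAP_POPULATE */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and ask the kernel to
 * prefault the pages during the mmap() call itself.
 * Returns the mapping and stores its length in *len_out,
 * or returns MAP_FAILED on error. */
static void *map_populated(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;   /* mmap() rejects zero-length mappings */
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);               /* the mapping stays valid after close() */

    if (p != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with `munmap(p, len)`. Comparing `ru_minflt` from getrusage() before and after reading the region, with and without the flag, is one way to check the effect on a given system.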

Cruciferous answered 21/6, 2012 at 14:8 Comment(2)
Note: MAP_POPULATE means no delays when you're using the mapping (unless it gets paged out by memory pressure), but it also means the mmap call itself blocks until the whole file is read in. It's often better to avoid MAP_POPULATE in favor of posix_madvise (or non-standardized madvise) using POSIX_MADV_WILLNEED, which is equivalent to MAP_POPULATE, but doesn't block. You can open/map the source file, advise it to load, and the OS will background read in bulk, rather than demand faulting.Electrothermal
You might block on reading from the mmap, but because the whole read in is scheduled up front, the read will already be in progress when you hit the unpopulated page; you won't be dispatching new I/O requests live.Electrothermal
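
The non-blocking alternative described in these comments might be sketched as follows. The helper name `map_with_readahead` is hypothetical. One caveat, stated as my understanding rather than a guarantee: POSIX_MADV_WILLNEED is a read-ahead hint, so it can reduce major faults and I/O stalls, but it may not pre-populate page tables, in which case minor faults would still be counted on first touch.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only, then hint that the whole
 * range will be needed soon so the kernel can start reading it in bulk.
 * Returns the mapping and stores its length in *len_out,
 * or returns MAP_FAILED on error. */
static void *map_with_readahead(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;   /* mmap() rejects zero-length mappings */
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);               /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* posix_madvise() returns an error number directly instead of
     * setting errno. The advice is only a hint, so a failure here is
     * ignored and the mapping is still returned. */
    (void)posix_madvise(p, len, POSIX_MADV_WILLNEED);

    *len_out = len;
    return p;
}
```

Unlike the MAP_POPULATE variant, the mmap() call here returns without waiting for the file to be read; the data is fetched in the background while the caller proceeds.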
