Can multiple threads see writes on a direct mapped ByteBuffer in Java?

I'm working on something that uses ByteBuffers built from memory-mapped files (via FileChannel.map()) as well as in-memory direct ByteBuffers. I am trying to understand the concurrency and memory model constraints.

I have read all of the relevant Javadoc (and source) for things like FileChannel, ByteBuffer, MappedByteBuffer, etc. It seems clear that a particular ByteBuffer (and relevant subclasses) has a bunch of fields and the state is not protected from a memory model point of view. So, you must synchronize when modifying state of a particular ByteBuffer if that buffer is used across threads. Common tricks include using a ThreadLocal to wrap the ByteBuffer, duplicate (while synchronized) to get a new instance pointing to the same mapped bytes, etc.

Given this scenario:

  1. manager has a mapped byte buffer B_all for the entire file (say it's <2gb)
  2. manager calls duplicate(), position(), limit(), and slice() on B_all to create a new smaller ByteBuffer B_1 that covers a chunk of the file, and gives this to thread T1
  3. manager does all the same stuff to create a ByteBuffer B_2 pointing to the same mapped bytes and gives this to thread T2
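The setup above can be sketched as follows. This is a minimal, single-threaded sketch (file name, sizes, and the class name MappedSliceDemo are made up); it only shows the duplicate/position/limit/slice mechanics from steps 1-3, not the cross-thread visibility in question:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSliceDemo {
    // Maps a file and carves two non-overlapping slices out of one mapping,
    // the way the manager hands B_1 and B_2 to T1 and T2 in the question.
    static ByteBuffer[] carve(FileChannel ch, int half) throws IOException {
        // Mapping beyond the current file size extends the file.
        MappedByteBuffer bAll = ch.map(FileChannel.MapMode.READ_WRITE, 0, 2L * half);

        ByteBuffer d1 = bAll.duplicate(); // duplicate() so bAll's position/limit stay untouched
        d1.position(0);
        d1.limit(half);
        ByteBuffer b1 = d1.slice();       // B_1: bytes [0, half)

        ByteBuffer d2 = bAll.duplicate();
        d2.position(half);
        d2.limit(2 * half);
        ByteBuffer b2 = d2.slice();       // B_2: bytes [half, 2*half)

        return new ByteBuffer[] { bAll, b1, b2 };
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".bin"); // stand-in for the real data file
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer[] b = carve(ch, 512);
            b[1].put(0, (byte) 1); // "T1" writes through B_1
            b[2].put(0, (byte) 2); // "T2" writes through B_2
            // Both writes read back through B_all (trivially, since this is one thread):
            System.out.println(b[0].get(0) + " " + b[0].get(512)); // prints "1 2"
        }
    }
}
```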

My question is: Can T1 write to B_1 and T2 write to B_2 concurrently and be guaranteed to see each other's changes? Could T3 use B_all to read those bytes and be guaranteed to see the changes from both T1 and T2?

I am aware that writes in a mapped file are not necessarily seen across processes unless you use force() to instruct the OS to write the pages down to disk. I don't care about that. Assume for this question that this JVM is the only process writing a single mapped file.

Note: I am not looking for guesses (I can make those quite well myself). I would like references to something definitive about what is (or is not) guaranteed for memory-mapped direct buffers. Or if you have actual experiences or negative test cases, that could also serve as sufficient evidence.

Update: I have done some tests with having multiple threads write to the same file in parallel and so far it seems those writes are immediately visible from other threads. I'm not sure if I can rely on that though.

Navarrette answered 9/8, 2011 at 20:27 Comment(7)
I read the api for MappedByteBuffer (Java 7) and they warn that it should only be used for read/write, not manipulations.Wyman
Yes, my question is about two threads writing. I don't know what you mean by "manipulations".Navarrette
JavaDoc of MappedByteBuffer: "All or part of a mapped byte buffer may become inaccessible at any time, for example if the mapped file is truncated. An attempt to access an inaccessible region of a mapped byte buffer will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time. It is therefore strongly recommended that appropriate precautions be taken to avoid the manipulation of a mapped file by this program, or by a concurrently running program, except to read or write the file's content."Wyman
Since you want the threads to read the data too, I thought you were doing something more complicated than just writing.Wyman
That javadoc is talking about doing things like editing the file from a separate process or using other normal file ops to change that file outside the buffer. I'm just reading and writing through mapped ByteBuffers.Navarrette
Sorry, I did misread the javadoc; you are not doing any "manipulations". However I don't know about the synchronization of the buffer and you might still have a problem there.Wyman
"... a new smaller ByteBuffer B_1 that a chunk of the file and gives this to thread T1" ?Shippee

Memory mapping with the JVM is just a thin wrapper around CreateFileMapping (Windows) or mmap (posix). As such, you have direct access to the buffer cache of the OS. This means that these buffers are what the OS considers the file to contain (and the OS will eventually synch the file to reflect this).

So there is no need to call force() to sync between processes. The processes are already synched (via the OS - even read/write accesses the same pages). Forcing just synchs between the OS and the drive controller (there can be some delay between the drive controller and the physical platters, but you don't have hardware support to do anything about that).

Regardless, memory-mapped files are an accepted form of shared memory between threads and/or processes. The only difference between this shared memory and, say, a named block of virtual memory in Windows is the eventual synchronization to disk (in fact, mmap does the virtual-memory-without-a-file thing by mapping /dev/zero, or by using MAP_ANONYMOUS).

Reading and writing memory from multiple processes/threads does still need some synchronization, as processors are able to do out-of-order execution (not sure how much this interacts with JVMs, but you can't make presumptions), but writing a byte from one thread will have the same guarantees as writing to any byte in the heap normally. Once you have written to it, every thread, and every process, will see the update (even through an open/read operation).

For more info, look up mmap in POSIX (or CreateFileMapping for Windows, which was built almost the same way).

Kono answered 11/8, 2011 at 4:11 Comment(2)
Excellent answer; peeled the sheet back and gave specifics to all of the "behavior is unspecified" language in the JDs. Thanks Paul.Mound
While this works and is relatively unlikely to change, it is not guaranteed. The JavaDoc for ByteBuffer states: Given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations. "Attempt" and "best effort" mean implementations are not required/guaranteed to behave the way you want.Sisto

No. The JVM memory model (JMM) does not guarantee that multiple threads mutating (unsynchronized) data will see each others changes.

First, given all the threads accessing the shared memory are all in the same JVM, the fact that this memory is being accessed through a mapped ByteBuffer is irrelevant (there is no implicit volatile or synchronization on memory accessed through a ByteBuffer), so the question is equivalent to one about accessing a byte array.

Let's rephrase the question so it's about byte arrays:

  1. A manager has a byte array: byte[] B_all
  2. A new reference to that array is created: byte[] B_1 = B_all, and given to thread T1
  3. Another reference to that array is created: byte[] B_2 = B_all, and given to thread T2
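In code, the rephrased setup looks like this (a minimal sketch; the class name AliasDemo is made up). The aliasing itself is real within one thread; the cross-thread visibility is the part the JMM leaves unguaranteed:

```java
public class AliasDemo {
    public static void main(String[] args) {
        byte[] bAll = new byte[16];
        byte[] b1 = bAll; // the reference handed to T1
        byte[] b2 = bAll; // the reference handed to T2
        b1[0] = 42;       // a write through one reference...
        System.out.println(b2[0]); // ...reads back through the other: prints 42 (same thread)
        // Across threads, nothing here establishes a happens-before edge,
        // so T2 observing this write is not guaranteed without synchronization.
    }
}
```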

Do writes to B_1 by thread T1 get seen in B_2 by thread T2?

No: such writes are not guaranteed to be seen without some explicit synchronization between T_1 and T_2. The core of the problem is that the JVM's JIT, the processor, and the memory architecture are all free to re-order some memory accesses (not just to piss you off, but to improve performance through caching). All these layers expect the software to be explicit (through locks, volatile, or other hints) about where synchronization is required, which implies these layers are free to move stuff around when no such hints are provided.

Note that in practice whether you see the writes or not depends mostly on the hardware and the alignment of the data in the various levels of caches and registers, and how "far" away the running threads are in the memory hierarchy.

JSR-133 was an effort to precisely define the Java Memory Model circa Java 5.0 (and as far as I know it's still applicable in 2012). That is where you want to look for definitive (though dense) answers: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr133.pdf (section 2 is most relevant). More readable stuff can be found on the JMM web page: http://www.cs.umd.edu/~pugh/java/memoryModel/

Part of my answer is asserting that a ByteBuffer is no different from a byte[] in terms of data synchronization. I can't find specific documentation that says this, but I suggest that the "Thread Safety" section of the java.nio.Buffer doc would mention something about synchronization or volatile if that were applicable. Since the doc doesn't mention this, we should not expect such behavior.

Propaedeutic answered 19/4, 2012 at 7:51 Comment(0)

The cheapest thing you can do is use a volatile variable. After a thread writes to the mapped area, it should write a value to a volatile variable. Any reading thread should read the volatile variable before reading the mapped buffer. Doing this produces a "happens-before" in the Java memory model.

Note that you have NO guarantee that another process is in the middle of writing something new. But if you want to guarantee that other threads can see something you've written, writing a volatile (followed by reading it from the reading thread) will do the trick.
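A minimal sketch of the pattern this answer describes (the class and field names are made up). The volatile write in the writer thread, paired with the volatile read in the reader's spin loop, establishes the happens-before edge that publishes the plain write to the direct buffer:

```java
import java.nio.ByteBuffer;

public class VolatileFlagDemo {
    // The volatile flag; writing it publishes all prior writes (JMM happens-before).
    static volatile boolean published = false;
    static final ByteBuffer buf = ByteBuffer.allocateDirect(8);

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            buf.put(0, (byte) 7); // plain write to the direct buffer
            published = true;     // volatile write: everything above happens-before...
        });
        Thread reader = new Thread(() -> {
            while (!published) { }          // ...this volatile read in the reader,
            System.out.println(buf.get(0)); // so this is guaranteed to print 7
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```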

Wheezy answered 17/2, 2012 at 5:32 Comment(0)

I would assume that direct memory provides the same guarantees, or lack of them, as heap memory. If you modify a ByteBuffer which shares an underlying array or direct memory address, a second ByteBuffer in another thread can see the changes, but is not guaranteed to do so.

I suspect that even if you use synchronized or volatile, it is still not guaranteed to work; however, it may well do so depending on the platform.

A simple way to exchange data between threads is to use an Exchanger.

Based on the example,

// requires: import java.util.concurrent.Exchanger;
class FillAndEmpty {
   final Exchanger<ByteBuffer> exchanger = new Exchanger<ByteBuffer>();
   ByteBuffer initialEmptyBuffer = ...; // an empty buffer to fill (elided)
   ByteBuffer initialFullBuffer = ...;  // a full buffer to drain (elided)

   class FillingLoop implements Runnable {
     public void run() {
       ByteBuffer currentBuffer = initialEmptyBuffer;
       try {
         while (currentBuffer != null) {
           addToBuffer(currentBuffer);
           if (currentBuffer.remaining() == 0)
             currentBuffer = exchanger.exchange(currentBuffer);
         }
       } catch (InterruptedException ex) { ... handle ... }
     }
   }

   class EmptyingLoop implements Runnable {
     public void run() {
       ByteBuffer currentBuffer = initialFullBuffer;
       try {
         while (currentBuffer != null) {
           takeFromBuffer(currentBuffer);
           if (currentBuffer.remaining() == 0)
             currentBuffer = exchanger.exchange(currentBuffer);
         }
       } catch (InterruptedException ex) { ... handle ...}
     }
   }

   void start() {
     new Thread(new FillingLoop()).start();
     new Thread(new EmptyingLoop()).start();
   }
 }
Moss answered 9/8, 2011 at 20:37 Comment(1)
I'm very much NOT assuming that direct memory has the same guarantees as heap memory. If that's true, I'd like to see a reference to something that says so.Navarrette

One possible answer I've run across is using file locks to gain exclusive access to the portion of the file mapped by the buffer. This is explained with an example here, for instance.

I'm guessing that this would really guard the region of the file on disk against concurrent writes to the same section of the file. The same thing could be achieved (within a single JVM, but invisibly to other processes) with Java monitors for sections of the file. I'm guessing that would be faster, with the downside of being invisible to external processes.

Of course, I'd like to avoid both file locking and page synchronization if consistency is guaranteed by the JVM/OS.
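A minimal sketch of the file-lock approach, using FileChannel.lock on a byte range (the class name and file are made up). One caveat worth noting: the JavaDoc for FileLock says locks are held on behalf of the entire JVM, so this guards against other processes, not against other threads in the same JVM:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RegionLockDemo {
    // Acquires an exclusive lock on the first `len` bytes of the file,
    // excluding other *processes* while we hold it. Within one JVM,
    // a second overlapping lock attempt throws OverlappingFileLockException
    // rather than coordinating threads.
    static boolean lockRegion(Path file, long len) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = ch.lock(0, len, /* shared = */ false)) {
            // While the lock is valid, writes to this region are safe
            // from concurrent writers in other processes.
            return lock.isValid();
        } // lock released and channel closed here
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lockdemo", ".bin");
        System.out.println(lockRegion(tmp, 512)); // prints "true"
        Files.delete(tmp);
    }
}
```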

Navarrette answered 10/8, 2011 at 15:3 Comment(1)
Try asking this on the Concurrency list. I had asked something similar with regards to on-heap byte[] - [markmail.org/thread/2sldp7bz4k5aq5ei] and [markmail.org/thread/wpcu55p6psarvyho]. But no idea how mapped memory handles concurrency.Bales

I do not think that this is guaranteed. If the Java Memory Model doesn't say that it's guaranteed, then it is by definition not guaranteed. I would either guard buffer writes with synchronized or queue writes for one thread that handles all writes. The latter plays nicely with multicore caching (it's better to have one writer for each RAM location).

Starch answered 9/8, 2011 at 20:36 Comment(1)
We're talking about memory that is mapped to a region of disk (off-heap). I see no reason to assume that it has the same constraints. Can you provide a reference to that effect?Navarrette

No, it's no different from normal java variables or array elements.

Nga answered 9/8, 2011 at 21:53 Comment(1)
Why isn't it different? The underlying data isn't Java variables or array elements, it is mapped memory provided by the operating system.Venireman
