Thread Pool, Shared Data, Java Synchronization
Say, I have a data object:

class ValueRef { double value; }

Where each data object is stored in a master collection:

Collection<ValueRef> masterList = ...;

I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):

class Job implements Runnable { 
     Collection<ValueRef> neededValues = ...; 
     void run() {
         double sum = 0;
         for (ValueRef x: neededValues) sum += x.value;
         System.out.println(sum);
     } 
}

Use-case:

  1. for (ValueRef x: masterList) { x.value = Math.random(); }

  2. Populate a job queue with some jobs.

  3. Wake up a thread pool

  4. Wait until each job has been evaluated
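
The four steps above might be sketched as follows. This is a minimal sketch with names of my own choosing; for a predictable result, the values are set to a constant 1.0 rather than Math.random(), and invokeAll is used so step 4 (waiting for all jobs) falls out naturally:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ValueRef { double value; }

public class UseCaseSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: build the master list and set the values
        // (a constant here instead of Math.random(), so the sum is predictable).
        List<ValueRef> masterList = new ArrayList<>();
        for (int i = 0; i < 100; i++) masterList.add(new ValueRef());
        for (ValueRef x : masterList) x.value = 1.0;

        // Steps 2-4: build the jobs, hand them to the pool, and wait.
        // Submitting to the executor establishes the happens-before edge,
        // and invokeAll blocks until every job has completed.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Double>> jobs = new ArrayList<>();
        for (int start = 0; start < 100; start += 25) {
            final List<ValueRef> needed = masterList.subList(start, start + 25);
            jobs.add(() -> {
                double sum = 0;
                for (ValueRef x : needed) sum += x.value;
                return sum;
            });
        }
        double total = 0;
        for (Future<Double> f : pool.invokeAll(jobs)) total += f.get();
        pool.shutdown();
        System.out.println(total); // 100.0
    }
}
```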

Note: during job evaluation, all of the values are constant. The threads, however, may have evaluated jobs in the past and retain cached values.

Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?

I understand synchronized from the monitor/lock perspective; I do not understand it from the cache/flush perspective (i.e. what the memory model guarantees on entry and exit of a synchronized block).

To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.

My approach: create a global monitor, static Object guard = new Object(); synchronize on guard while updating the master list; then, before starting the thread pool, synchronize on guard in an empty block once for each thread in the pool.

Does that really cause a full flush of every value subsequently read by those threads, or only of values touched inside the synchronized block? In the latter case, instead of an empty block, should I read each value once in a loop?
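
For concreteness, the guard approach might be sketched like this (class and variable names are mine; the comments state what the Java Memory Model guarantees for a release/acquire on the same monitor):

```java
import java.util.ArrayList;
import java.util.List;

class ValueRef { double value; }

public class GuardSketch {
    static final Object guard = new Object();
    static final List<ValueRef> masterList = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) masterList.add(new ValueRef());

        // Writer: update the values while holding the guard.
        synchronized (guard) {
            for (ValueRef x : masterList) x.value = 2.0;
        }

        // Worker: an empty synchronized block on the same guard is enough.
        // The writer's unlock happens-before this lock, and that makes ALL of
        // the writer's prior writes visible, not only the ones touched inside
        // the block; no read-each-value loop is needed.
        Thread worker = new Thread(() -> {
            synchronized (guard) { /* empty: the acquire establishes the edge */ }
            double sum = 0;
            for (ValueRef x : masterList) sum += x.value;
            System.out.println(sum); // 20.0
        });
        worker.start();
        worker.join();
    }
}
```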

Thanks for your time.


Edit: I think my question boils down to: once I exit a synchronized block, does every first read after that point go to main memory, regardless of what I synchronized on?

Henden answered 26/6, 2012 at 20:39 Comment(4)
Seems like almost a perfect place for taking advantage of the volatile keyword. (Stirpiculture)
I'm writing once (effectively constant) but potentially reading millions of times, and volatile is never cached locally. If I created the thread pool each time, the code would work fine without synchronization or volatile, since no prior cache would exist. (Henden)
I don't see a need for volatile here. If ValueRef is effectively immutable, just make it actually immutable: use Double, create a new collection for each job before it is scheduled, and wrap that in unmodifiableCollection (just as a reminder). What problem do you foresee? (Unplug)
Immutable until the jobs are finished; then the values are changed again and the jobs are restarted. (Henden)

It doesn't matter that threads of a thread pool have evaluated some jobs in the past.

Javadoc of Executor says:

Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.

So, as long as you use a standard thread pool implementation and change the data before submitting the jobs, you shouldn't worry about memory visibility effects.
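
A minimal sketch of that guarantee (names are mine): the plain write below uses no lock and no volatile, yet the job is guaranteed to see it, because submit() alone establishes the happens-before edge.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ValueRef { double value; }

public class SubmitVisibility {
    public static void main(String[] args) throws Exception {
        ValueRef ref = new ValueRef();
        ExecutorService pool = Executors.newSingleThreadExecutor();

        ref.value = 42.0; // ordinary write: no lock, no volatile

        // Actions before submit() happen-before the Runnable's execution,
        // per the Executor javadoc, so the worker thread sees 42.0.
        pool.submit(() -> System.out.println(ref.value)).get();
        pool.shutdown();
    }
}
```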

Beldam answered 26/6, 2012 at 20:47 Comment(2)
This is because... in the worker threads there is a synchronization block waiting for new jobs? And when that block exits, the thread's entire cache is cleared? Could I just synchronize on something random and get the same effect? (Henden)
@AndrewRaffensperger: It doesn't matter how it's implemented; there is a guarantee and it must be provided. Regarding the last question: basically yes, but it makes no sense. Without additional means of synchronization you cannot say that the synchronized blocks in the worker threads executed after the synchronized block in the main thread; with additional means of synchronization it's redundant. (Beldam)

What you are planning sounds sufficient. It depends on how you plan to "wake up the thread pool."

The Java Memory Model provides that all writes performed by a thread before it releases a lock are visible to any thread that subsequently acquires the same lock.

So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) while you update the master list, then when they wake up and become runnable, the modifications made by the master thread will be visible to them.

I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.


Just to clarify: it's almost impossible to control worker threads without a synchronized block in which a check is made to see whether the worker has a task to execute. Thus, any changes made by the controller thread to the job happen-before the worker thread awakes. You require a synchronized block, or at least a volatile variable, to act as a memory barrier, and I can't think how you'd create a thread pool without using one of these.

As an example of the advantages of using the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions; it's not necessarily the horrible idea one might assume at first glance.
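
The volatile-flag variant might be sketched as follows (a minimal sketch with hypothetical names): an ordinary write published by a later volatile write is visible to a thread that observes the flag, so the flag itself acts as the memory barrier.

```java
class ValueRef { double value; }

public class BusyWaitSketch {
    // The volatile write/read pair is the only barrier in this sketch.
    static volatile boolean ready = false;
    static final ValueRef ref = new ValueRef();

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!ready) { }             // busy-wait on the volatile flag
            System.out.println(ref.value); // guaranteed to see 3.0
        });
        worker.start();

        ref.value = 3.0; // ordinary write...
        ready = true;    // ...published by the volatile write that follows it
        worker.join();
    }
}
```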

If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.

Aforetime answered 26/6, 2012 at 20:52 Comment(2)
I can't afford the overhead of java.util.concurrent. The data in my example is updated once, then becomes "constant" during multi-threaded evaluation. I'm interested in how that data becomes visible to the other pre-existing threads. It appears that any synchronized block, even without any synchronized happens-before relation, causes this visibility. Or maybe happens-before doesn't require any explicit synchronization, and "no jobs are run until all value changes are made" fits the bill. (Henden)
@AndrewRaffensperger Right. If that's all you need, there is a java.util.concurrent utility with the minimum overhead required for correctness. It is a mistake to assume that the concurrency utilities have higher overhead; in fact, they provide access to high-performance concurrency tools like compare-and-swap. Implementing this yourself in Java is going to be slower than the optimized native code behind the AtomicXXX classes. There are similar performance advantages in most of the other utilities. (Aforetime)

Why don't you make Collection<ValueRef> and ValueRef immutable, or at least not modify the values in the collection after you have published the reference to it? Then you will not have to worry about synchronization at all.

That is, when you want to change the values in the collection, create a new collection and put the new values in it. Once the values have been set, pass the collection reference to new job objects.

The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).
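
The snapshot idea above might look like this (a minimal sketch; names are mine): each round builds a fresh list of boxed values and publishes an unmodifiable view to the jobs, so nothing is ever mutated after publication.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SnapshotSketch {
    public static void main(String[] args) {
        // Build a brand-new collection of boxed values for this round...
        List<Double> values = new ArrayList<>();
        for (int i = 0; i < 5; i++) values.add(1.5);

        // ...then publish an unmodifiable view to the jobs. Safe publication
        // (e.g. handing it to an ExecutorService) makes it visible; the next
        // round constructs a new list instead of mutating this one.
        List<Double> snapshot = Collections.unmodifiableList(values);
        double sum = 0;
        for (double v : snapshot) sum += v;
        System.out.println(sum); // 7.5
    }
}
```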

Inflect answered 26/6, 2012 at 21:19 Comment(1)
Right, I could always reconstruct the ValueRefs or rebuild the thread pool, and my problem disappears. But in my actual implementation the data structure is very complex, and the code is called frequently enough that rebuilding the thread pool for each evaluation would be too much overhead. (Henden)
