Analyzing the x86 output generated by the JIT in the context of volatile
I am writing this post in connection with Deep understanding of volatile in Java

public class Main {
    private int x;
    private volatile int g;


    public void actor1(){
       x = 1;
       g = 1;
    }


    public void actor2(){
       put_on_screen_without_sync(g);
       put_on_screen_without_sync(x);
    }
}

Now, I am analyzing what the JIT generated for the above piece of code. From the discussion in my previous post we know that the output 1, 0 is impossible because:


a write to a volatile v guarantees that every action a preceding v will be visible (flushed to memory) before v itself becomes visible.
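That guarantee can be made concrete with a small two-thread sketch (class and method names are mine, not from the original post): the reader spins until it observes g == 1, and the happens-before edge from the volatile store then guarantees it also sees x == 1.

```java
public class VolatileVisibility {
    int x;          // plain field
    volatile int g; // volatile field

    void writer() {
        x = 1; // plain store
        g = 1; // volatile store: publishes the write to x
    }

    void reader() {
        while (g != 1) { } // spin until the volatile store becomes visible
        // the volatile load of g == 1 happens-after the volatile store,
        // so the earlier plain store to x must be visible as well
        if (x != 1) throw new AssertionError("saw g == 1 but x == " + x);
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileVisibility v = new VolatileVisibility();
        Thread t = new Thread(v::reader);
        t.start();
        v.writer();
        t.join();
    }
}
```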


   .................(unimportant parts of the method body removed).....

  0x00007f42307d9d5e: c7460c01000000     (1) mov       dword ptr [rsi+0ch],1h
                                                ;*putfield x
                                                ; - package.Main::actor1@2 (line 14)

  0x00007f42307d9d65: bf01000000          (2) mov       edi,1h
  0x00007f42307d9d6a: 897e10              (3) mov       dword ptr [rsi+10h],edi
  0x00007f42307d9d6d: f083042400          (4) lock add  dword ptr [rsp],0h
                                                ;*putfield g
                                                ; - package.Main::actor1@7 (line 15)

  0x00007f42307d9d72: 4883c430            add       rsp,30h
  0x00007f42307d9d76: 5d                  pop       rbp
  0x00007f42307d9d77: 850583535116        test      dword ptr [7f4246cef100h],eax
                                                ;   {poll_return}
  0x00007f42307d9d7d: c3                  ret
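For reference, listings like the one above are typically produced with HotSpot's diagnostic PrintAssembly flag; the class name below follows the post's package.Main placeholder and the hsdis disassembler plugin must be installed next to the JVM libraries.

```shell
# Print the JIT-compiled assembly of actor1 only
# (requires the hsdis-<arch> plugin on the JVM library path)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
     -XX:CompileCommand=print,package.Main::actor1 Main
```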

Do I understand correctly that this works because x86 cannot perform StoreStore reordering? If it could, an additional memory barrier would be required, yes?


EDITED AFTER @Eugene's EXCELLENT ANSWER:

 int tmp = i; // volatile load
 // [LoadStore]
 // [LoadLoad]

Here I see what you mean; it is clear: no action below (after) the volatile read (int tmp = i) can be reordered above it.

 // [StoreLoad] -- this one
 int tmp = i; // volatile load
 // [LoadStore]
 // [LoadLoad]

Here, you put one more barrier. It ensures that no earlier store will be reordered with int tmp = i. But why is that important? Why do I have doubts? From what I know, a volatile load guarantees:

Every action after a volatile load won't be reordered to before the volatile load.

I see you write:

There needs to be sequential consistency

But I cannot see why sequential consistency is required.

Ruffle answered 17/7, 2017 at 19:2 Comment(4)
What a? What v? Did you mean x and g? – Willettawillette
Now, a is any action above v - for example the action x = 1. v is the store g = 1. – Ruffle
The JMM wasn't made for x86 or any other specific architecture and doesn't reason in terms of loadload or storestore. It is the responsibility of a JVM to implement the JMM with the instructions available on each architecture. – Babu
@assylias, I know that. I'm trying to investigate why the memory barrier is placed after g = 1. It seems to be erroneous, but in fact it isn't. I just try to understand why. – Ruffle

A couple of things. First, will be flushed to memory - that's pretty erroneous. It's almost never a flush to main memory: usually the store buffer is drained to L1, and the cache-coherency protocol syncs the data between all caches. But if it's easier for you to understand the concept in those terms, that's fine - just know it is slightly different and faster.

It's a good question why the [StoreLoad] is there indeed; maybe this will clear things up a bit. volatile is indeed all about fences, and here is an example of which barriers would be inserted for some volatile operations. For example, we have a volatile load:

  // i is some shared volatile field
  int tmp = i; // volatile load of "i"
  // [LoadLoad|LoadStore]

Notice the two barriers here, LoadStore and LoadLoad; in plain English this means that any Load or Store that comes after a volatile load/read cannot "move up" across the barrier; it cannot be re-ordered "above" that volatile load.

And here is the example for volatile store.

 // "i" is a shared volatile variable
 // [StoreStore|LoadStore]
 i = tmp; // volatile store

It means that no Load or Store can go "below" the volatile store itself.

This basically builds the happens-before relationship: the volatile load is the acquire load and the volatile store is the release store (this also has to do with how CPU store and load buffers are implemented, but that's pretty much out of the scope of the question).
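The acquire/release pairing can be made explicit with the fence methods on java.lang.invoke.VarHandle (Java 9+). This is only a sketch of where the conceptual barriers sit; for a real volatile field the JIT inserts the equivalent barriers automatically, and plain accesses bracketed by fences are not a drop-in replacement for volatile.

```java
import java.lang.invoke.VarHandle;

public class FenceSketch {
    static int i, tmp; // shared fields, deliberately non-volatile for the sketch

    static void acquireLoad() {
        tmp = i;                  // the load itself
        VarHandle.acquireFence(); // [LoadLoad|LoadStore]: nothing below moves above the load
    }

    static void releaseStore() {
        VarHandle.releaseFence(); // [StoreStore|LoadStore]: nothing above moves below the store
        i = tmp;                  // the store itself
    }

    static void sequentiallyConsistentStore() {
        VarHandle.releaseFence();
        i = tmp;
        VarHandle.fullFence();    // [StoreLoad]: the expensive one (lock add on x86)
    }
}
```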

If you think about it, this makes perfect sense given what we know about volatile in general: once a volatile store has been observed by a volatile load, everything prior to the volatile store is observed too, and that is on par with the memory barriers. When a volatile store takes place, nothing above it can move beyond it; and once a volatile load happens, nothing below it can move above it - otherwise this happens-before would be broken.

But that's not all; there's more. There needs to be sequential consistency, which is why any sane implementation will guarantee that volatiles themselves are not re-ordered, so two more fences are inserted:

 // any store of some other volatile
 // can not be reordered with this volatile load
 // [StoreLoad] -- this one
 int tmp = i; // volatile load of a shared variable "i"
 // [LoadStore|LoadLoad]

And one more here:

// [StoreStore|LoadStore]
i = tmp; // volatile store
// [StoreLoad] -- and this one

Now, it turns out that on x86, 3 out of 4 memory barriers are free, since it is a strong memory model. The only one that needs to be implemented is StoreLoad. On other CPUs, PowerPC for example, lwsync is one of the instructions used - but I don't know much about them.

Usually an mfence is a good option for StoreLoad on x86, but the same thing is guaranteed via lock add (AFAIK more cheaply), which is why you see it there. Basically, that is the StoreLoad barrier. And yes, you are right in your last sentence: on a weaker memory model a StoreStore barrier would be required. On a side note, that is what is used when you safely publish a reference via final fields inside a constructor. Upon exiting the constructor, two fences are inserted: LoadStore and StoreStore.
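The final-field publication just mentioned can be sketched like this; the barrier sits conceptually at the constructor exit (the "freeze" action of JLS 17.5), so any thread that sees a reference to the object also sees its final fields fully initialized.

```java
public class Holder {
    private final int value;

    public Holder(int value) {
        this.value = value;
        // conceptual [StoreStore|LoadStore] barrier at constructor exit:
        // the store of the final field cannot be reordered with the
        // subsequent publication of the Holder reference
    }

    public int value() { return value; }
}
```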

Take all this with a grain of salt - a JVM is free to ignore these as long as it does not break any rules: Aleksey Shipilev has a great talk about this.


EDIT

Suppose you have this case:

 // [StoreStore|LoadStore]
 x = 4; // volatile store of the shared volatile variable "x"

 y = 3; // non-volatile store of the shared variable "y"

 int z = x; // volatile load
 // [LoadLoad|LoadStore]

Basically there is no barrier that would prevent the volatile store from being re-ordered with the volatile load (i.e. the volatile load could be performed first), and that would obviously cause problems; sequential consistency would thus be violated.
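Why the missing StoreLoad matters shows up in the classic store-buffering litmus test: each thread stores to one volatile and then loads the other. Sequential consistency, which volatile must provide, forbids both threads reading 0; without the StoreLoad fence after each volatile store, x86's store buffers could produce exactly that outcome.

```java
public class StoreBuffering {
    static volatile int x, y;
    static int r1, r2; // results, safely read after join()

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
        Thread b = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
        a.start(); b.start();
        a.join(); b.join();
        // sequential consistency for volatiles forbids r1 == 0 && r2 == 0
        if (r1 == 0 && r2 == 0) throw new AssertionError("SC violated");
    }
}
```

Drop the volatile modifier and (1, 0), (0, 1), (1, 1) are still possible, but (0, 0) becomes legal as well.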

By the way, you are sort of missing the point (if I am not mistaken) with "Every action after volatile load won't be reordered before volatile load is visible". Re-ordering is not possible with the volatile itself - other operations are free to be re-ordered among themselves. Let me give you an example:

 int tmp = i; // volatile load of a shared variable "i"
 // [LoadStore|LoadLoad]

 int x = 3; // plain store
 int y = 4; // plain store

The last two operations, x = 3 and y = 4, are absolutely free to be re-ordered; they can't float above the volatile, but they can be re-ordered with each other. The following would be perfectly legal:

 int tmp = i; // volatile load
 // [LoadStore|LoadLoad]

 // see how they have been inverted here...
 int y = 4; // plain store
 int x = 3; // plain store
Caskey answered 18/7, 2017 at 8:35 Comment(9)
Thanks for your impressive answer. I edited my post and asked about one detail. – Ruffle
I have difficulties with your notation. What's the difference between variables you declare in the code examples and those you don't? When you write int tmp = i;, is it supposed to be a "volatile load", because i is supposed to be a shared, volatile variable while tmp is supposed to be a local variable (hence, not a store operation)? Then, why is int x = 3; a "plain load"? Either it is a store, if x is shared, or it is just nothing, if x is a local variable. In either case, I don't see why it should be impossible to move the operation before the volatile load. – Kooima
@Kooima thank you for the comments; found the time to correct some missing parts. As to the last question: we can't move any stores or loads above a volatile load, because that would break happens-before. A single example proving this would be enough. Let's say I have this: x = 3; y = 4 where x is non-volatile and y is volatile; in another thread I have z = y; x = 6. If that x = 6 were allowed to float above z = y, it would mean that once I read z to be 4, x could be 6 - but it really has to be 3. At least this is my understanding of it. – Caskey
@Caskey just think about it for a second. When you execute z = y; x = 6, what value will x have? Obviously it will be 6, regardless of whether you move the store or not. For any other thread there is no guarantee anyway; they may read 3 or 6. So moving x = 6 before the z = y does not affect the program's behavior at all. – Kooima
@Kooima I have no idea what to think of this. Initially I wanted to comment with z = y; if(x == 3) {do something}; x = 6; so x = 6 would float across the barrier, but this would not be allowed because of sequential consistency. I tried thinking of more examples where this would break, but could not. It's just... very weird that so many places actually say this has to happen. I wish I had an ARM CPU to test this right now. – Caskey
That's a naive view of it. When you have code like if(x == 3) {do something}; x = 6;, an optimizer could still transform it to something like (local) tmp$x = x; x = 6; if(tmp$x == 3) {do something};, without violating sequential consistency. Besides that, it's worth keeping in mind that barriers are not Java's memory model. So the behavior of HotSpot, which mostly follows the JSR-133 Cookbook, is not sufficient to conclude what a conforming optimizing JVM could do when not being that conservative. – Kooima
@Kooima I know that barriers are not the JMM; even Shipilev has a great talk about this (it's in Russian though); the naivety comes from a lack of such deep understanding, it's not on purpose. Coming to your example: are you saying that x = 6 is now allowed to float across the barrier? Because if so, that would mean that tmp$x could be seen as 6 and thus if(tmp$x == 3) would fail; if that floating across the barrier were not allowed, that if would succeed; in my understanding it has to succeed. I might need to let this one go for a short while and come back to it later... – Caskey
You are assuming that it is impossible to write to x without subsequently reading the older value of x. The tmp$x was just one example of how this is possible. Of course, in this simple example, x = 6 can not get moved before the tmp$x = x. But this is entirely unrelated to the volatile load of z = y. Even worse, in your answer there is no load of x; there is only a store of x after the store of y. Though, you had written int x = 3; // plain load, despite obviously writing to the variable. As said in my first comment, the problems start with that inconsistency. – Kooima
@Kooima thank you for the patience. I think I understand what you mean, but now I don't know whether such above-the-barrier operations would take place. It would mean either that the VM has to analyze the code to prove that those barriers are not needed, or maybe the processor might; I don't really know. Either way, this has gotten me a lot further in my understanding, thanks to you. – Caskey
