Floating point determinism between Apple A5 and Apple A6 CPUs

I am developing a multiplayer iOS game with Box2D physics. The multiplayer mode uses the usual lockstep method, and the game updates the physics world with a fixed timestep. There is no desync among iOS devices with the same CPU.

However, when testing with new iOS devices with the Apple A6 chip, desync happened. Judging from my log file, the desync appears quite quickly, probably because of some floating-point operation that I have not yet been able to identify.

I can guarantee that Box2D is the only module that needs to be synchronized in the game's design, and according to my log, all multiplayer commands and inputs stay in sync.

I have tried switching the math functions sinf, cosf, pow, sqrtf and atan2f to their double versions, but without any luck.

Is there any way, for example a compiler option, to force the Apple A6 to treat floating-point numbers the same way as the Apple A5?

I would really appreciate any answer.

Klaus answered 15/12, 2012 at 2:57 Comment(5)
Are you really sure your simulation is being offset that much by such small things? – Sigismond
I doubt that an A6 will deliver a different float value. – Hollingsworth
Although it is a real pain to do, the best way to find out what's different is to log the world state at every time step on both endpoints, using a hex representation of floats so that you can text-diff the files and find even a single bit of difference. Run the exact same input every time to find the time step at which they diverge; then you can think about what happened between those two time steps. – Potboy
Thanks, all, for your help so far; I really appreciate it! To Vaughan Hilts: although the difference is small, it grows quickly once objects collide. When nothing collides, it's hard to notice any change with the naked eye. To Alex Wien: I have tried testing with random numbers, and yes, the results seem to be the same on both A5 and A6. There may be something I have not found yet. To iforce2d: I am going to log all of Box2D's math calculations to spot the difference. In fact, the desync happens almost instantly, so there is no need for me to replay. – Plossl
I have tried logging every call to the transcendental functions, without luck: yes, the desync shows up there, but it is already present in the arguments passed to the functions! – Plossl
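A minimal C sketch of the per-step logging suggested above, under stated assumptions: BodyState, float_bits, format_body and state_hash are hypothetical names, and a real game would fill BodyState from each Box2D body every fixed step. The hex lines can be text-diffed, and the FNV-1a hash is cheap enough to exchange every tick so peers can spot the first tick where they diverge.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-body snapshot; in the game these fields would be copied
   out of each Box2D body every fixed step. */
typedef struct {
    float x, y, angle;
} BodyState;

/* Copy a float's bits into an integer (memcpy avoids strict-aliasing problems). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Format one body as exact hex bit patterns, so text-diffing two logs
   reveals even a single-bit difference. */
static int format_body(char *buf, size_t size, const BodyState *b)
{
    return snprintf(buf, size, "%08x %08x %08x",
                    (unsigned)float_bits(b->x),
                    (unsigned)float_bits(b->y),
                    (unsigned)float_bits(b->angle));
}

/* FNV-1a hash over the bit patterns of every body. Any single changed bit
   changes the hash, so two peers can compare one number per tick. */
static uint32_t state_hash(const BodyState *bodies, size_t count)
{
    uint32_t h = 2166136261u;                     /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < count; ++i) {
        const uint32_t words[3] = { float_bits(bodies[i].x),
                                    float_bits(bodies[i].y),
                                    float_bits(bodies[i].angle) };
        for (int w = 0; w < 3; ++w) {
            for (int byte = 0; byte < 4; ++byte) {
                h ^= (words[w] >> (8 * byte)) & 0xffu;
                h *= 16777619u;                   /* FNV-1a 32-bit prime */
            }
        }
    }
    return h;
}
```

For example, a body at x = 1.0f, y = 0.0f, angle = -2.0f formats as "3f800000 00000000 c0000000".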

A number of math library functions use different algorithms on the A5 and A6. If they differ by more than an ulp or two, you may have found a bug; please report it. Otherwise, the variation is likely within the expected tolerances of a good-quality math library. For a glimpse into the reasons why this is so, the best reference is Ian Ollmann's email to the mac-games-dev mailing list several years ago, "the math library is not a security tool", which addressed this exact issue in the context of Mac OS X. (The tl;dr version: the goal of delivering bit-identical results across architectures, which some game developers want, is fundamentally in conflict with delivering high-accuracy answers as efficiently as possible on every architecture, which all developers want [and users too, since it benefits responsiveness and battery life]; something has to give, and for a general-purpose system library the latter necessarily takes priority.) The Apple developer forums would be another good place to look for information.
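The "ulp or two" criterion can be checked mechanically. A minimal C sketch (ordered_bits and ulp_distance are hypothetical names) that measures the distance between two floats in ulps by mapping their IEEE-754 bit patterns onto an integer line on which neighboring floats differ by exactly 1:

```c
#include <stdint.h>
#include <string.h>

/* Map a float onto a signed integer line on which neighboring representable
   floats differ by exactly 1. IEEE-754 floats are sign-magnitude, so negative
   values are flipped to keep the ordering monotonic. */
static int64_t ordered_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i < 0 ? (int64_t)INT32_MIN - i : (int64_t)i;
}

/* Distance between a and b in units in the last place (ulps).
   0 means bit-identical (+0 and -0 also count as 0); NaN inputs are not handled. */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

For example, ulp_distance(1.0f, 1.00000012f) is 1, because 1.00000012f is the next float above 1.0f; the measure also works across exponent boundaries, so ulp_distance(2.0f, 1.99999988f) is 1 as well.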

Duenna answered 16/12, 2012 at 23:48 Comment(3)
Thank you very much for the link, Stephen. It is very helpful. I am going to test with crlibm and will report my results as soon as I can (somehow) fix this problem. – Plossl
Unfortunately, the float variation seems to be everywhere, not only in the transcendental functions but also across different CPUs. I asked the same question on the Apple developer forums, and they told me that I should have used a different approach for the game. I am really stuck here. The only option left would be to separate players by iOS version and CPU, but that would be lame... – Plossl
arm64 will bring a whole 'nother kettle of fish: FMA, subnormals, different compiler scheduling that reorders adds, and the possibility of rounding control sneaking into your floating-point model. Give up on this approach before it drives you mad, I say. Appoint one machine the arbiter of reality and have everyone sync up with it occasionally. Be tolerant of small errors. It's a game. Perfection is overrated. – Theosophy
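A minimal C sketch of the "one machine is the arbiter of reality" approach, under stated assumptions: SimState, sim_step, client_tick and the 30-step SNAPSHOT_INTERVAL are hypothetical, and networking is left out entirely. Each client simulates locally but overwrites its state with the host's snapshot on snapshot ticks, so floating-point drift can only accumulate for one interval.

```c
#include <stddef.h>

/* Hypothetical simulation state; in the game this would be a snapshot of the
   Box2D world (positions, angles and velocities of every body). */
typedef struct {
    float x, vx;
} SimState;

/* Assumption: the host broadcasts a snapshot every 30 fixed steps. */
enum { SNAPSHOT_INTERVAL = 30 };

/* One fixed-timestep update. On different CPUs or math libraries this may
   round differently; that is the drift being tolerated. */
static void sim_step(SimState *s, float dt)
{
    s->vx += -9.8f * dt;
    s->x  += s->vx * dt;
}

/* Client side: step locally, then on snapshot ticks adopt the host's state. */
static void client_tick(SimState *client, const SimState *host_snapshot,
                        unsigned tick, float dt)
{
    sim_step(client, dt);
    if (host_snapshot != NULL && tick % SNAPSHOT_INTERVAL == 0)
        *client = *host_snapshot;   /* the host is the arbiter of reality */
}
```

A real client would probably blend toward the host's state over a few frames rather than snapping, to avoid visible jumps; the sketch only shows that divergence is bounded by one interval.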

It is actually Nguyen Truong Chung again.

Thank you all very much for your answers so far. I really appreciate them; they showed me how to continue debugging! I have now more or less found the cause of the desync, but I don't have a concrete solution yet. I would like to share the information I gathered, and hopefully get some more insights.

1. Finding:

I have this function that uses sin and cos. I logged its values like this:

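void rotateZ( float angle )
{
    if( angle )
    {
        const float sinTheta = sin( angle );
        const float cosTheta = cos( angle );

        // I logged here: copy the raw bit patterns out with memcpy (needs <string.h>)
        // rather than *(unsigned int*)&x, which violates strict aliasing
        unsigned int angleBits, cosBits, sinBits;
        memcpy( &angleBits, &angle, sizeof angleBits );
        memcpy( &cosBits, &cosTheta, sizeof cosBits );
        memcpy( &sinBits, &sinTheta, sizeof sinBits );
        myLog( "Vector3D::SelfRotateZ(%x) %x, %x", angleBits, cosBits, sinBits );

        // ....
    }
}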

Desync happened like this:

On iPad 4: Vector3D::SelfRotateZ(404800d2) bf7ff708, 3c8782bc

On iPhone 4: Vector3D::SelfRotateZ(404800d2) bf7ff709, 3c8782bc

2. Re-testing:

And the story does not stop there, because:

  1. I added these lines of code at the beginning of the game:

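{
    // Reinterpret the logged bit pattern as a float
    // (needs <math.h>, <stdio.h> and <string.h>)
    unsigned int zz = 0x404800d2;
    float yy = 0;
    memcpy( &yy, &zz, 4 );

    const float temp1 = cos( yy );

    // Print the result's raw bit pattern in hex
    unsigned int bits;
    memcpy( &bits, &temp1, 4 );
    printf( "%x\n", bits );
}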

  2. I ran the code above on the same iPhone 4, and guess what? I got bf7ff708.

  3. I put that code in the game's update loop, and the result was still bf7ff708 on every iteration.

  4. What's more, 0x404800d2 is an initial value in the game, so every time the game starts, the two diverging log lines above always appear.

3. The question:

So I decided to set aside what happened above and temporarily replaced the sin and cos functions with simple Taylor-series implementations I found on dreamcode.net. The desync no longer happened.
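A minimal sketch of a Taylor-series cos in the same spirit (this is not the dreamcode.net code; det_cosf is a hypothetical name). It uses only +, -, * and /, which IEEE 754 requires to be correctly rounded, so on conforming hardware the result should depend only on the input, provided the compiler evaluates float expressions in float precision and neither fuses a*b+c into an FMA nor reorders the arithmetic (for example, build with -ffp-contract=off and without -ffast-math). It assumes |x| is a moderate game angle.

```c
/* Hypothetical cos built from basic arithmetic with a fixed evaluation order.
   Assumes |x| / (2*pi) fits comfortably in an int. */
static float det_cosf(float x)
{
    const float PI     = 3.14159265f;
    const float HALFPI = 1.57079633f;
    const float TWOPI  = 6.28318531f;

    /* Range reduction: subtract the nearest multiple of 2*pi -> r in about [-pi, pi]. */
    float q = x / TWOPI + (x >= 0.0f ? 0.5f : -0.5f);
    float k = (float)(int)q;          /* truncating after +/-0.5 rounds to nearest */
    float r = x - k * TWOPI;

    /* cos is even, so fold to [0, pi]. */
    if (r < 0.0f)
        r = -r;

    /* cos(r) = -cos(pi - r): fold to [0, pi/2], where the series converges fast. */
    float sign = 1.0f;
    if (r > HALFPI) {
        r = PI - r;
        sign = -1.0f;
    }

    /* Taylor series up to r^12 in Horner form; truncation error < 1e-8 on [0, pi/2]. */
    float r2 = r * r;
    float c = 1.0f + r2 * (-0.5f + r2 * (4.16666667e-2f + r2 * (-1.38888889e-3f
            + r2 * (2.48015873e-5f + r2 * (-2.75573192e-7f + r2 * 2.08767570e-9f)))));
    return sign * c;
}
```

The trade-off is the one described in the accepted answer: a hand-written series is slower and typically less accurate than the system library, but its rounding behavior is under the game's control rather than the platform's.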

It seems that the cos function is not even deterministic on the same iPhone 4 (iOS 5).

My question is: is there an explanation for why the cos function returns different results for the same input on the same phone? Here I have the input 0x404800d2 and two different outputs, bf7ff708 and bf7ff709. However, I cannot reproduce bf7ff709 in isolated code.

I guess I need the source code of the OS math functions (the single-precision versions) to understand this clearly. Is the problem I found above enough for a bug report?

Laticialaticiferous answered 18/12, 2012 at 10:2 Comment(0)

It's actually Nguyen Truong Chung again. :)

Thank you very much for your help so far.

I just want to report that the desync was fixed after I rewrote the transcendental and other math functions (cos, sin, sqrt, pow, atan2, atan, asin, acos, etc.), as many as possible.

Laticialaticiferous answered 19/12, 2012 at 15:38 Comment(0)
