Solving floating-point rounding issues C++

Asked 11/4, 2011 at 22:1 Answered 12/4, 2011 at 3:33

Solved c++scientific-computing matrix-multiplication floating-point-precision

I develop a scientific application (simulation of chromosomes moving in a cell nucleus). The chromosomes are divided in small fragments that rotate around a random axis using 4x4 rotation matrices.

The problem is that the simulation performs hundreds of billions of rotations, therefore the floating-point rounding errors stack up and grow exponentially, so the fragments tend to "float away" and detach from the rest of the chromosome as time passes.

I use double precision with C++. The soft runs on CPU for the moment but will be ported for CUDA, and simulations can last for 1 month at most.

I have no idea how I could somehow renormalize the chromosome, because all fragments are chained together (you can see it as a doubly linked-list), but I think that would be the best idea, if possible.

Do you have any suggestions ? I feel a bit lost.

Thank you very much,

EDIT: Added a simplified sample code. You can assume all matrix math are classical implementations.

// Rotate 1000000 times
for (int i = 0; i < 1000000; ++i)
{
    // Pick a random section start
    int istart = rand() % chromosome->length;

    // Pick the end 20 segments further (cyclic)
    int iend = (istart + 20) % chromosome->length;

    // Build rotation axis
    Vector4 axis = chromosome->segments[istart].position - chromosome->segments[iend].position;
    axis.normalize();

    // Build rotation matrix and translation vector
    Matrix4 rotm(axis, rand() / float(RAND_MAX));
    Vector4 oldpos = chromosome->segments[istart].position;

    // Rotate each segment between istart and iend using rotm
    for (int j = (istart + 1) % chromosome->length; j != iend; ++j, j %= chromosome->length)
    {
        chromosome->segments[j].position -= oldpos;
        chromosome->segments[j].position.transform(rotm);
        chromosome->segments[j].position += oldpos;
    }
}

Blenny answered 11/4, 2011 at 22:1 Comment(2)

Numerical analysis and stability are huge fields. There's no one right answer. Without seeing some example code, it's difficult to give any specific advice. – Rexanne 11/4, 2011 at 22:4

BTW this sounds like a cool project. – Hoang 11/4, 2011 at 22:26

You need to find some constraint for your system and work to keep that within some reasonable bounds. I've done a bunch of molecular collision simulations and in those systems the total energy is conserved, so every step I double check the total energy of the system and if it varies by some threshold, then I know that my time step was poorly chosen (too big or too small) and I pick a new time step and rerun it. That way I can keep track of what's happening to the system in real time.

For this simulation, I don't know what conserved quantity you have, but if you have one, you can try to keep that constant. Remember, making your time step smaller doesn't always increase the accuracy, you need to optimize the step size with the amount of precision you have. I've had numerical simulations run for weeks of CPU time and conserved quantities were always within 1 part in 10^8, so it is possible, you just need to play around some.

Also, as Tomalak said, maybe try to always reference your system to the start time rather than to the previous step. So rather than always moving your chromosomes keep the chromosomes at their start place and store with them a transformation matrix that gets you to the current location. When you compute your new rotation, just modify the transformation matrix. It may seem silly, but sometimes this works well because the errors average out to 0.

For example, lets say I have a particle that sits at (x,y) and every step I calculate (dx, dy) and move the particle. The step-wise way would do this

t0 (x0,y0)
t1 (x0,y0) + (dx,dy) -> (x1, y1)
t2 (x1,y1) + (dx,dy) -> (x2, y2)
t3 (x2,y2) + (dx,dy) -> (x3, y3)
t4 (x3,30) + (dx,dy) -> (x4, y4)
...

If you always reference to t0, you could do this

t0 (x0, y0) (0, 0)
t1 (x0, y0) (0, 0) + (dx, dy) -> (x0, y0) (dx1, dy1)
t2 (x0, y0) (dx1, dy1) + (dx, dy) -> (x0, y0) (dx2, dy2)
t3 (x0, y0) (dx2, dy2) + (dx, dy) -> (x0, y0) (dx3, dy3)

So at any time, tn, to get your real position you have to do (x0, y0) + (dxn, dyn)

Now for simple translation like my example, you're probably not going to win very much. But for rotation, this can be a life saver. Just keep a matrix with the Euler angles associated with each chromosome and update that rather than the actual position of the chromosome. At least this way they won't float away.

Cartilage answered 11/4, 2011 at 23:5 Comment(4)

+1 for checking energy. I was going to suggest that, from the perspective of ensuring the numerics converge to a particular solution. (However, I would discourage referencing to the start time for a long-term simulation, as the floating-point time value will lose precision and possibly stall.) – Selectivity 11/4, 2011 at 23:23

Conserving energy is a great way to go, if at least to notice when your system goes wrong. How to correct for a gain/loss of energy can of course be challenging. It's also not perfect, since part of the system may gain, and another part lose, equaling 0. – Jacquie 12/4, 2011 at 3:59

Just to clarify, energy isn't conserved in a system like this, since it isn't closed. Instead, entropy is maximized: every simulation step should have a relatively higher probability of reducing the total energy, and then random fluctuations restore the temperature to normal. – Selectivity 12/4, 2011 at 7:59

I just tried the transformation matrix idea, it's even worse, in fact, because the vector translations get replaced by translation matrices that increase the number of multiplications and therefore the precision issues. On the other hand, the energy idea seems very much usable. The chromosome tends to stabilize itself to the lowest energy level, so including a special feature "if-you-change-your-size-then-you're-gaining-energy-dude" should help the system converge while avoiding errors. Thanks !: – Blenny 12/4, 2011 at 8:41

Write your formulae so that the data for timestep T does not derive solely from the floating-point data in timestep T-1. Try to ensure that the production of floating-point errors is limited to a single timestep.

It's hard to say anything more specific here without a more specific problem to solve.

Hoang answered 11/4, 2011 at 22:4 Comment(1)

@Heandel: I see the issue from your code (and remember getting frustrated by it a few times). Off the top of my head I'm not really sure what you can do about it, other than using some bignum library to maximise the precision available for your floating-point types. I might be missing something obvious, though; I haven't done graphics maths in ages. – Hoang 11/4, 2011 at 22:27

The problem description is rather vague, so here are some rather vague suggestions.

Option 1:

Find some set of constraints such that (1) they should always hold, (2) if they fail, but only just, it's easy to tweak the system so that they do, (3) if they do all hold then your simulation isn't going badly crazy, and (4) when the system starts to go crazy the constraints start failing but only slightly. For instance, perhaps the distance between adjacent bits of chromosome should be at most d, for some d, and if a few of the distances are just slightly greater than d then you can (e.g.) walk along the chromosome from one end, fixing up any distances that are too big by moving the next fragment towards its predecessor, along with all its successors. Or something.

Then check the constraints often enough to be sure that any violation will still be small when caught; and when you catch a violation, fix things up. (You should probably arrange that when you fix things up, you "more than satisfy" the constraints.)

If it's cheap to check the constraints all the time, then of course you can do that. (Doing so may also enable you to do the fixup more cheaply, e.g. if it means that any violations are always tiny.)

Option 2:

Find a new way of describing the state of the system that makes it impossible for the problem to arise. For instance, maybe (I doubt this) you can just store a rotation matrix for each adjacent pair of fragments, and force it always to be an orthogonal matrix, and then let the positions of the fragments be implicitly determined by those rotation matrices.

Option 3:

Instead of thinking of your constraints as constraints, supply some small "restoring forces" so that when something gets out of line it tends to get pulled back towards the way it should be. Take care that when nothing is wrong the restoring forces are zero or at least very negligible, so that they don't perturb your results more badly than the original numeric errors did.

Wynd answered 11/4, 2011 at 22:14 Comment(0)

I think it depends on the compiler you are using.

Visual Studio compiler support the /fp switch which tells the behavior of the floating point operations

you can read more about it. Basically, /fp:strict is the harshest mode

Borne answered 11/4, 2011 at 22:6 Comment(0)

I guess it depends on the required precision, but you could use 'integer' based floating point numbers. With this approach, you use an integer and provide your own offset for the number of decimals.

For example, with a precision of 4 decimal points, you would have

float value -> int value 1.0000 -> 10000 1.0001 -> 10001 0.9999 -> 09999

You have to be careful when you do your multiply and divide and be careful when you apply your precision offsets. Other wise you can quickly get overflow errors.

1.0001 * 1.0001 becomes 10001 * 10001 / 10000

Spenser answered 11/4, 2011 at 22:9 Comment(0)

If I read this code correctly, at no time is the distance between any two adjacent chromosome segments supposed to change. In that case, before the main loop compute the distance between each pair of adjacent points, and after the main loop, move each point if necessary to have the proper distance from the previous point.

You may need to enforce this constraint several times during the main loop, depending on circumstances.

Falzetta answered 11/4, 2011 at 23:44 Comment(0)

Basically, you need to avoid the accumulation of error from these (inexact) matrix operators and there are two major ways of doing so in most applications.

Instead of writing the position as some initial position operated on many times, you can write out what the operator would be explicitly after N operations. For instance, imagine you had a position x and you were adding a value e (that you couldn't represent exactly.) Much better than computing x += e; a large amount of times would be to compute x + EN; where EN is some more accurate way of representing what happens with the operation after N times. You should think whether you have some way of representing the action of many rotations more accurately.
Slightly more artificial is to take your newly found point and project off any discrepancies from the expected radius from your center of rotation. This will guarantee that it doesn't drift off (but won't necessarily guarantee that the rotation angle is accurate.)

Moppet answered 12/4, 2011 at 3:33 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags