Cache-friendly memory access in tight physics and collision loops

I am writing a physics engine and having a hard time figuring out a good way to design my data storage.

The functionality that I want:

  • Have a class that represents a PhysicsBody
  • Have a class that represents a collision volume (let's say a box)
  • Each physics body can have one or more collision volumes attached to it
  • It should be possible to have a physics body without a collision volume
  • Optional: a CollisionVolume without a physics body (think trigger volumes)

Right now I have basically two loops. The first updates the physics bodies in the simulation: their position, velocity, and rotation. The second performs collision detection over all the collision volumes; it's just a nested for loop that checks for collisions between each pair of collision volumes. (I know it can be done better, but that's a separate topic.)
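
Roughly, the two loops look like this (the types and helpers below are simplified placeholders, not my real code):

#include <cstddef>
#include <vector>

struct PhysicsBody     { float position[3]; float velocity[3]; };
struct CollisionVolume { float min[3]; float max[3]; };

bool Intersects(const CollisionVolume& a, const CollisionVolume& b); // placeholder

void Step(std::vector<PhysicsBody>& bodies,
          std::vector<CollisionVolume>& colliders, float dt)
{
    // Loop 1: integrate every body.
    for (PhysicsBody& b : bodies)
        for (int i = 0; i < 3; ++i)
            b.position[i] += b.velocity[i] * dt;

    // Loop 2: brute-force pair test over all collision volumes.
    for (std::size_t i = 0; i < colliders.size(); ++i)
        for (std::size_t j = i + 1; j < colliders.size(); ++j)
            if (Intersects(colliders[i], colliders[j]))
            {
                // resolve the contact
            }
}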

I know that the ideal way is to store objects in contiguous arrays.

std::vector<PhysicsBody> m_bodies;
std::vector<CollisionVolume> m_colliders;

Problems I found with this approach:

  • It's hard to maintain the PhysicsBody -> CollisionVolume relationship. For example, if I want to remove a CollisionVolume from my vector, I would swap it with the last one and pop back. Data gets moved, and if I stored an index to a CollisionVolume in a PhysicsBody, it's no longer valid (there's a sketch of this right after the list).
  • Any time I destroy a PhysicsBody, the destructor checks whether any collision volume was attached to it and removes that from the physics system as well. The problem is that a vector makes internal copies and destroys them, and when that happens it wreaks havoc by removing collision volumes that weren't supposed to be removed.
  • CollisionVolume is actually a base class (it doesn't have to be) and other classes derive from it, like boxes, spheres and whatnot. I could potentially avoid inheritance and come up with some other, more complicated design, but it's something to keep in mind.
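
To make the first point concrete, the removal looks roughly like this (sketch only):

#include <cstddef>
#include <vector>

struct CollisionVolume { /* ... */ };

// Swap-and-pop keeps the vector dense, but the element that used to be last
// now lives at 'index', so any PhysicsBody that stored the old last index is
// silently pointing at the wrong volume.
void RemoveCollider(std::vector<CollisionVolume>& colliders, std::size_t index)
{
    colliders[index] = colliders.back();
    colliders.pop_back();
}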

I tried hard to find a way around it but ended up storing pointers instead:

std::vector<PhysicsBody*> m_bodies;
std::vector<CollisionVolume*> m_colliders;

The best solution for minimizing cache misses that I came up with was overloading new/delete and allocating these objects from a memory pool dedicated to the physics system.
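
Roughly what I had in mind (just a sketch; the fixed-capacity pool, slot size and member layout below are placeholders):

#include <cstddef>
#include <new>

// Minimal fixed-capacity pool, purely illustrative. Freed slots are recycled
// through an intrusive free list; a real pool would also handle growth and
// per-type alignment more carefully.
template <std::size_t SlotSize, std::size_t SlotCount>
class FixedPool
{
    struct FreeSlot { FreeSlot* next; };

    alignas(std::max_align_t) unsigned char m_storage[SlotSize * SlotCount];
    std::size_t m_bump = 0;          // next never-used slot
    FreeSlot*   m_free = nullptr;    // head of the recycled-slot list

public:
    void* Allocate()
    {
        if (m_free) { void* p = m_free; m_free = m_free->next; return p; }
        if (m_bump < SlotCount) return m_storage + SlotSize * m_bump++;
        throw std::bad_alloc();
    }

    void Free(void* p) noexcept
    {
        FreeSlot* slot = static_cast<FreeSlot*>(p);
        slot->next = m_free;
        m_free = slot;
    }
};

struct PhysicsBody
{
    float position[3] = {};
    float velocity[3] = {};

    // Route every heap allocation of a PhysicsBody through the pool so the
    // bodies end up packed together in one region of memory.
    static void* operator new(std::size_t)         { return s_pool.Allocate(); }
    static void  operator delete(void* p) noexcept { s_pool.Free(p); }

    static FixedPool<64, 1024> s_pool;             // 64-byte slots, 1024 of them
};
FixedPool<64, 1024> PhysicsBody::s_pool;
static_assert(sizeof(PhysicsBody) <= 64, "slot must be able to hold a PhysicsBody");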

Are there any other better solutions to this? Obviously performance is key.

Zosi answered 30/4, 2015 at 6:24 Comment(0)

A fundamental question: in the absence of threads running on different cores (CPUs) and modifying the data, where do you see the need to care about cache-coherency costs?

The cache-coherency protocol is only triggered when a cache line gets dirtied on a core other than the one reading it, or vice versa.

It sounds like you actually meant cache locality. Is that right?

With coherency vs. locality out of the way, here is my take:

The moment you go to the vector, you lose direct control over locality. You can get some of it back by having a memory pool to draw from. Still, you would have to contend with the relocation that comes with resize operations.

Do you know the number of elements up front? If yes, you can do this:

std::vector<T> myVec;
myVec.reserve(NUM_ELEMS);

followed by constructing each element in place (push_back/emplace_back), so the objects end up in one contiguous area of memory:

myVec.emplace_back(...);

(reserve() only allocates capacity; the elements don't exist yet, so writing through myVec[i] at this point would be undefined.)

The memory for the vector and its elements could also come entirely from a single pool. This can be achieved by passing in a custom allocator when instantiating the std::vector.
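
A rough sketch of that idea with a simple bump-style arena (the Arena and ArenaAllocator names are just for illustration, and per-type alignment is glossed over):

#include <cstddef>
#include <new>
#include <vector>

// A trivial arena: hands out memory by bumping an offset. Nothing is
// reclaimed individually; the whole arena is reset or freed at once.
struct Arena
{
    unsigned char* base;
    std::size_t    capacity;
    std::size_t    offset = 0;

    void* Allocate(std::size_t bytes)
    {
        if (offset + bytes > capacity) throw std::bad_alloc();
        void* p = base + offset;
        offset += bytes;
        return p;
    }
};

// Minimal allocator that draws the vector's storage from the arena.
template <class T>
struct ArenaAllocator
{
    using value_type = T;

    Arena* arena;

    explicit ArenaAllocator(Arena* a) : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T*   allocate(std::size_t n)     { return static_cast<T*>(arena->Allocate(n * sizeof(T))); }
    void deallocate(T*, std::size_t) {}  // the arena never frees individual blocks
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return a.arena == b.arena; }
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return !(a == b); }

// Usage: the vector's buffer now lives inside the physics system's arena.
// std::vector<PhysicsBody, ArenaAllocator<PhysicsBody>> bodies{ArenaAllocator<PhysicsBody>(&arena)};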

Comity answered 30/4, 2015 at 7:24 Comment(2)
You are correct, I did mean cache-friendly. Eventually it would be nice to have this multithreaded, but I don't see myself doing that anytime soon. I will fix the title! – Zosi
I don't know the number of elements up front, which is why I had to go the vector route. If I were to store objects in the vector, relocation could be expensive, but major relocations would be infrequent. A vector ensures that the underlying data is always in a contiguous block of memory, so locality is preserved. But it doesn't solve the other problems. – Zosi

An easy way to avoid losing memory locality, and to keep memory blocks fused together so they hit the cache better, is to simply keep the removed physics elements in the vector so that indices aren't invalidated, but mark them as removed so you can reclaim those vacant slots on subsequent insertions.

If you want to go the full route of making your own container, and you still want destructors to be invoked on these removed objects, it really helps to understand how STL containers are implemented using placement new and manual destructor invocations, which avoids requiring things like an assignment operator for the type T.

You can additionally push them onto a list of vacant indices to reclaim whenever you insert a new element, or, even faster, treat each of these elements as a union with a list pointer, like so:

union PhysicsNode
{
    PhysicsBody  body;       // the slot's payload while it is occupied
    PhysicsNode* next_free;  // link to the next vacant slot while it is free
};
PhysicsNode* free_physics_nodes;  // head of the free list

The same goes for the collision nodes. In this scheme you treat a PhysicsNode like a PhysicsBody while it is 'occupied', and like a singly-linked list node while it is 'vacant', i.e. free to be reclaimed.
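
For illustration, a minimal acquire/release sketch on top of that union (assuming PhysicsBody is trivially constructible/destructible, and that the storage is reserved up front, because the free list stores raw pointers into it):

#include <cstddef>
#include <vector>

struct FreeListPool
{
    std::vector<PhysicsNode> nodes;
    PhysicsNode* first_free = nullptr;

    explicit FreeListPool(std::size_t capacity) { nodes.reserve(capacity); }

    PhysicsBody* Acquire()
    {
        if (first_free)                        // reuse a vacant slot
        {
            PhysicsNode* node = first_free;
            first_free = node->next_free;
            return &node->body;
        }
        nodes.emplace_back();                  // fresh slot from the reserved storage
        return &nodes.back().body;
    }

    void Release(PhysicsBody* body)
    {
        // 'body' is the union's first member, so the addresses coincide.
        PhysicsNode* node = reinterpret_cast<PhysicsNode*>(body);
        node->next_free = first_free;          // the slot becomes a free-list entry
        first_free = node;
    }
};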

Unfortunately, tackling the problem at this level often has you messing with object-oriented design, constructor and destructor mechanics, and so on.

So when you want this kind of efficiency along with all the object-oriented programming goodies, you might be tempted to solve the problem, in basically the same way, at the memory-allocator level instead.

Patio answered 1/5, 2015 at 14:42 Comment(0)

I would like to point out a significant difference in the data model depending on whether or not you plan to use a vector engine (GPU or SIMD) for the computation. The typical data layout follows OOP and is called Array-of-Structures (AoS); to use a vector engine of some kind, the data needs to be reorganized into Structure-of-Arrays (SoA). More on the topic from Intel:

https://software.intel.com/en-us/articles/creating-a-particle-system-with-streaming-simd-extensions
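
For illustration (field names made up), the two layouts look like this:

#include <vector>

// Array-of-Structures: the natural OOP layout. A SIMD/GPU kernel that only
// needs positions still drags the velocities through the cache with them.
struct BodyAoS
{
    float px, py, pz;
    float vx, vy, vz;
};
// std::vector<BodyAoS> bodies;

// Structure-of-Arrays: each component is contiguous, which is the layout
// vector units want to load from.
struct BodiesSoA
{
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
};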

Svend answered 1/5, 2015 at 15:2 Comment(0)