Looking for a good space-partitioning data structure to generate millions of atomic bonds quickly from
Asked Answered
F

1

7

I'm performing some MD simulations involving systems of millions of atoms.

I have written some code to generate a file which is just a listing of XYZ atom coordinates. Now I need to generate bonds between the atoms. If two atoms are within a certain distance of each other, that is considered a bond.

Example XYZ file:

1 0 0
2 0 0
7 0 0
10 0 0
9 0 0

So I've got five atoms. If my distance threshold is 2 units, then my bond listing will be:

1 2
3 5
4 5

(where the numbers correspond to the index of the coordinate in the XYZ file).

The naive approach to generate this list is just:

for i = 1:numAtoms
    for j = i+1:numAtoms
        if distance(atom[i], atom[j]) < 2
            bonds.push [i, j]

However, this quickly hits an algorithmic limit and is slow even in highly optimized C for millions of atoms, at least for as frequently as I'll be doing this process.

The only experience I have with space-partitioning data structures is with kd-trees when I wrote a photon mapper once, so I don't really know what the best solution to this problem is. I'm sure there's probably something out there that's optimal for this though.

I should also mention that my simulation box is periodic, meaning that an atom at (0.5, 0, 0) will bond to an atom at (boxWidth - 0.5, 0, 0) with a distance threshold such as 2.

Freeborn answered 30/1, 2013 at 22:43 Comment(1)
You could use the exact same kd-tree you used from your photon mapper. You would then search for all particles that within the threshold radius from a particle (for each particle). You would also have to modify the search algorithm to take into account the boundary conditions. This seems the easiest to implement in your case since you have the kd-tree data structure already.Laccolith
B
7

Simple solutions are the first to try. They are quick to code, and easy to test. If that does not give you the required performance, then you can try something trickier.

So, you can seriously prune the search space by assigning grid co-ordinates to your atoms. Nothing technical. Like a poor-man's octree...

All you need to do is have a grid size of 2, and search all atoms within the local grid and each neighbouring grid. Obviously the grid is 3D. An atom's grid location is nothing more complex than dividing each its co-ordinates by the grid size.

Obviously you do a preliminary pass and create a list (or vector) of atoms belonging to each cell. You can store the lists in a map indexed by the 3D grid co-ordinate. Then for each atom, you can just lookup the local lists and do the bond tests.

Also, don't use square-root for your distance. Operate with distance squared instead. This will save a bucket-load of processing.

Ballenger answered 30/1, 2013 at 22:56 Comment(2)
Thanks for accepting my answer =) Did you have success with this approach, or did you need to do something more tricky?Ballenger
As usual, the simplest approach is the best. I tried your suggestion along with a VP Tree. The VP Tree required too much maintenance and cognitive overhead. The grid is hard to beat in terms of "just working".Freeborn

© 2022 - 2024 — McMap. All rights reserved.