I have large 2D arrays of unsorted (X,Y) points, and I need to know which points are in close proximity to each other (nearest-neighbor lookup). I have used cKDTree and query_ball_tree with success for arrays of around 500,000 (X,Y) points. However, when I try the same algorithm on datasets of more than 1,000,000 points, query_ball_tree raises a MemoryError.
I use 64-bit Windows with 16 GB of RAM, and have tried both 32-bit and 64-bit versions of Python and the extension modules (scipy and numpy).
    from scipy.spatial import cKDTree

    def Construct_SearchTree(AllXyPoints):
        # Maxdist (the search radius) is defined elsewhere in my script
        KDsearch = cKDTree(AllXyPoints)
        return KDsearch.query_ball_tree(KDsearch, Maxdist)
My questions:
1) Does anybody know of an alternative to cKDTree / query_ball_tree that consumes less memory? Speed is less important in this case than memory usage.
2) I hoped that switching from 32-bit to 64-bit Python and extensions would solve the MemoryError. What could be the reason that it didn't?
Thanks in advance for your help and advice.
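To make question 1 concrete, here is a minimal sketch of the kind of lower-memory approach I have in mind. Instead of building the full neighbor list for every point at once, it queries the tree in batches with query_ball_point and yields results as it goes. The helper name iter_neighbors_in_batches and the batch_size parameter are hypothetical, chosen only for illustration, and I have not verified that this avoids the MemoryError on my full dataset.

```python
import numpy as np
from scipy.spatial import cKDTree

def iter_neighbors_in_batches(points, max_dist, batch_size=10000):
    """Yield (point_index, neighbor_indices) while querying one batch at a time.

    The helper name and batch_size are hypothetical, for illustration only.
    """
    tree = cKDTree(points)
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        # query_ball_point returns one list of neighbor indices per query point
        neighbor_lists = tree.query_ball_point(batch, max_dist)
        for offset, neighbors in enumerate(neighbor_lists):
            yield start + offset, neighbors

# Tiny usage example with three points and a search radius of 1.0
pts = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
result = {i: sorted(n) for i, n in iter_neighbors_in_batches(pts, 1.0, batch_size=2)}
```

Each point appears in its own neighbor list (distance 0), just as with query_ball_tree. The caller would have to process or write out each batch before moving on, otherwise the total memory use ends up the same.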