I have two sets of points, called path
and centers
. For each point in path
, I would like an efficient method for finding the ID of the closest point in centers
. I would like to do this in R. Below is a simple reproducible example.
set.seed(1)
n <- 10000
x <- 100*cumprod(1 + rnorm(n, 0.0001, 0.002))
y <- 50*cumprod(1 + rnorm(n, 0.0001, 0.002))
path <- data.frame(cbind(x=x, y=y))
centers <- expand.grid(x=seq(0, 500,by=0.5) + rnorm(1001),
y=seq(0, 500, by=0.2) + rnorm(2501))
centers$id <- seq(nrow(centers))
x
and y
are coordinates. I would like to add a column to the path
data.frame that has the id of the closest center for the given x and y co-ordinate. I then want to get all of the unique ids.
My solution at the moment does work, but is very slow when the scale of the problem increases. I would like something much more efficient.
path$closest.id <- sapply(seq(nrow(path)), function(z){
tmp <- ((centers$x - path[z, 'x'])^2) + ((centers$y - path[z, 'y'])^2)
as.numeric(centers[tmp == min(tmp), 'id'])
})
output <- unique(path$closest.id)
Any help on speeding this up would be greatly appreciated.
I think data.table
might help, but ideally what I am looking for is an algorithm that is perhaps a bit smarter in terms of the search, i.e. instead of calculating the distances to each center and then only selecting the minimum one... to get the id...
I am also happy to use Rcpp
/Rcpp11
as well if that would help improve performance.
My minimum acceptable time to perform this kind of calculation out would be 10 seconds, but obviously faster would be better.
n
andnrow(centers)
? – Nucleasesapply(seq(100),function(z){
it takes 9.11 seconds, so linearly extrapolating the sample data which has 10000 it should take 911 seconds which is about 15 minutes 11 seconds... – Sphericalas.numeric(centers[tmp==min(tmp),'id'])
withwhich.min(tmp)
, achieving the same result. But I gather you need a speedup of 100, so I have 50 still to go... – Claiborncenters
representative of your truecenters
object? i.e. can we assume that for eachy
, a point exists at eachx
? – Copyeditcenters
include allxy
combinations of your set of uniquex
and your set of uniquey
? i.e., like the output ofexpand.grid
. – Copyedit