For each point in one data set, calculate distance to nearest point in second data set
Asked Answered
G

2

5

Trying to find, for each point in a SpatialPointsDataFrame, the distance to the closest point in a second SpatialPointsDataFrame (equivalent to the "nearest" tool in ArcGIS for two SpatialPointDataFrames).

I can do the naive implementation by calculating all pairwise distances using gDistance and taking the min (like answer 1 here), but I have some huge datasets and was looking for something more efficient.

For example, here's a trick with knearneigh for points in same dataset.

Cross-posted on r-sig-geo

Glauce answered 19/5, 2016 at 20:48 Comment(10)
It looks like spDists in the sp package might work for what you want. It's first two arguments appear to be different matrices that could be used to represent two sets of points as in your example. Worth a look anyway.Canella
@Imo thanks! Looks like it's still calculating every pair, so may have same performance issue. Will check against gDistance, but seems like they're doing roughly the same thing.Glauce
Please don't cross-post.Selectivity
@JoshO'Brien Why not? If I get an answer on one page I always bring it back to the other, but lists serve different populations -- this way knowledge diffuses!Glauce
Mostly because it ends up being an added burden on the support community. For instance I wouldn't have spent the time to put together this answer if I'd known Michael Sumner had already given you an answer over on R-sig-geo. (Of course, it's my bad for not reading your question all the way through to see the cross-posting.) That said, thanks at least for leaving the note that you had cross-posted.Selectivity
@JoshO'Brien Ah, ok! Thanks for the heads up -- I'll refrain the the future. But thank you for the answer!Glauce
You bet. I'm happy to have learned about nabor.Selectivity
@JoshO'Brien For what it's worth, your answer seems to include a more generalized library -- I have a set of GIS in R tutorials at nickeubank.com/gis-in-r I'll be sure to add SearchTrees to it!Glauce
@JoshO'Brien up: nickeubank.com/wp-content/uploads/2015/10/… . Thanks!Glauce
See my answer here : #21978220 . The Dist matrix is what you want I think .....Rienzi
B
11

The SearchTrees package offers one solution. Quoting from its documentation, it, "provides an implementation of the QuadTree data structure [which it] uses to implement fast k-Nearest Neighbor [...] lookups in two dimensions."

Here's how you could use it to quickly find, for each point in a SpatialPoints object b, the two nearest points in a second SpatialPoints object B

library(sp)
library(SearchTrees)

## Example data
set.seed(1)
A <- SpatialPoints(cbind(x=rnorm(100), y=rnorm(100)))
B <- SpatialPoints(cbind(x=c(-1, 0, 1), y=c(1, 0, -1)))

## Find indices of the two nearest points in A to each of the points in B
tree <- createTree(coordinates(A))
inds <- knnLookup(tree, newdat=coordinates(B), k=2)

## Show that it worked
plot(A, pch=1, cex=1.2)
points(B, col=c("blue", "red", "green"), pch=17, cex=1.5)
## Plot two nearest neigbors
points(A[inds[1,],], pch=16, col=adjustcolor("blue", alpha=0.7))
points(A[inds[2,],], pch=16, col=adjustcolor("red", alpha=0.7))
points(A[inds[3,],], pch=16, col=adjustcolor("green", alpha=0.7))

enter image description here

Blennioid answered 19/5, 2016 at 22:30 Comment(4)
Hi Josh, I have the same question, but your answer doesn't actually answer the full question. This provides the coordinates to of the closest points in A, but not the distance from the points in B to the points in A. Could you possibly add the extra code that would provide a vector (or 2 vectors) of the min distances? A vector that could then be added back as a data column within the shapefile A?Creaturely
@LeahBevis This should get you the distances to each of the points indexed in inds: t(sapply(seq_len(nrow(inds)), function(i) spDists(B[i, ], A[inds[i, ],]))). HTH.Selectivity
Thanks so much! This was brilliant. For future users, I didn't understand the inds: bit, and didn't need the t(), but it worked when I applied it like this: distkm <- sapply(seq_len(nrow(inds)), function(i) spDists(tzprice.shp[i, ], market.shp[inds[i, ],]))Creaturely
Also, I THINK the answer is in KM, even though I didn't specify longlat = FALSE w/in spDists. Output was the same whether I specified that option or not. Also, feel free to check out this related post : ) Still wondering about fastest way to get distance to nearest lines in meters/km. #62335829Creaturely
G
0

Another suggestion from R-Sig-Geo is the knn function in the nabor library.

Glauce answered 19/5, 2016 at 22:49 Comment(1)
Hi Nick, I found your cross-posting (always nice to actually give the link; I found this stat.ethz.ch/pipermail/r-sig-geo/2016-May/024452.html) but it doesn't seem to provide distance in km or m. Seems like it's some sort of Euclidean distance based on the coordinates? Did you actually use this knn function (or the trick from Josh above) to calculate physical distances? If so, could you share? Thanks!!Creaturely

© 2022 - 2024 — McMap. All rights reserved.