How do I find the polygon nearest to a point in R?
Asked Answered
S

2

14

I have a spatial points data frame and a spatial polygons data frame. For example, my polygons would be a polygon for each block in Manhattan. And the points are people, which are scattered all over, sometimes falling in the middle of a street, which is not part of a polygon.

I know how to check if a point is contained inside a polygon, but how could I assign points to their closest polygon?

## Make some example  data
set.seed(1)
library(raster)
library(rgdal)
library(rgeos)
p <- shapefile(system.file("external/lux.shp", package="raster"))
p2 <- as(1.5*extent(p), "SpatialPolygons")
proj4string(p2) <- proj4string(p)
pts <- spsample(p2-p, n=10, type="random")

## Plot to visualize
plot(pts, pch=16, cex=.5,col="red")
plot(p, col=colorRampPalette(blues9)(12), add=TRUE)

enter image description here

Sabelle answered 10/10, 2014 at 21:38 Comment(4)
First you bring some code and data, .... then we fix it.Sheffie
See how to create a reproducible example to make it easier for us to answer your questionSally
I'm not sure how to do it since this isn't really an error and I don't have permission to publish my data. I'll try to create some data.Sabelle
There, I added some reproducible data.Ibadan
D
19

Here's an answer that uses an approach based on that described by mdsumner in this excellent answer from a few years back.

One important note (added as an EDIT on 2/8/2015): rgeos, which is here used to compute distances, expects that the geometries on which it operates will be projected in planar coordinates. For these example data, that means that they should be first transformed into UTM coordinates (or some other planar projection). If you make the mistake of leaving the data in their original lat-long coordinates, the computed distances will be incorrect, as they will have treated degrees of latitude and longitude as having equal lengths.

library(rgeos)

##  First project data into a planar coordinate system (here UTM zone 32)
utmStr <- "+proj=utm +zone=%d +datum=NAD83 +units=m +no_defs +ellps=GRS80"
crs <- CRS(sprintf(utmStr, 32))
pUTM <- spTransform(p, crs)
ptsUTM <- spTransform(pts, crs)

## Set up containers for results
n <- length(ptsUTM)
nearestCanton <- character(n)
distToNearestCanton <- numeric(n)

## For each point, find name of nearest polygon (in this case, Belgian cantons)
for (i in seq_along(nearestCanton)) {
    gDists <- gDistance(ptsUTM[i,], pUTM, byid=TRUE)
    nearestCanton[i] <- pUTM$NAME_2[which.min(gDists)]
    distToNearestCanton[i] <- min(gDists)
}

## Check that it worked
data.frame(nearestCanton, distToNearestCanton)
#       nearestCanton distToNearestCanton
# 1             Wiltz           15342.222
# 2        Echternach            7470.728
# 3            Remich           20520.800
# 4          Clervaux            6658.167
# 5        Echternach           22177.771
# 6          Clervaux           26388.388
# 7           Redange            8135.764
# 8            Remich            2199.394
# 9  Esch-sur-Alzette           11776.534
# 10           Remich           14998.204

plot(pts, pch=16, col="red")
text(pts, 1:10, pos=3)
plot(p, add=TRUE)
text(p, p$NAME_2, cex=0.7)

enter image description here

Dialysis answered 10/10, 2014 at 22:20 Comment(6)
It took a while to run since I have lots of data, and it worked flawlessly! Can't thank you enough, Josh. Amazing.Sabelle
I still haven't processed all I need. I processed about 100,000 points and 1,000 polygons which is about 1% of what I have to do. It took about 30 minutes.Sabelle
@Lucho -- Yikes. doesn't seem like that's gonna be fast enough. It might be worth asking about faster alternative open source (QGIS?) or ArcGIS (if you've got it) methods over on gis.stackexchange.com .Ibadan
That sounds like a great idea, thanks! I've just started learning about this but I do have ArcGIS so I'll give that a try.Sabelle
This works perfectly, except note that lUTM must be replaced with pUTM. I actually implemented this with a SpatialLinesDataFrame instead of a SpatialPolygonsDataFrame. It works just as well for the distance output. I didn't save the identity of the closest line, of course, this wouldn't make sense for lines.Scrawny
@LeahBevis Thanks. Fixed that. (It was a typo from a slight revision I had made to the post yesterday). Glad this helped!Ibadan
F
2

I'm way late to the party here, but I just found this thread, and for what it is worth, offer this suggestion. The nn2 function from the RANN package allows you to search (for closest points) over only some limited radius, which can save considerable time. My suggestion is to add points over the polygons, associate the points with the polygons, then search for the closest point. It looks like the gDistance method is faster when there are not many points, but the nn2 method scales up better to larger problems, because it searches a limited radius (of course, it will fail to find a match if no points are inside that radius, so the radius must be correctly chosen). I'm new at this, so this might not be optimal. It would be nice if gDistance would allow for a restricted search.

## Make some example  data
set.seed(1)
library(raster)
library(rgdal)
library(rgeos)
library(RANN)
library(spatialEco)

p <- shapefile(system.file("external/lux.shp", package="raster"))

##  Project data into a planar coordinate system (here UTM zone 32)
utmStr <- "+proj=utm +zone=%d +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
crs <- CRS(sprintf(utmStr, 32))
pUTM <- spTransform(p, crs)

# the points of interest (all within some threshold distance of the polygons)
ptsUTM <- spsample(gBuffer(pUTM,width=2000)-pUTM, n=10000, type="random")
## Plot to visualize
plot(ptsUTM, pch=16, cex=.5,col="red")
plot(pUTM, col=colorRampPalette(blues9)(12), add=TRUE)

# the gDistance method
starttime <- Sys.time()
## Set up container for results
n <- length(ptsUTM)
nearestCantons <- character(n)
## For each point, find name of nearest polygon (in this case, Belgian cantons)
for (i in seq_along(nearestCantons)) {
    nearestCantons[i] <- pUTM$NAME_2[which.min(gDistance(ptsUTM[i,], pUTM, byid=TRUE))]
}
Sys.time()-starttime

# the nn2 method
starttime <- Sys.time()
## create search points and associate with polygon attributes
rp <- spsample(pUTM,n=10000,type="regular")
rp2 <- point.in.poly(rp,pUTM)
# search for nearest point (with radius)
nn <- nn2(coordinates(rp2),coordinates(ptsUTM),k=1,searchtype="radius",radius=5000)$nn.idx
nearestCantons2 <- rp2$NAME_2[nn]
Sys.time()-starttime

Faizabad answered 29/3, 2019 at 15:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.