Using R, I would like to overlay some spatial points and polygons in order to assign to the points some attributes of the geographic regions I have taken into consideration.
What I usually do is to use the command over
of the sp
package. My problems is that I'm working with a large number of geo-referenced events that happened all over the globe and in some cases (especially in coastal areas), the longitude and latitude combination falls slightly outside the country/region border.
Here a reproducible example based on in this very good question.
## example data
set.seed(1)
library(raster)
library(rgdal)
library(sp)
p <- shapefile(system.file("external/lux.shp", package="raster"))
p2 <- as(0.30*extent(p), "SpatialPolygons")
proj4string(p2) <- proj4string(p)
pts1 <- spsample(p2-p, n=3, type="random")
pts2<- spsample(p, n=10, type="random")
pts<-rbind(pts1, pts2)
## Plot to visualize
plot(p, col=colorRampPalette(blues9)(12))
plot(pts, pch=16, cex=.5,col="red", add=TRUE)
# overlay
pts_index<-over(pts, p)
# result
pts_index
#> ID_1 NAME_1 ID_2 NAME_2 AREA
#>1 NA <NA> <NA> <NA> NA
#>2 NA <NA> <NA> <NA> NA
#>3 NA <NA> <NA> <NA> NA
#>4 1 Diekirch 1 Clervaux 312
#>5 1 Diekirch 5 Wiltz 263
#>6 2 Grevenmacher 12 Grevenmacher 210
#>7 2 Grevenmacher 6 Echternach 188
#>8 3 Luxembourg 9 Esch-sur-Alzette 251
#>9 1 Diekirch 3 Redange 259
#>10 2 Grevenmacher 7 Remich 129
#>11 1 Diekirch 1 Clervaux 312
#>12 1 Diekirch 5 Wiltz 263
#>13 2 Grevenmacher 7 Remich 129
Is there a way to give to the over
function a sort of tolerance in order to capture also the points that are very close to the border?
NOTE:
Following this I could assign to the missing point the nearest polygon, but this is not exactly what I am after.
EDIT: nearest neighbor solution
#adding lon and lat to the table
pts_index$lon<-pts@coords[,1]
pts_index$lat<-pts@coords[,2]
#add an ID to split and then re-compose the table
pts_index$split_id<-seq(1,nrow(pts_index),1)
#filtering out the missed points
library(dplyr)
library(geosphere)
missed_pts<-filter(pts_index, is.na(NAME_1))
pts_missed<-SpatialPoints(missed_pts[,c(6,7)],proj4string=CRS(proj4string(p)))
#find the nearest neighbors' characteristics
n <- length(pts_missed)
nearestID1 <- character(n)
nearestNAME1 <- character(n)
nearestID2 <- character(n)
nearestNAME2 <- character(n)
nearestAREA <- character(n)
for (i in seq_along(nearestID1)) {
nearestID1[i] <- as.character(p$ID_1[which.min(dist2Line (pts_missed[i,], p))])
nearestNAME1[i] <- as.character(p$NAME_1[which.min(dist2Line (pts_missed[i,], p))])
nearestID2[i] <- as.character(p$ID_2[which.min(dist2Line (pts_missed[i,], p))])
nearestNAME2[i] <- as.character(p$NAME_2[which.min(dist2Line (pts_missed[i,], p))])
nearestAREA[i] <- as.character(p$AREA[which.min(dist2Line (pts_missed[i,], p))])
}
missed_pts$ID_1<-nearestID1
missed_pts$NAME_1<-nearestNAME1
missed_pts$ID_2<-nearestID2
missed_pts$NAME_2<-nearestNAME2
missed_pts$AREA<-nearestAREA
#missed_pts have now the characteristics of the nearest poliygon
#bringing now everything toogether
pts_index[match(missed_pts$split_id, pts_index$split_id),] <- missed_pts
pts_index<-pts_index[,-c(6:8)]
pts_index
ID_1 NAME_1 ID_2 NAME_2 AREA
1 1 Diekirch 4 Vianden 76
2 1 Diekirch 4 Vianden 76
3 1 Diekirch 4 Vianden 76
4 1 Diekirch 1 Clervaux 312
5 1 Diekirch 5 Wiltz 263
6 2 Grevenmacher 12 Grevenmacher 210
7 2 Grevenmacher 6 Echternach 188
8 3 Luxembourg 9 Esch-sur-Alzette 251
9 1 Diekirch 3 Redange 259
10 2 Grevenmacher 7 Remich 129
11 1 Diekirch 1 Clervaux 312
12 1 Diekirch 5 Wiltz 263
13 2 Grevenmacher 7 Remich 129
This is exactly the same output as the one proposed by @Gilles in his answer. I am just wondering if there is something more efficient than all this.
over
that gives the possibility to avoid additional coding to allocate the points that are very close to the border of the polygon – Apogeotropism