Distance of point feature to nearest polygon in R

Asked 8/5, 2013 at 19:10 Answered 13/6, 2014 at 10:0

Solved r polygon distance point nearest-neighbor

I working on a project at the moment, where I have a point feature -- the point feature includes a 142 points -- and multiple polygon (around 10). I want to calculate the distance between every single point and the nearest polygon feature in R.

My current approach is tedious and a bit long winded. I am currently planning to calculate the distance between every single point and every single polygon. For example, I would calculate the distance between the 142 points and Polygon A, the distance between the 142 points and Polygon B, the distance between 142 points and Polygon C, etc. Here is a sample code of one of these distance calculations:

dist_cen_polya <- dist2Line(centroids_coor, polygonA_shp)

After doing these calculations, I would write a code to choose the minimum/nearest distance between every single point and the closest polygon. The issue is that this procedure is tedious.

Does anyone know a package/code which would minimize the effort/computational time of the calculation? I would really like to use a package that compare a single to point to the nearest polygon feature or calculates the distance between a point and all polygons of interest?

Thank you.

Leonialeonid answered 8/5, 2013 at 19:10 Comment(8)

Judging by your last paragraph, you seem to have a mathematical problem: find a better algorithm than making the foll set of comparisons, right? That may be more suitable for the math SE. – Ammoniac 8/5, 2013 at 19:34

The spatstat package may be able to do what you want. I'm not an expert with that toolset, so can't confirm for sure. – Blumenthal 8/5, 2013 at 19:39

With the numbers involved here, 10 polygons and 142 points (1420 distances!) brute force shouldn't be a problem. The plyr package should help you if you don't like for loops. – Nomarch 8/5, 2013 at 19:43

If your polygons are small enough to be approximated as points (that is distance from point to polygon center is close enough to the distance from point to edge of polygon) you could use a Voronoi tessellation (aka Dirichlet tessellation) on the centers of the polygons (using the deldir package), and then whichever Voronoi tile a point is in will correspond to it's closest polygon. – Nomarch 8/5, 2013 at 19:46

@ shujaa I would have to do the same procedure for another twenty set of points and polygons. So I would be at close to 20,000 distances. I am hoping to make it less long winded. – Leonialeonid 8/5, 2013 at 20:7

@ Shujaa also the polygons are relatively large (country shapefiles), using Voronoi tessellation would not work well. – Leonialeonid 8/5, 2013 at 20:8

@Leonialeonid check out spatstat and nncross. – Furfuran 8/5, 2013 at 21:1

@Leonialeonid also check out rgeos and gDistance. And which.min will be helpful. I'd go through a worked example but you haven't posted any reproducible data. – Furfuran 8/5, 2013 at 21:27

Here I am using the gDistance function in the rgeos topology library. I am using a brute force double loop but it is surprisingly fast. It takes less than 2 seconds for 142 points and 10 polygons. I am sure that there is a more elgant way to perform the looping.

   require(rgeos)

    # CREATE SOME DATA USING meuse DATASET
    data(meuse)
      coordinates(meuse) <- ~x+y
        pts <- meuse[sample(1:dim(meuse)[1],142),]  
    data(meuse.grid) 
      coordinates(meuse.grid) = c("x", "y") 
        gridded(meuse.grid) <- TRUE 
          meuse.grid[["idist"]] = 1 - meuse.grid[["dist"]]    
        polys <- as(meuse.grid, "SpatialPolygonsDataFrame")
          polys <- polys[sample(1:dim(polys)[1],10),]   
    plot(polys)
      plot(pts,pch=19,cex=1.25,add=TRUE)      

    # LOOP USING gDistance, DISTANCES STORED IN LIST OBJECT
    Fdist <- list()
      for(i in 1:dim(pts)[1]) {
        pDist <- vector()
          for(j in 1:dim(polys)[1]) { 
            pDist <- append(pDist, gDistance(pts[i,],polys[j,])) 
          }
        Fdist[[i]] <- pDist
      } 

    # RETURN POLYGON (NUMBER) WITH THE SMALLEST DISTANCE FOR EACH POINT  
    ( min.dist <- unlist(lapply(Fdist, FUN=function(x) which(x == min(x))[1])) ) 

    # RETURN DISTANCE TO NEAREST POLYGON
    ( PolyDist <- unlist(lapply(Fdist, FUN=function(x) min(x)[1])) ) 

    # CREATE POLYGON-ID AND MINIMUM DISTANCE COLUMNS IN POINT FEATURE CLASS
    pts@data <- data.frame(pts@data, PolyID=min.dist, PDist=PolyDist)

    # PLOT RESULTS
    require(classInt)
    ( cuts <- classIntervals(pts@data$PDist, 10, style="quantile") )
       plotclr <- colorRampPalette(c("cyan", "yellow", "red"))( 20 )
         colcode <- findColours(cuts, plotclr)
    plot(polys,col="black")
      plot(pts, col=colcode, pch=19, add=TRUE)

The min.dist vector represents the row number of the polygon. For instance you could subset the nearest polygons by using this vector as such.

near.polys <- polys[unique(min.dist),]

The PolyDist vector contain the actual Cartesian minimum distances in the projection units of the features.

Tantalite answered 9/5, 2013 at 0:12 Comment(2)

@ Jeffrey Evans I do really thank you for this solution. I have used my data with the code, and it really works well! Thanks!! – Leonialeonid 9/5, 2013 at 14:9

@ Jeffrey Evans, seems that the byid argument in gDistance gives the same result as your double loop? gDistance(pts,polys, byid=T) - a 10 x 142 matrix which you can then manipulate to get the minimum distances, etc. – Arabelle 10/10, 2013 at 20:48

In a polygon you have quite a lot of lines. The distance between a polygon is either zero, if the point lies within the polygon or lies on an edge.

So actually you are looking for two cases:

Check if the point lies inside of any polygon (remember it might be more then a single polygon
Get a collection of all edges and calculate each distance of the point from the edge. The nearest distance gives you the distance of the polygon the edge belongs too.

So this is a simple algorithm that if we consider 10 edges per polygon takes O(10 * 10) * 142 for all of your points. That makes 100 * 142 = 14200 operations. => O(m * deltaE) * n (m is number of polygones, deltaE is the average number of edges per polygon, n is the number of points).

Now we check if we can speed this up. The first thing comming to mind is that we can use bounding box checks or bounding circles for each polygon.

Another idea is to prepare the nearest edges for each polygon for a set of angles. For example if you have 8 angles (every 45°) you can remove all edges from the list that are superseeded by another edge (therefore any point of the removed edge will yield always a greater distance than any point of the other edge of the same polygon.

This way typically you can might reduce the complexity in great margin. If you think of a rectangle you only have one or two edges instead of 4. If you think of a regular 8 edge polygon you might end up with typically one or two and at max 3 edges per polygon.

If you add the normal vector to each edge you can calculate if the point might be inside and you have to perform a scanline or whatever check (or you know its konvex) to be sure.

There are also mapping indexes possible like separating the two dimensional space by x and y in a equidistant mannor. Thisway you only have to test polygones lying in 9 sectors.

The next version might be using an R-tree that each bounding box (circle) of each node has to be checked for the minimum expected distance and the maximum expected distance. So one does not need to check the polygons of a node that result in far greater minimum distances than the maximum distances of another node.

Another thing is if you have a given tree like order like you have with map data. In a street map you always have world -> region -> country -> county -> city -> city sector ->...

Thisway you can search for the nearest location in a map of the whole world containing millions of polygons in a reasonable time mostly <10ms.

So to speak you have a lot of options here. And preprocessing your polygon list and extract relevant edges either by using binary space partition trees of polygons or use an angular approach or even go with something even more fancier. Its up to you. I expect that you end up by doing something in the logarithmic range like O(log(n) * log(deltaE)) becoming O(log(n)) as an average complexity.

Nealy answered 13/6, 2014 at 10:0 Comment(0)

Recommended topics

Hot tags