Matching georeferenced data with shape file in R
Asked Answered
I

1

7

I have a georeferenced event data set, in a data frame of the form:

LONGITUDE LATITUDE VAR1
33.4      4.4      5
33.4      4.4      3
33.4      4.4      1
30.4      4.2      2
28.4      5.1      2

It counts fatalities in events which are georeferenced. Apart from it, I have a shape file of the provinces in a country, like this:

> str(shapefile)
'data.frame':   216 obs. of  5 variables:
 $ CONSTI_COD: num  1 2 3 4 5 6 7 8 9 10 ...
 $ Area      : num  20 11.7 10.7 223.3 38.7 ...
 $ PROVINCE_NAME  : Factor w/ 216 levels "CENTRAL","COAST",..: 4 4 4 4 4 4 4 4 2 2 ...
 $ Shape_Leng: num  0.193 0.152 0.201 0.872 0.441 ...
 $ Shape_Area: num  0.001628 0.000947 0.000867 0.018135 0.003145 ...

 ..@ polygons   :List of 216
   .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
   .. .. .. ..@ Polygons :List of 1
   .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
   .. .. .. .. .. .. ..@ labpt  : num [1:2] 36.9 -1.3
   .. .. .. .. .. .. ..@ area   : num 0.00163
   .. .. .. .. .. .. ..@ hole   : logi FALSE
   .. .. .. .. .. .. ..@ ringDir: int 1
   .. .. .. .. .. .. ..@ coords : num [1:151, 1:2] 36.8 36.8 36.8 36.9 36.9 ...
   .. .. .. ..@ plotOrder: int 1
   .. .. .. ..@ labpt    : num [1:2] 36.9 -1.3
   .. .. .. ..@ ID       : chr "0"
   .. .. .. ..@ area     : num 0.00163
 [...etc]

What I need to do is to place the event data within the provinces, i.e. add a fourth column to the first data frame which states in which province happened each event, based on the coordinates. So I would have something like this:

LONGITUDE LATITUDE VAR1 PROVINCE
33.4      4.4      5    CENTRAL
33.4      4.4      3    CENTRAL
33.4      4.4      1    CENTRAL
30.4      4.2      2    COAST
28.4      5.1      2    COAST

Is this posible? I think I found some time ago a post that explained how to do this (outside Stack Overflow), but I cannot find it now.

Thanks!

(Sorry if there is a similar question over here. I made a search, but I didn't find an answer, maybe because I don't really know what I'm looking for. I would really appreciate a link to a similar post.)

Idaline answered 11/6, 2014 at 23:8 Comment(1)
Knowing the terminology makes all the difference when searching - I remember it being very frustrating not knowing the important keywords!Bookmark
B
6

What you're talking about is a "spatial join" (or "spatial intersection" or "overlay"). This is pretty straightforward with the help of the over function from the sp package.

Here's an example.

First, let's download and import a polygon shapefile of the world's countries.

download.file(paste0('http://www.naturalearthdata.com/http//',
                     'www.naturalearthdata.com/download/110m/cultural/',
                     'ne_110m_admin_0_countries.zip'), 
              f <- tempfile())
unzip(f, exdir=tempdir())
library(rgdal)
countries <- readOGR(tempdir(), 'ne_110m_admin_0_countries')

Now we'll create some random coordinate data that fall within the extent of the polygon shapefile. We then define the columns x and y as coordinates, and assign the same CRS as that of the polygons (although this may not be the case for your data; be sure to assign correct coordinate systems).

pts <- data.frame(x=runif(10, -180, 180), y=runif(10, -90, 90),
                  VAR1=LETTERS[1:10])
coordinates(pts) <- ~x+y  # pts needs to be a data.frame for this to work
proj4string(pts) <- proj4string(countries)

plot(countries)
points(pts, pch=20, col='red')

shp

Now we can perform the spatial overlay:

over(pts, countries)$admin

#  [1] <NA>      <NA>      Turkey    <NA>     
#  [5] Macedonia <NA>      China     Argentina
#  [9] <NA>      Canada   
# 177 Levels: Afghanistan Albania ... Zimbabwe

Note that in this case, some of the random points fell in the ocean (i.e. outside polygons). When intersected with the polygon object, these points return NA.

Now we cbind the desired attribute to pts:

cbind.data.frame(pts, country=over(pts, countries)$admin)

#             x          y VAR1   country
# 1   -52.59404 -37.422879    A      <NA>
# 2   -33.88867 -40.194482    B      <NA>
# 3    38.84383  37.272460    C    Turkey
# 4   -84.04949   7.118878    D      <NA>
# 5    20.98272  40.920470    E Macedonia
# 6  -155.32951 -37.612497    F      <NA>
# 7    99.40166  38.630049    G     China
# 8   -61.84025 -27.412885    H Argentina
# 9   -37.65287  -3.666080    I      <NA>
# 10 -112.81197  59.959475    J    Canada
Bookmark answered 11/6, 2014 at 23:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.