Import shapefile and join to coordinates to create spatial dataframe in R
Asked Answered
D

3

0

Before maptools and rgdal were deprecated, I used this code to import a shapefile, centroids csv and commuting count data, to create a flow map using the code below (I removed the ggplot code for legibility). I am trying to recreate this flow map with recent commuting data to compare changes between both time periods.

I read that read_sf/st_read was used to replace the readOGR function, so I used st_read to import the shapefile.

To join the shapefile to the centroid points, I need to get the polygon ID which is the common field for this join. However, the line soton@data$id = rownames(soton@data) which I'm using to get the polygon IDs that are stored as row names is throwing this error:

Error in soton@data : no applicable method for @ applied to an object of class "sf"

This code worked fine with maptools and rgdal, and I am using the exact same shapefile from before. I'll really appreciate help on tweaking my code to work with the new packages. Thank you.

library(tidyverse)  
library(plyr)  
library(ggplot2)  
library(maptools) -- replaced with library(sf)  
library(rgdal)    -- removed this and loaded sp package  
setwd("your working directory")
soton=st_read(dsn="SouthamptonMSOA",layer="SouthamptonOA")
ODmat=read.csv("ODSouthampton.csv")
centroids=read.csv("CentroidSouthampton.csv")
popdat=read.csv("SouthamptonWZ.csv")
ODmat$frx=0
ODmat$fry=0
ODmat$tox=0
ODmat$toy=0
for (i in 1:dim(ODmat)[1]){
ODmat$frx[i]=centroids$East[which(centroids$Code==ODmat$Origin[i])]
ODmat$fry[i]=centroids$North[which(centroids$Code==ODmat$Origin[i])]
ODmat$tox[i]=centroids$East[which(centroids$Code==ODmat$Destination[i])]
ODmat$toy[i]=centroids$North[which(centroids$Code==ODmat$Destination[i])]}
soton@data$id = rownames(soton@data)
soton.points = fortify(soton, region="id")
soton.df = join(soton.points, soton@data, by="id")
ODmatsub=subset(ODmat,Commuters>50)
figplot = ggplot()+
  # ... 

UPDATE: In the comments, it was suggested that I convert the centroids data to an sf object before joining to the shapefile. I have done this but now stuck on how to join the shapefile to the centroids data, bearing in mind that I also have to join the origin-destination data (ODSouthampton) to the centroids to create the flow lines.

Here's a sample of the centroids data(csv):

Code Name East North
E02003549 Southampton 001 442184.28 116144.73
E02003550 Southampton 002 439810.63 116092.19
E02003551 Southampton 003 443938.17 115842.76
E02003552 Southampton 004 439018.21 115633.94
E02003553 Southampton 005 443248.97 115497.68

Sample of the ODSouthampton data (csv):

Origin Destination Commuters
E02003549 E02003549 22
E02003549 E02003550 25
E02003549 E02003551 13
E02003549 E02003552 12
E02003549 E02003553 24

I also have workplace population data which I will display as points on the map, and it's in a csv format. Will I need to convert all 3 csv files to sf objects? Sample of the population data is below:

MSOA_Code WZ_Population POINT_X POINT_Y
E02003549 2149 442184.28 116144.73
E02003550 3227 439810.63 116092.19
E02003551 1998 443938.17 115842.76
E02003552 1596 439018.21 115633.94
E02003553 1772 443248.97 115497.68
Doby answered 5/3 at 12:23 Comment(11)
For more context, st_read returns the geometry of the shapefile as MULTIPOLYGON.Doby
If your csv file contains centroid coordinates, read it as data.frame and convert to sf object with sf::st_as_sf(data.frame.from.csv, coords = ("lon", "lat"), crs = ...) and use sf::st_join(your_shape_data, your_centroids_data) to join them.Elinorelinore
Thank you @Grzegorz. I input the first code like this: centroids=sf::st_as_sf(data.frame.CentroidSouthampton.csv, coords = ("East", "North"), crs = OSGB 1936 / British National Grid) This is throwing an "unexpected" error. Could you help correct my code? I assumed I was to change lon and lat to the column names for the coordinates in the centroid csv. Thank you so much for your help.Doby
Also before your code, the error was being thrown from soton@data$id = rownames(soton@data) which is the code to get the row names for the polygon IDs in the shapefile. Is there a way that works to get the row names?Doby
You need to supply crs = an EPSG code, not the PCS name. Try crs = 27700. And if you supply some sample rows of you csv files, I'm happy to show you the entire workflow without needing a loop.Owlet
Thank you @L Tyrone. I changed the crs but the error I get is still that there is an unexpected ',' in this line: centroids=sf::st_as_sf(data.frame.CentroidSouthampton.csv, coords = ("East", "North"), crs = 27700)Doby
Coordinates need to be a vector object so do coords = c("East", "North"),.Owlet
Thank you. But now it says: Error: object 'data.frame.CentroidSouthampton.csv' not found but this is the name of the centroids csv fileDoby
As I mentioned, you are going to need to supply samples of your data in order to get an answer.Owlet
...but if the csv is in your working directory, you need to put the file name in quotes e.g. "file.csv"Owlet
I have added samples of the data to the initial question. Thank you @LTyroneDoby
E
1

Let me explain all steps:

Data preparation, as we don't have access to your SouthamptonMSOA neither csv files:

# b <- osmdata::getbb("Southampton")
# boundaries <- osmdata::opq(b, timeout = 25*60) |>
#   osmdata::add_osm_feature(key = "boundary", value = "administrative") |>
#   osmdata::osmdata_sf()
# 
# boundaries$osm_multipolygons |>
#   sf::st_write(dsn = "data/south.gpkg")
# 
# boundaries$osm_multipolygons |>
#   sf::st_centroid() |>
#   sf::st_coordinates() |>
#   write.csv(file = "data/centroids.csv")

You can skip above part because you have data on hand.

Let's read in the layer:

sh <- sf::st_read(dsn = "data/south.gpkg")
#> Reading layer `south' from data source `data/south.gpkg' using driver `GPKG'
#> Simple feature collection with 19 features and 129 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84

Just to show the content:

sh |>
  subset(select = c("name", "admin_level"))
#> Simple feature collection with 19 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>              name admin_level                           geom
#> 1     Test Valley           8 MULTIPOLYGON (((-1.308833 5...
#> 2       Eastleigh           8 MULTIPOLYGON (((-1.312912 5...
#> 3     Southampton           6 MULTIPOLYGON (((-1.354701 5...
#> 4      New Forest           8 MULTIPOLYGON (((-1.347534 5...
#> 5       Hampshire           6 MULTIPOLYGON (((-0.9320425 ...
#> 6       Bursledon          10 MULTIPOLYGON (((-1.309554 5...
#> 7  Hamble-le-Rice          10 MULTIPOLYGON (((-1.349241 5...
#> 8           Hound          10 MULTIPOLYGON (((-1.349241 5...
#> 9       Hedge End          10 MULTIPOLYGON (((-1.320232 5...
#> 10       West End          10 MULTIPOLYGON (((-1.354701 5...

Let's read in csv file and show it's content:

c <- read.csv("data/centroids.csv") 
c
#>    X.1         X        Y
#> 1    1 -1.510652 51.12843
#> 2    2 -1.327202 50.93302
#> 3    3 -1.398709 50.91700
#> 4    4 -1.624719 50.85839
#> 5    5 -1.287777 51.06381
#> 6    6 -1.315738 50.88791
#> 7    7 -1.323930 50.86144
#> 8    8 -1.347316 50.87588
#> 9    9 -1.303966 50.91370
#> 10  10 -1.329964 50.93579
#> 11  11 -1.325985 50.97084
#> 12  12 -1.413366 50.96088
#> 13  13 -1.468417 50.95084
#> 14  14 -1.497882 50.98122
#> 15  15 -1.412990 50.86973
#> 16  16 -1.448976 50.89033
#> 17  17 -1.494418 50.91691
#> 18  18 -1.495827 50.85305
#> 19  19 -1.359211 50.95967

Now, we convert csv (data.frame) to sf object:

c <- c|>
  sf::st_as_sf(coords = c("X", "Y"), crs = "EPSG:4326")

c
#> Simple feature collection with 19 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -1.624719 ymin: 50.85305 xmax: -1.287777 ymax: 51.12843
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>    X.1                   geometry
#> 1    1 POINT (-1.510652 51.12843)
#> 2    2 POINT (-1.327202 50.93302)
#> 3    3   POINT (-1.398709 50.917)
#> 4    4 POINT (-1.624719 50.85839)
#> 5    5 POINT (-1.287777 51.06381)
#> 6    6 POINT (-1.315738 50.88791)
#> 7    7  POINT (-1.32393 50.86144)
#> 8    8 POINT (-1.347316 50.87588)
#> 9    9  POINT (-1.303966 50.9137)
#> 10  10 POINT (-1.329964 50.93579)

Let's use only 2 polygons from our data set, with admin_level == 6

sh |>
  subset(admin_level == "6", select = c("name", "geom")) 
#> Simple feature collection with 2 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#>          name                           geom
#> 3 Southampton MULTIPOLYGON (((-1.354701 5...
#> 5   Hampshire MULTIPOLYGON (((-0.9320425 ...

And finally we join them together:

sh |>
  subset(admin_level == "6", select = c("name", "geom")) |>
  sf::st_join(c)
#> Simple feature collection with 19 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>            name X.1                           geom
#> 3   Southampton   3 MULTIPOLYGON (((-1.354701 5...
#> 5     Hampshire   1 MULTIPOLYGON (((-0.9320425 ...
#> 5.1   Hampshire   2 MULTIPOLYGON (((-0.9320425 ...
#> 5.2   Hampshire   4 MULTIPOLYGON (((-0.9320425 ...
#> 5.3   Hampshire   5 MULTIPOLYGON (((-0.9320425 ...
#> 5.4   Hampshire   6 MULTIPOLYGON (((-0.9320425 ...
#> 5.5   Hampshire   7 MULTIPOLYGON (((-0.9320425 ...
#> 5.6   Hampshire   8 MULTIPOLYGON (((-0.9320425 ...
#> 5.7   Hampshire   9 MULTIPOLYGON (((-0.9320425 ...
#> 5.8   Hampshire  10 MULTIPOLYGON (((-0.9320425 ...

Or opposite way, joining boundaries to points:

sh <- sh |>
  subset(admin_level == "6", select = c("name", "geom")) 
sh
#> Simple feature collection with 2 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#>          name                           geom
#> 3 Southampton MULTIPOLYGON (((-1.354701 5...
#> 5   Hampshire MULTIPOLYGON (((-0.9320425 ...

csh <- c |>
  sf::st_join(sh)

tmap::tm_shape(sh) +
  tmap::tm_polygons() +
  tmap::tm_shape(csh) +
  tmap::tm_dots("name")

Created on 2024-03-05 with reprex v2.1.0

Elinorelinore answered 5/3 at 16:57 Comment(1)
I really appreciate this @Grzegorz. I am now stuck at the join, so I added samples of the datasets I need joined to the shapefile in the initial question. Please have a look and if you can, guide me on how to achieve the join of ODSouthampton file (flowlines) the centroids data, and also join the centroids to the shapefile. Thank youDoby
E
0

Based on your data. I'm leaving previous answer as a general one how to join spatial data with csv.

Please note, in below example boundaries are taken from {geographr} package.

centroids <- read.csv("~/projekty/test/data/centroids.csv")

boundaries <- geographr::boundaries_msoa11 |>
  subset(msoa11_code %in% centroids$Code)

population <- read.csv("~/projekty/test/data/population.csv") |>
  subset(select = c("MSOA_Code", "WZ_Population"))

From population data set I have taken subset of MSOA code and population only, centroids are not necessary, as we will add population info to our boundaries/polygons.

boundaries <- boundaries |>
  dplyr::left_join(population, by = c("msoa11_code" = "MSOA_Code")) |>
  sf::st_transform(crs = "EPSG:27700")

Now, we will read in ODSoupthampton data and create linestrings by joining with centroids for origin and destination, converting it to spatial data frame and assigning EPSG:27700 CRS.

od <- read.csv("~/projekty/test/data/ODSouthampton.csv", strip.white = TRUE) |>
  dplyr::left_join(centroids, by = c("Origin" = "Code")) |>
  dplyr::rename(c("o_East" = "East", "o_North"= "North")) |>
  dplyr::left_join(centroids, by = c("Destination" = "Code")) |>
  dplyr::rename(c("d_East" = "East", "d_North"= "North")) |>
  dplyr::mutate(geom = paste0("LINESTRING(", o_East, " ", o_North, ", ", d_East, " ", d_North, ")")) |>
  sf::st_as_sf(wkt = "geom", crs = "EPSG:27700") |>
  subset(select = c(1:3, 10))

And final plot. Please note that 22 commuters are from E02003549 to E02003549 which obviously doesn't make sense to plot, but..

tmap::tm_shape(boundaries) +
  tmap::tm_polygons("WZ_Population") +
  tmap::tm_shape(od) +
  tmap::tm_lines(lwd = "Commuters",
                 lwd.scale = tmap::tm_scale_categorical())

Created on 2024-03-05 with reprex v2.1.0

Elinorelinore answered 5/3 at 18:47 Comment(0)
D
0

I resolved this with the od package for manipulating and mapping origin-destination data. I have upvoted the comment by @Grzegorz which was useful in converting the shapefiles and joining the centroids to it, in my use case.

Doby answered 6/3 at 13:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.