Comparing simple features {sf} and Spatial objects {sp}: speed and memory
Asked Answered
H

1

7

I'm using some spatial data in R, and wondering whether use the packages/functions that rely on the old Spatial format (package sp) or the new package sf. I made this test, based on code found here.

The idea is to "identify all points falling within a maximum distance of xx meters with respect to each single point in a spatial points dataset".

library(tictoc)

# define a buffer distance and a toy data
maxdist <- 500
df<-data.frame(x = runif(10000, 0, 100000),y = runif(10000, 0, 100000),id = 1:10000)

# doing the analysis using sf
library(sf)
tic("sf")
pts     <- df %>% st_as_sf(coords = c("x", "y"))
pts_buf <- st_buffer(pts, maxdist,nQuadSegs = 5)
int     <- st_intersects(pts_buf, pts)
toc()

# doing the analysis using sp
library(sp)
library(rgeos)
tic("sp")
pts2      <- SpatialPointsDataFrame(df[,1:2],as.data.frame(df[,3]))
pts_buf2  <- gBuffer(pts2, byid=TRUE,width=maxdist)
int2      <- over(pts_buf2, pts2,returnList = TRUE)
toc()

# size of the objects
object.size(pts)<object.size(pts2)
object.size(pts_buf)<object.size(pts_buf2)

Using sf seems to be much better as faster (around 0.53 vs 2.1 seconds in my machine) and requiring less memory. There is one exception though. Why object pts is much larger than pts2? Is sf less efficient in storing a vector of points?

Hookah answered 13/3, 2018 at 14:18 Comment(0)
I
4

One reason I can think of:

The pts (sf) object keeps an attribute specifying the 'type' of geometry object for every row. This is because every row has the potential to be a different geometry in an sf object.

Whereas the pts2 (sp) is in an object of class SpatialPointsDataFrame. It can only hold points, so there's no need to keep the extra attribute for each 'row'

Take the first two rows of the sf object as an example, these are the attributes associated with those geometries

lapply(1:2, function(x) attributes(st_geometry(pts[x, ])))
# [[1]]
# [[1]]$names
# [1] "1"
# 
# [[1]]$class
# [1] "sfc_POINT" "sfc"      
# 
# [[1]]$precision
# [1] 0
# 
# [[1]]$bbox
# xmin     ymin     xmax     ymax 
# 81647.16 72283.90 81647.16 72283.90 
# 
# [[1]]$crs
# Coordinate Reference System: NA
# 
# [[1]]$n_empty
# [1] 0
# 
# 
# [[2]]
# [[2]]$names
# [1] "2"
# 
# [[2]]$class
# [1] "sfc_POINT" "sfc"      
# 
# [[2]]$precision
# [1] 0
# 
# [[2]]$bbox
# xmin      ymin      xmax      ymax 
# 5591.116 38967.060  5591.116 38967.060 
# 
# [[2]]$crs
# Coordinate Reference System: NA
# 
# [[2]]$n_empty
# [1] 0

Every row of this sf data.frame will have similar attributes.

If you strip the attributes from the geometries (to just leave a data.frame)

pts_coords <- st_coordinates(pts)
pts_striped <- pts
st_geometry(pts_striped) <- NULL
pts_striped <- cbind(pts_striped, pts_coords)

And then compare the object sizes

object.size(pts_striped)
# 200896
object.size(pts2)
# 203624

The objects are much closer aligned in size

Impassible answered 13/3, 2018 at 21:10 Comment(1)
Thank you @SymbolixAU, this is something. Although not clear why the sf object, after buffer, becomes smaller than the SpatialPolygonsDataFrameHookah

© 2022 - 2024 — McMap. All rights reserved.