Calculate a centre point of multiple lat, long points in a data-frame
Asked Answered
D

3

6

I have a dataset that looks like this:

site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19

For samples with the same site name, I want to calculate their centre point and then add it as a column to the dataset. Some site names are duplicated twice, other three times, other four times.

Like this:

site   lat      long    centre_lat  centre_long 
bras2  41.21   -115.11  value here     value here
tex4   45.3    -112.31  45.3           -112.31 
bras2  41.15   -115.15  value here     value here
bras2  41.12   -115.19  value here     value here

How can I do this?

Dena answered 20/12, 2020 at 11:14 Comment(2)
A midpoint is the single point that sits on the geodesic that connects two points and is equidistant from those two points. Whatever you think the midpoint between three points is, this is not the calculation that midPoint calculates. It expects two points only. Are you thinking of the centroid of multiple points?Coady
Yes, a centroid. Thank you so much, I was mistaken with regards to what a midpoint is.Dena
C
9

If you're using spatial data, you should look into using the sf package. It handles geometries and functions for operating on them well.

Code below shows using both sf::st_centroid and geosphere::centroid. I prefer sf's way of doing things.

df <- read.table(header=TRUE, text= "site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19")


library(dplyr)
library(geosphere)
library(sf)

# Using sf's st_centroid
df_sf <- st_as_sf(df, coords = c('long', 'lat'))

centroids_sf <- df_sf %>%
  group_by(site) %>% 
  summarize(geometry = st_union(geometry)) %>% 
  st_centroid

  
# Using geosphere::centroid
centroids_geoshpere <- df_sf %>%
  group_by(site) %>%
  filter(n() >2)  %>% ## geosphere needs polygons therefore 3+ points
  st_union() %>%
  st_cast('POLYGON') %>%
  as('Spatial') %>% # geoshpere expects SpatialPolygons objects
  centroid() 
  

centroids_geoshpere
#>         [,1]     [,2]
#> [1,] -115.15 41.16001
centroids_sf
#> Simple feature collection with 2 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -115.15 ymin: 41.16 xmax: -112.31 ymax: 45.3
#> CRS:            NA
#> # A tibble: 2 x 2
#>   site         geometry
#> * <chr>         <POINT>
#> 1 bras2 (-115.15 41.16)
#> 2 tex4   (-112.31 45.3)

Looks like thery're close enough to the same point. I don't think geosphere::centroid can give a centroid for a single point, but may be wrong. sf::st_centroid has no problem with 1,2, or more points. Created on 2020-12-20 by the reprex package (v0.3.0)

Coplin answered 20/12, 2020 at 16:5 Comment(2)
Hi! Thanks so much. Could I ask how will this code deal with cases where there are 2 points and a polygon cannot be made? Will it just ignore these cases?Dena
The geosphere method above will remove locations with fewer than 3 points. The sf method will work for 1, 2 and 3+ points. I would suggest the sf method.Coplin
M
2

You could calculate the means grouped by site names using ave after stripping off the site numbers using gsub.

within(dat, {
  g <- gsub("\\d", "", site)
  mid.lat <- ave(lat, g)
  mid.long <- ave(long, g)
  rm(g)
})
#    site   lat    long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150  41.160
# 2  tex4 45.30 -112.31 -112.310  45.300
# 3 bras2 41.15 -115.15 -115.150  41.160
# 4 bras2 41.12 -115.19 -115.150  41.160
# 5  foo1 42.10 -123.10 -123.225  42.225
# 6  foo2 42.20 -123.20 -123.225  42.225
# 7 foo11 42.30 -123.30 -123.225  42.225
# 8 foo12 42.30 -123.30 -123.225  42.225

Or, if you depend on the NA:

within(dat, {
  g <- gsub("\\d", "", site)
  n <- ave(site, g, FUN=length)
  mid.lat <- NA
  mid.long <- NA
  mid.lat[n > 1] <- ave(lat[n > 1], g[n > 1])
  mid.long[n > 1] <- ave(long[n > 1], g[n > 1])
  rm(g, n)
  })
#    site   lat    long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150  41.160
# 2  tex4 45.30 -112.31       NA      NA
# 3 bras2 41.15 -115.15 -115.150  41.160
# 4 bras2 41.12 -115.19 -115.150  41.160
# 5  foo1 42.10 -123.10 -123.225  42.225
# 6  foo2 42.20 -123.20 -123.225  42.225
# 7 foo11 42.30 -123.30 -123.225  42.225
# 8 foo12 42.30 -123.30 -123.225  42.225

Data:

dat <- structure(list(site = c("bras2", "tex4", "bras2", "bras2", "foo1", 
"foo2", "foo11", "foo12"), lat = c(41.21, 45.3, 41.15, 41.12, 
42.1, 42.2, 42.3, 42.3), long = c(-115.11, -112.31, -115.15, 
-115.19, -123.1, -123.2, -123.3, -123.3)), class = "data.frame", row.names = c(NA, 
-8L))
Meddlesome answered 20/12, 2020 at 12:35 Comment(3)
Hi! Thank you. One question - is ave a mean in this case? So, is it simply taking a mean of all latitudes and longitudes under the same site name? I am not sure if one can calculate centres like this due to the curvature of the Earth? Please correct me if I am wrong.Dena
@Dena Yes, ave groups first argument by second argument and applies a function that defaults to mean, see help("ave"). I admit not to be a specialist, but when considering GPS coordinates as rectangular projections the curvature shouldn't matter. In this answer they calculate the centroids similarly with mean values.Meddlesome
@Dena The calculation the would be based on this centroid formula .Meddlesome
H
2

The geosphere package has a function centroid to solve problems such as this one.
It as long as there more than one point in shape it is straight forward. Most of the code below involves handling the single point case in the example above.

df <- read.table(header=TRUE, text= "site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19")


library(dplyr)
library(geosphere)

df %>% group_by(side) %>% centroid(.[ ,c(3,2)])

sites <- split(df, df$site)
results <-lapply(sites, function(x) {
   if(nrow(x)>1 ) {
     value <- as.data.frame(centroid(x[, c(3,2)]))
   }
   else {
      value <- x[1, c(3,2)]
      names(value) <- c("lon", "lat")
   }
   value$site <- x$site[1]
   value
})

answer<-bind_rows(results)

      lon      lat  site
1 -115.15 41.16001 bras2
2 -112.31 45.30000  tex4
Holbrooke answered 20/12, 2020 at 15:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.