How can ggplot2 keep jittered locations within a map boundary, such as a US state?

Asked 23/12, 2014 at 14:37 Answered 25/12, 2021 at 0:21

Is there a way to keep points that are jittered on a map within a boundary of that map? In the example below, where jittered locations in southwestern Connecticut end up in the water or in an adjoining state, is there a way to have R jitter the location points but not over a map boundary?

Alternatively, is there some other technique, such as to create a table grob near each city to list the firms' names?

# create a data frame called "ct" of geolocations in two cities near the border of a US state (Connecticut).  Each firm has the same lat and longitude of one of the two cities

> dput(ct)
structure(list(city = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("Greenwich", "Stamford"), class = "factor"), 
    firm = structure(c(1L, 12L, 21L, 22L, 23L, 24L, 25L, 26L, 
    27L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 
    15L, 16L, 17L, 18L, 19L, 20L), .Label = c("A1", "A10", "A11", 
    "A12", "A13", "A14", "A15", "A16", "A17", "A18", "A19", "A2", 
    "A20", "A21", "A22", "A23", "A24", "A25", "A26", "A27", "A3", 
    "A4", "A5", "A6", "A7", "A8", "A9"), class = "factor"), long = c(-73.63, 
    -73.63, -73.63, -73.63, -73.63, -73.55, -73.55, -73.55, -73.55, 
    -73.55, -73.55, -73.55, -73.55, -73.55, -73.55, -73.55, -73.55, 
    -73.55, -73.55, -73.55, -73.55, -73.55, -73.55, -73.55, -73.55, 
    -73.55, -73.55), lat = c(41.06, 41.06, 41.06, 41.06, 41.06, 
    41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 
    41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 41.09, 
    41.09, 41.09, 41.09, 41.09)), .Names = c("city", "firm", 
"long", "lat"), row.names = c(NA, -27L), class = "data.frame")


library(ggplot2)
# load the map of the United States
all_states <- map_data("state")
# choose to map the borders only of the state of Connecticut
st.map <- subset(all_states, region == "connecticut")

# plot the points for the firms with minimal jitter that still distinguishes each point
ggplot(ct, aes(long, lat)) + 
  geom_polygon(data=st.map, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
  coord_map() + 
  geom_point(position=position_jitter(width=.1, height=.1), size=2)

enter image description here

Changing each longitude or latitude a tiny bit, as in this question, won't work because there are too many points and I am hoping for an algorithmic solution, since I have many situations where this crowding and border-crossing could arise. https://stackoverflow.com/questions/22943110/jitter-coordinates

Thank you for any suggestions or answers.

Ataxia answered 23/12, 2014 at 14:37 Comment(0)

You can make your own jitter function that jitters the data. Then use the function pnt.in.poly from SDMTools to check if the point is inside the polygon. Otherwise you just jitter the original point again. See below for an example:

require(SDMTools)
bounded_jitter <- function(mapping, data, bounds, width, height, ...){
  # data2 is the jittered data
  data2 <- data
  data2[, paste(mapping$x)] <- rnorm(nrow(data), data[, paste(mapping$x)], width/1.96)
  data2[, paste(mapping$y)] <- rnorm(nrow(data), data[, paste(mapping$y)], height/1.96)
  # is it inside the polygon?
  idx <- as.logical(pnt.in.poly(pnts = data2[, c(paste(mapping$x), paste(mapping$y))],  
                                poly.pnts = bounds)[, 'pip'])
  while(!all(idx)) { # redo for points outside polygon
    data2[!idx, paste(mapping$x)] <- rnorm(sum(!idx), data[!idx, paste(mapping$x)], width/1.96)
    data2[!idx, paste(mapping$y)] <- rnorm(sum(!idx), data[!idx, paste(mapping$y)], height/1.96)
    idx <- as.logical(pnt.in.poly(pnts = data2[, c(paste(mapping$x), paste(mapping$y))],  
                                  poly.pnts = bounds)[, 'pip'])
  }
  # the point
  geom_point(data = data2, mapping, ...)
}
# plot the points for the firms with minimal jitter that still distinguishes each point
ggplot(ct, aes(long, lat)) + 
  geom_polygon(data=st.map, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
  coord_map() + 
  geom_point(size=2) + 
  bounded_jitter(mapping = aes(x=long, y=lat), 
                 data = ct, 
                 bounds = st.map[, c('long', 'lat')], 
                 width = .1, 
                 height = .1)

resulting plot: Connecticut with jittered points inside

Shamblin answered 23/12, 2014 at 15:52 Comment(2)

Wow, this is a great function. It would be great if this function exists in a package! +1 – Distended 23/12, 2014 at 16:5

@shadow, that is very cool! What if I have this situation in many states and do not know which ones will jitter impropertly? My actual use cases has thousands of locations in hundreds of cities, many of which are near oceans, lakes or state lines. How can I have R check for the boundary-crossing and call the function when needed? – Ataxia 23/12, 2014 at 16:26

There are a few tools that have come out in the 7 years since this was first posted, namely the sf package that works very nicely with the tidyverse packages, and the corresponding ggplot2::geom_sf. I'll work with everything as sf objects instead of polygons in order to get access to spatial operations, downloading the state & town boundaries (tigris downloads shapefiles from the Census Bureau and returns sf objects) and converting the coordinates.

library(dplyr)
library(sf)
library(ggplot2)
sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off

state_sf <- tigris::states(cb = TRUE) %>%
  filter(STUSPS == "CT")
town_sf <- tigris::county_subdivisions("CT", cb = TRUE)
pts_sf <- ct_pts %>%
  mutate(geometry = purrr::map2(long, lat, ~st_point(x = c(.x, .y)))) %>%
  st_as_sf(crs = st_crs(state_sf))

Version 1 is to just take a circular buffer around each distinct point (since I noticed your original dataset has repeats of what look to be town centroids), then masking that to fit inside the state boundaries.

circle_buff <- pts_sf %>%
  distinct(city, geometry) %>%
  st_buffer(dist = 0.1) %>%
  st_intersection(state_sf)

ggplot() +
  geom_sf(data = state_sf, fill = "white") +
  geom_sf(aes(fill = city), data = circle_buff, color = NA, alpha = 0.4)

Then you can create jittered points by sampling within those polygons, taking the same number of points per town as there are observations in the original dataset.

set.seed(10)
jitter1 <- ct_pts %>%
  select(city) %>%
  inner_join(circle_buff, by = "city") %>%
  group_by(city) %>%
  summarise(geometry = suppressMessages(st_sample(geometry, size = n()))) %>%
  ungroup() %>%
  st_as_sf()

ggplot() +
  geom_sf(data = state_sf, fill = "white") +
  geom_sf(aes(color = city), data = jitter1, size = 0.8, alpha = 0.8)

Note, however, that since the buffers pass outside the towns' boundaries and overlap, Stamford points and Greenwich points can occupy some of the same space in that overlapping area. Version 2 masks the buffers by town boundaries rather than just the state, so the area available to sample for the two towns no longer overlaps. For this example I shrunk the buffer distance a bit just to illustrate having buffer borders ending both inside and outside town boundaries—that is, the space available to sample for each town is both within the town and within the buffer radius.

town_buff <- pts_sf %>%
  distinct(city, geometry) %>%
  st_buffer(dist = 0.07) %>%
  split(.$city) %>%
  purrr::imap_dfr(~st_intersection(.x, town_sf %>% filter(NAME == .y)))

jitter2 <- ct_pts %>%
  select(city) %>%
  inner_join(town_buff, by = "city") %>%
  group_by(city) %>%
  summarise(geometry = suppressMessages(st_sample(geometry, size = n()))) %>%
  ungroup() %>%
  st_as_sf()

ggplot() +
  geom_sf(data = state_sf, fill = "white") +
  geom_sf(aes(color = city), data = jitter2, size = 0.8, alpha = 0.8)

Posen answered 25/12, 2021 at 0:21 Comment(0)

Recommended topics

Hot tags