Sampling different numbers of rows by group in dplyr tidyverse
I'd like to sample rows from a data frame by group, with one catch: the number of rows to sample from each group comes from another table. Here is my reproducible data:

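library(tidyverse)

# tibble() replaces the deprecated data_frame()
df <- tibble(
  Stratum = rep(c("High", "Medium", "Low"), 10),
  id = 1:30,
  Value = runif(30)
)

# Number of rows to sample from each stratum
sampleGuide <- tibble(
  Stratum = c("High", "Medium", "Low"),
  Surveys = c(3, 2, 5)
)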

Output should look like this:

# A tibble: 10 × 2
   Stratum      Value
     <chr>      <dbl>
1     High 0.21504972
2     High 0.71069005
3     High 0.09286843
4   Medium 0.52553056
5   Medium 0.06682459
6      Low 0.38793128
7      Low 0.01285081
8      Low 0.87865734
9      Low 0.09100829
10     Low 0.14851919

Here is my non-working attempt:

> df %>% 
+   left_join(sampleGuide, by = "Stratum") %>% 
+   group_by(Stratum) %>% 
+   sample_n(unique(Surveys))
Error in unique(Surveys) : object 'Surveys' not found

I also tried:

> df %>% 
+   group_by(Stratum) %>% 
+   nest() %>% 
+   left_join(sampleGuide, by = "Stratum") %>% 
+   mutate(sample = map(., ~ sample_n(data, Surveys)))
Error in mutate_impl(.data, dots) : 
      Don't know how to sample from objects of class function

It seems like sample_n requires the size to be a single number. Any ideas?

I'm only looking for tidyverse solutions. Extra points for purrr!

This was a similar problem, but I am not satisfied with the accepted answer because in practice the number of strata I'm dealing with is large.

Lamee answered 15/1, 2017 at 21:50 Comment(0)

Figured it out with map2() from purrr:

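df %>%
  # One row per stratum, with that stratum's rows in the list-column `data`
  # (nest(-Stratum) is the older, now-deprecated spelling)
  nest(data = -Stratum) %>%
  left_join(sampleGuide, by = "Stratum") %>%
  # Pair each nested tibble with its own sample size
  mutate(Sample = map2(data, Surveys, sample_n)) %>%
  # Drop the unsampled list-column so only the sampled rows are unnested
  select(-data) %>%
  unnest(Sample)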
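
On current tidyverse releases, where sample_n() is superseded, the same nest-and-map2() idea can be sketched with slice_sample(). This is a minimal sketch, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later. The seed and the `counts` check are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)  # illustrative seed so the draw is repeatable

df <- tibble(
  Stratum = rep(c("High", "Medium", "Low"), 10),
  id = 1:30,
  Value = runif(30)
)

sampleGuide <- tibble(
  Stratum = c("High", "Medium", "Low"),
  Surveys = c(3, 2, 5)
)

sampled <- df %>%
  nest(data = -Stratum) %>%                    # one row per stratum
  left_join(sampleGuide, by = "Stratum") %>%   # attach the per-stratum size
  mutate(Sample = map2(data, Surveys, ~ slice_sample(.x, n = .y))) %>%
  select(Stratum, Sample) %>%                  # keep only the sampled rows
  unnest(Sample)

# Rows drawn per stratum, to compare against sampleGuide
counts <- count(sampled, Stratum)
```

Because each stratum's size travels with its nested tibble, nothing has to be written out per stratum, so the approach should carry over to a large number of strata.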
Lamee answered 16/1, 2017 at 3:56 Comment(0)

A simple and general solution for the title question, using superseded sample_n rather than slice_sample:

library(dplyr)

mtcars %>% group_by(cyl) %>% sample_n(size = unique(cyl), replace = TRUE)

Within each cyl group, this draws as many rows as that group's cyl value (4, 6, or 8). Remove replace = TRUE if sampling with replacement is not needed.
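
The same grouped call can be pointed at the question's data by joining the per-stratum sizes first. This is a sketch, assuming a recent dplyr (1.0.0 or later is an assumption here) in which sample_n() evaluates size separately within each group. The seed is illustrative.

```r
library(dplyr)

set.seed(1)  # illustrative seed so the draw is repeatable

df <- tibble(
  Stratum = rep(c("High", "Medium", "Low"), 10),
  id = 1:30,
  Value = runif(30)
)

sampleGuide <- tibble(
  Stratum = c("High", "Medium", "Low"),
  Surveys = c(3, 2, 5)
)

sampled <- df %>%
  left_join(sampleGuide, by = "Stratum") %>%  # every row now carries its stratum's size
  group_by(Stratum) %>%
  sample_n(size = unique(Surveys)) %>%        # one size value per group
  ungroup() %>%
  select(-Surveys)
```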

Comras answered 17/9 at 15:20 Comment(0)