Return dplyr pipeline result early

When coding, I often want to check the intermediate results of the pipeline I'm working on. When I'm working on the early parts of a long pipeline, selecting just those steps, running them, and saving the output takes quite a few clicks. Is there a neat way to do something like the following?

library(dplyr)
result = mtcars |>
  # Testing this step
  filter(cyl == 4) |>
  return_early() |>

  # I don't want to run the rest of the pipeline
  group_by(gear) |>
  summarise()

so that, after running it, result holds the value at return_early() and the rest of the pipeline is never executed?

Note that I'm asking about a convenient way to save the intermediate output and stop the evaluation. If you're interested in printing, see here and here.

Thrice answered 2/3, 2023 at 8:48 Comment(2)
Have a look at github.com/MilesMcBain/breakerofchains. – Sistrunk
Related – Hekker

My habit is to comment out (#) the pipe leading to the next step, then run the code (Cmd + Enter on a Mac, Ctrl + Enter on Windows).

After checking the results, simply remove the comment character (#) and go on.

library(tidyverse)

result = mtcars |>
  filter(cyl == 4) #|> <- pipe commented out: running this statement stops after filter()
  group_by(gear) |>
  summarise()

This still requires a few keystrokes to remove the comment character afterwards; I'd love to see others' approaches.
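A small variation on this habit, sketched here as a suggestion rather than the answerer's own method: end the pipeline with a no-op step such as identity(), so the trailing steps can be commented out as whole lines (for example with an editor's comment-lines shortcut) without ever leaving a dangling |>. The choice of identity() as the no-op is an assumption; any function that returns its input unchanged works.

```r
library(dplyr)

result <- mtcars |>
  filter(cyl == 4) |>
  # group_by(gear) |>   # steps commented out as whole lines are skipped
  # summarise() |>
  identity()            # no-op final step, so no pipe is left dangling
```

To run the full pipeline again, uncomment the two lines; identity() simply passes the summarised result through.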

Actinic answered 2/3, 2023 at 9:0 Comment(0)

You can assign the intermediate result to a temporary object, temp, from inside the pipeline. Note, though, that the rest of the pipeline is still evaluated, and its output is stored in result as usual.

library(dplyr)

result <- mtcars |>
  filter(cyl == 4) |>
  assign(x = 'temp') |>
  group_by(gear) |>
  summarise()

result
#    gear
# 1     3
# 2     4
# 3     5

temp
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# ...

Update

Inspired by @Maël's use of stop(), I created a function with a toggle that controls whether the remaining pipeline is evaluated.

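return_early <- function(x, name = 'temp', eval_latter = FALSE){
  # Save the intermediate result in the global environment (search position 1)
  assign(name, x, pos = 1)
  if(!eval_latter) {
    # Abort the rest of the pipeline with an informative error
    stop("Intermediate object '", name, "' is created. ",
         "The remaining pipeline is omitted!", call. = FALSE)
  }
  # Otherwise pass x through unchanged so the pipeline continues
  return(x)
}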

Test 1: Only temp is created.

result <- mtcars |>
  filter(cyl == 4) |>
  return_early() |>
  group_by(gear) |>
  summarise()

Error: Intermediate object 'temp' is created. The remaining pipeline is omitted!

Test 2: Both result and temp are created.

result <- mtcars |>
  filter(cyl == 4) |>
  return_early(eval_latter = TRUE) |>
  group_by(gear) |>
  summarise()
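A variation that gives exactly what the question asks for (result holds the intermediate value and no error reaches the console) is to signal a custom condition and catch it around the pipeline. This is a minimal sketch assuming R >= 4.1 for the native pipe; stop_here() and run_pipeline() are hypothetical names, not functions from dplyr or any other package.

```r
library(dplyr)

# Hypothetical helper: hand the intermediate value back through a custom
# condition of class "early_return" instead of continuing the pipeline.
stop_here <- function(x) {
  cond <- structure(
    list(message = "Pipeline stopped early", call = NULL, value = x),
    class = c("early_return", "error", "condition")
  )
  stop(cond)
}

# Hypothetical wrapper: evaluate the pipeline and, if stop_here() fires,
# return the value it carried instead of raising an error.
run_pipeline <- function(expr) {
  tryCatch(expr, early_return = function(cond) cond$value)
}

result <- run_pipeline(
  mtcars |>
    filter(cyl == 4) |>
    stop_here() |>
    group_by(gear) |>
    summarise()
)
```

Because the handler only reacts to conditions of class "early_return", deleting the stop_here() line makes run_pipeline() return the full pipeline's result unchanged. Used outside run_pipeline(), stop_here() behaves like an ordinary error.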

Wedekind answered 2/3, 2023 at 9:14 Comment(0)

You could assign and then stop. It throws an error, but stops the evaluation:

result <- 
  mtcars |>
  filter(cyl == 4) |>
  assign(x = "early") |>
  stop() |>
  group_by(gear) |>
  summarise()

#Error in group_by(stop(assign(filter(mtcars, cyl == 4), x = "early")),  : 
#  c(22.8, 24.4, 22.8, 32.4, 30.4, 33.9, 21.5, 27.3, 26, 30.4, 21.4)c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)c(108, 146.7, 140.8, 78.7, 75.7, 71.1, 120.1, 79, 120.3, 95.1, 121)c(93, 62, 95, 66, 52, 65, 97, 66, 91, 113, 109)c(3.85, 3.69, 3.92, 4.08, 4.93, 4.22, 3.7, 4.08, 4.43, 3.77, 4.11)c(2.32, 3.19, 3.15, 2.2, 1.615, 1.835, 2.465, 1.935, 2.14, 1.513, 2.78)c(18.61, 20, 22.9, 19.47, 18.52, 19.9, 20.01, 18.9, 16.7, 16.9, 18.6)c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1)c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1)c(4, 4, 4, 4, 4, 4, 3, 4, 5, 5, 4)c(1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 2)

#> result
#Error: object 'result' not found

#> early
#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Subarctic answered 2/3, 2023 at 9:33 Comment(1)
Notice that the pipeline breaks at the stop line not because of the functionality of stop, but because you pass a data frame into stop (stop cannot accept a data frame as input, so an error pops up). So even if I replace stop() with mean(), sum(), etc., the same effect can be achieved, i.e. "early" is created and the pipeline pauses. – Wedekind
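stop() pastes all of its arguments together into the error message, which is why every column of the filtered data shows up in the error above. A minimal sketch of the same assign-then-stop idea with a short message instead, assuming R >= 4.1 (native pipe and the \(d) lambda syntax): the anonymous function has to call force(d), because R evaluates arguments lazily and, without it, the assign() step would never run. The try() wrapper is only there so that a script keeps going after the intentional error.

```r
library(dplyr)

try(
  result <- mtcars |>
    filter(cyl == 4) |>
    assign(x = "early") |>
    # Anonymous function called on the piped value: force(d) makes sure the
    # assign() step runs, then stop() raises a short, readable error.
    (\(d) { force(d); stop("Stopped early; see 'early'", call. = FALSE) })() |>
    group_by(gear) |>
    summarise()
)
```

As with the original version, result is never assigned; only early holds the intermediate data.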
