Stepping through a pipeline with intermediate results
Is there a way to output the result of a pipeline at each step without doing it manually (e.g., without selecting and running only the selected chunks)?

I often find myself running a pipeline line-by-line to remember what it was doing or when I am developing some analysis.

For example:

library(dplyr)

mtcars %>% 
  group_by(cyl) %>% 
  sample_frac(0.1) %>% 
  summarise(res = mean(mpg))
# Source: local data frame [3 x 2]
# 
# cyl  res
# 1   4 33.9
# 2   6 18.1
# 3   8 18.7

I'd have to select and run:

mtcars %>% group_by(cyl)

and then...

mtcars %>% group_by(cyl) %>% sample_frac(0.1)

and so on...

But selecting each chunk and pressing CMD/CTRL+ENTER in RStudio is tedious, and I would like a more efficient method.

Can this be done in code?

Is there a function that takes a pipeline and runs it step by step, showing the output of each step in the console and continuing when you press Enter, as demo(...) or example(...) do for package guides?

Hospitality answered 8/5, 2015 at 8:50 Comment(2)
Check out R's debug() function. It is close to what you want. You could use it with print() statements. This post on Cross Validated talks more about it. - Galengalena
You can simply use %>% print() %>% - see this answer: https://mcmap.net/q/741552/-printing-intermediate-results-without-breaking-pipeline-in-tidyverse - Skees
S
3

This is easy with a magrittr function chain. For example, define a function my_chain with:

library(magrittr)

foo <- function(x) x + 1
bar <- function(x) x + 1
baz <- function(x) x + 1
my_chain <- . %>% foo %>% bar %>% baz

and get the final result of a chain as:

> my_chain(0)
[1] 3

You can get a function list with functions(my_chain) and define a "stepper" function like this:

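stepper <- function(fun_chain, x, FUN = print) {
  # functions() (from magrittr) returns the chain's steps as a list of functions
  f_list <- functions(fun_chain)
  for(i in seq_along(f_list)) {
    x <- f_list[[i]](x)  # apply step i to the running value
    FUN(x)               # report the intermediate result
  }
  invisible(x)
}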

And run the chain with interposed print function:

stepper(my_chain, 0, print)

# [1] 1
# [1] 2
# [1] 3

Or with waiting for user input:

stepper(my_chain, 0, function(x) {print(x); readline()})
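
The same idea can be applied to the dplyr pipeline from the question. The following is only a sketch: step_through and its pause argument are hypothetical names, and it assumes that each element returned by functions() is a one-argument function whose body is the step's call (which is what magrittr appears to build for a `. %>% ...` chain).

```r
library(dplyr)
library(magrittr)

# Hypothetical helper: run a magrittr function chain one step at a time,
# printing the code of each step followed by its result.
step_through <- function(fun_chain, x, pause = interactive()) {
  for (step in functions(fun_chain)) {
    cat("\n>>", paste(deparse(body(step)), collapse = " "), "\n")
    x <- step(x)
    print(x)
    # Only wait for the user in an interactive session
    if (pause) readline("Press <Enter> to continue: ")
  }
  invisible(x)
}

pipeline <- . %>%
  group_by(cyl) %>%
  sample_frac(0.1) %>%
  summarise(res = mean(mpg))

result <- step_through(pipeline, mtcars)
```

Because the pipeline is stored as a function chain first, it has to start with `.` instead of the data; the data is supplied when step_through() is called.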
Silky answered 8/5, 2015 at 13:36 Comment(0)
K
10

You can select which results to print by using the tee-operator (%T>%) and print(). The tee-operator is used exclusively for side-effects like printing.

# i.e.
library(dplyr)
library(magrittr)  # %T>% is exported by magrittr, not by dplyr

mtcars %>%
  group_by(cyl) %T>% print() %>%
  sample_frac(0.1) %T>% print() %>%
  summarise(res = mean(mpg))
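
As a small illustration of why the tee-operator matters for side-effect functions, the sketch below uses str(), which returns NULL invisibly. The variable names are made up for the example.

```r
library(magrittr)

values <- c(1, 4, 9)

# With %T>%, str() runs for its side effect and the square roots
# themselves are passed on to sum().
with_tee <- values %>% sqrt() %T>% str() %>% sum()

# With a plain %>%, sum() receives the NULL returned by str().
without_tee <- values %>% sqrt() %>% str() %>% sum()

with_tee     # 6
without_tee  # 0
```

print() happens to return its argument, so a plain `%>% print() %>%` also works; `%T>%` makes the intent explicit and stays correct for functions such as str() or plot() that do not return their input.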
Kuhn answered 7/1, 2017 at 17:48 Comment(1)
When the output is a dataframe I find it useful to use %T>% View() %>% to see the intermediate results. - Dialogue
N
2

Add print:

mtcars %>% 
  group_by(cyl) %>% 
  print %>% 
  sample_frac(0.1) %>% 
  print %>% 
  summarise(res = mean(mpg))
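
If plain print output is hard to tell apart between steps, a small wrapper can label each one. This is only a sketch; peek, label and n are hypothetical names, not part of dplyr or magrittr.

```r
# Hypothetical helper: print a labelled preview of x and return x
# invisibly, so it can sit in the middle of a pipeline.
peek <- function(x, label = NULL, n = 5) {
  if (!is.null(label)) cat("--", label, "--\n")
  print(utils::head(x, n))
  invisible(x)
}

out <- peek(sqrt(c(1, 4, 9)), "after sqrt")
```

In a pipeline it would be used in the same position as print, for example `group_by(cyl) %>% peek("grouped") %>% sample_frac(0.1)`.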
Nylanylghau answered 8/5, 2015 at 8:56 Comment(3)
I get that print returns its argument and so this works, but it's not really shorter/faster/more convenient than just hand-selecting and running chunks. - Hospitality
@andrewwong Tell us more: why would you need to run it line by line and, more importantly, why would you want to look at print output one by one? - Nylanylghau
Updated question. I want something like an interactive stepper in the console or an auto-magic markdown document with the intermediates all generated. Thanks for your thoughts! - Hospitality
P
2

IMHO magrittr is mostly useful interactively, that is when I am exploring data or building a new formula/model.

In these cases, storing intermediate results in distinct variables is very time-consuming and distracting, while pipes let me focus on the data rather than on typing:

x %>% foo
## reason on results and 
x %>% foo %>% bar
## reason on results and 
x %>% foo %>% bar %>% baz
## etc.

The problem here is that I don't know in advance what the final pipe will be, which the approach in @bergant's answer requires.

Typing, as in @zx8754,

x %>% print %>% foo %>% print %>% bar %>% print %>% baz

adds too much overhead and, to me, defeats the whole purpose of magrittr.

Essentially magrittr lacks a simple operator that both prints and pipes results.
The good news is that it seems quite easy to craft one:

`%P>%` <- function(lhs, rhs) { print(lhs); lhs %>% rhs }

Now you can print and pipe:

1:4 %P>% sqrt %P>% sum 
## [1] 1 2 3 4
## [1] 1.000000 1.414214 1.732051 2.000000
## [1] 6.146264

I found that if one defines key bindings for %P>% and %>%, the prototyping workflow is very streamlined (see Emacs ESS or RStudio).
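
One caveat: as written, the %P>% operator above appears to work only when the right-hand side is a bare function name such as sqrt; a call with arguments, such as round(1) or group_by(cyl), is evaluated before it reaches the inner pipe. The sketch below is one possible way around that using base R only. The operator name %print>% is hypothetical, and it deliberately does not support the `.` placeholder or namespaced bare functions such as pkg::fun.

```r
# Hypothetical operator name; not part of magrittr or any other package.
`%print>%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)   # capture the right-hand side unevaluated
  print(lhs)                    # show the value entering this step
  if (is.call(rhs_expr)) {
    # Turn f(a, b) into f(.lhs, a, b): insert lhs as the first argument
    step <- as.call(c(list(rhs_expr[[1]], quote(.lhs)),
                      as.list(rhs_expr)[-1]))
    eval(step, list(.lhs = lhs), parent.frame())
  } else {
    rhs(lhs)                    # bare function name, e.g. x %print>% sqrt
  }
}

rounded <- c(2, 8) %print>% sqrt() %print>% round(1)
total <- c(1, 4, 9) %print>% sqrt %print>% sum
```

Like the original, this prints the input of every step; the final value is only shown if the whole expression is auto-printed at the console rather than assigned.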

Peder answered 11/12, 2016 at 21:37 Comment(0)
W
2

I wrote the package pipes, which can do several things that might help:

  • use %P>% to print the output.
  • use %ae>% to use all.equal on input and output.
  • use %V>% to use View on the output, it will open a viewer for each relevant step.

If you want to see some aggregated info, you can try %summary>%, %glimpse>% or %skim>%, which use summary, tibble::glimpse or skimr::skim respectively, or you can define your own pipe to show specific changes using new_pipe.

# devtools::install_github("moodymudskipper/pipes")
library(dplyr)
library(pipes)
res <- mtcars %P>% 
  group_by(cyl) %P>% 
  sample_frac(0.1) %P>% 
  summarise(res = mean(mpg))
#> group_by(., cyl)
#> # A tibble: 32 x 11
#> # Groups:   cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows
#> sample_frac(., 0.1)
#> # A tibble: 3 x 11
#> # Groups:   cyl [3]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  26       4  120.    91  4.43  2.14  16.7     0     1     5     2
#> 2  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#> 3  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#> summarise(., res = mean(mpg))
#> # A tibble: 3 x 2
#>     cyl   res
#>   <dbl> <dbl>
#> 1     4  26  
#> 2     6  17.8
#> 3     8  18.7
res <- mtcars %ae>% 
  group_by(cyl) %ae>% 
  sample_frac(0.1) %ae>% 
  summarise(res = mean(mpg))
#> group_by(., cyl)
#> [1] "Attributes: < Names: 1 string mismatch >"                                              
#> [2] "Attributes: < Length mismatch: comparison on first 2 components >"                     
#> [3] "Attributes: < Component \"class\": Lengths (1, 4) differ (string compare on first 1) >"
#> [4] "Attributes: < Component \"class\": 1 string mismatch >"                                
#> [5] "Attributes: < Component 2: Modes: character, list >"                                   
#> [6] "Attributes: < Component 2: Lengths: 32, 2 >"                                           
#> [7] "Attributes: < Component 2: names for current but not for target >"                     
#> [8] "Attributes: < Component 2: Attributes: < target is NULL, current is list > >"          
#> [9] "Attributes: < Component 2: target is character, current is tbl_df >"
#> sample_frac(., 0.1)
#> [1] "Different number of rows"
#> summarise(., res = mean(mpg))
#> [1] "Cols in y but not x: `res`. "                                                                
#> [2] "Cols in x but not y: `qsec`, `wt`, `drat`, `hp`, `disp`, `mpg`, `carb`, `gear`, `am`, `vs`. "
res <- mtcars %V>% 
  group_by(cyl) %V>% 
  sample_frac(0.1) %V>% 
  summarise(res = mean(mpg))
# you'll have to test this one by yourself
Washwoman answered 9/4, 2019 at 19:18 Comment(0)
