Printing intermediate results without breaking pipeline in tidyverse

Is there a command that can be added to a tidyverse pipeline that does not break the flow but produces a side effect, such as printing something out? The use case I have in mind is a pipeline like this:

data %>%
  mutate(new_var = <some time consuming operation>) %>%
  mutate(new_var2 = <some other time consuming operation>) %>%
  ...

I would like to add some command to the pipeline that would not modify the end result, but would print out some progress or the state of things. Maybe something like this:

data %>%
  mutate(new_var = <some time consuming operation>) %>%
  command_x(print("first operation done")) %>%
  mutate(new_var2 = <some other time consuming operation>) %>%
  ...

Does such a command_x already exist?

Goggin answered 8/9, 2017 at 19:21 Comment(5)
Please use reproducible examples in your questions. - Crayton
Related: https://mcmap.net/q/680267/-stepping-through-a-pipeline-with-intermediate-results. Luke's answer there is the idiomatic way, I think. - Bijugate
The %T>% is almost what I'm looking for, but it would be nice to have a function that returns its first argument and takes, as a second argument, an expression evaluated on the data given in the first, like other dplyr functions do. I think I saw something like that somewhere, but I might be wrong. - Goggin
You could just write pipeable_command_x = function(df, other_args){command_x(other_args); return(df)} and use that. - Loyceloyd
Also look at the tidylog package, which prints a status upon completion of each operation. - Mesonephros
E
9

You could easily write your own function:

pass_through <- function(data, fun) {fun(data); data}

And use it like this:

mtcars %>% pass_through(. %>% ncol %>% print) %>% nrow

Here we use the . %>% syntax to create an anonymous function. You could also write the function out more explicitly:

mtcars %>% pass_through(function(x) print(ncol(x))) %>% nrow
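
The %T>% operator mentioned in the comments is magrittr's "tee" pipe, and it behaves much like pass_through: it runs the right-hand side for its side effect and hands the left-hand side on unchanged. A minimal sketch, assuming magrittr is installed (the variable name n is only for illustration):

```r
library(magrittr)

# %T>% calls the function on its right for the side effect only,
# then passes the original left-hand value to the next step.
n <- mtcars %T>%
  (function(d) print(ncol(d))) %>%
  nrow()

# n holds the row count of mtcars; the column count was only printed.
```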
Engleman answered 8/9, 2017 at 20:46 Comment(0)
R
15

For the specific case of printing an intermediate step in the pipeline, just use %>% print() %>%. E.g.,

library(dplyr)  # provides %>%, filter() and summarise()
mtcars %>%
  filter(cyl == 4) %>%
  print() %>%
  summarise(mpg = mean(mpg))
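
The reason print() can sit in the middle of a pipeline is that it returns its argument invisibly, so the next step receives the same data. A quick check in base R (the names small and out are only for illustration):

```r
small <- head(mtcars, 2)

# print() displays the data frame and also returns it (invisibly),
# which is why it does not break the pipeline.
out <- print(small)

identical(out, small)
```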

For a simple status message, either load the tidylog package with library(tidylog) or do it manually:

pipe_message = function(.data, status) {message(status); .data}
mtcars %>%
  filter(cyl == 4) %>%
  pipe_message("first operation done") %>%
  select(cyl)
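
If the message should depend on the data itself, as requested in the question's comments, one possible variant evaluates an expression with the data frame's columns in scope. This is only a sketch: pipe_note is a hypothetical helper, not part of dplyr or tidylog, and it uses base R's substitute() and eval() rather than tidy evaluation.

```r
library(dplyr)

# Hypothetical helper: evaluates `expr` with the columns of `.data` in scope,
# emits the result as a message, and returns `.data` unchanged.
pipe_note <- function(.data, expr) {
  msg <- eval(substitute(expr), .data, parent.frame())
  message(msg)
  .data
}

result <- mtcars %>%
  filter(cyl == 4) %>%
  pipe_note(paste("rows after filter:", length(cyl))) %>%
  summarise(mpg = mean(mpg))
```

Because the helper returns .data as is, it can be dropped in or removed without changing the final result.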

See the pass_through answer by @MrFlick for a more general solution that also handles non-print functions.

Rockbound answered 7/1, 2019 at 13:31 Comment(2)
It works well! I don't understand why it's not built into the package. One question: why do you add a call to data after the print(data) statement? pipe_print = function(data) {print(data)} also works. - Kristof
You are right! This simplifies this case a whole lot, as you can see from my updated answer. It's clear now why it's not built into dplyr. - Mesonephros
D
5

You can do this on the fly with an anonymous function:

mtcars %>% ( function(x){print(x); return(x)} ) %>% nrow()
Dumbarton answered 20/8, 2020 at 22:28 Comment(0)