Using loops with knitr to produce multiple pdf reports... need a little help to get me over the hump

First of all, I must admit that I'm very new to knitr and the concept of reproducible analysis, but I can see its potential in improving my current workflow (which includes much copy-pasting into Word docs).

I often have to produce multiple reports by group (Hospital in this example), and within each hospital there may be many different Wards that I'm reporting an outcome on. Previously I ran all of my plots and analysis in R using loops, then the copy/pasting work commenced; however, reading this post (Can Sweave produce many pdfs automatically?) gave me hope that I may actually be able to skip many steps and go straight from R to report through Rnw/knitr.

However, after giving it a try I see that there is something that isn't quite working out (as the R environment within the Rnw does not appear to recognize the looping variables I'm trying to pass to it??).

   ##  make my data
Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)


##  Here is my current work flow-- produce all plots, but export as png and cut/paste
for(hosp in unique(df$Hospital)){
  subgroup <- df[ df$Hospital == hosp,]
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    savename <- paste(hosp, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
}
# followed by much copy/pasting


##  Here is what I'm trying to go for using knitr 
library(knitr)
for (hosp in unique(df$Hospital)){
  knit("C:file.path\\testing_loops.Rnw", output=paste('report_', Hospital, '.tex', sep=""))
}

## With the following *Rnw file
## start *.Rnw Code
\documentclass[10pt]{article}
\usepackage[margin=1.15 in]{geometry}
<<loaddata, echo=FALSE, message=FALSE>>=
  Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)
subgroup <- df[ df$Hospital == hosp,]
@

\begin{document}
<<setup, echo=FALSE >>=
  opts_chunk$set(fig.path = paste("test", hosp , sep=""))
@

Some informative text about hospital \Sexpr{hosp}

<<plots, echo=FALSE >>=
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    #     subgroup2 <- subgroup2[ order(subgroup2$Month),]
    savename <- paste(hosp, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
@
\end{document}


##  To be then turned into pdf with this
tools::texi2pdf("C:file.path\\report_A.tex", clean = TRUE, quiet = TRUE)

After trying to run my knit() code chunk I get this error:

Error in file(con, "w") : invalid 'description' argument

And when I look into the directory where the *.tex file was to be created, I can see the 2 pdf plots from hospital A were produced (none for B) and no hospital specific *.tex file to knit into a pdf. Thanks in advance for any help you can offer!

Stinky answered 13/3, 2013 at 21:23 Comment(0)

You don't need to re-define the data in the .Rnw file, and I think the error is coming from the fact that you are putting the output name together with Hospital (the full vector of hospitals) rather than hosp (the loop variable). That hands knit() a vector of 40 file names where it expects a single one.
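To make the difference concrete, here is a minimal base-R sketch (no knitr needed) that only builds the output names, using the same Hospital vector as in the question. The names bad_names and good_name are just illustrative labels, not anything knitr defines.

```r
# Same layout as the question's data: 40 rows, two hospitals.
Hospital <- c(rep("A", 20), rep("B", 20))

# What the question's loop passes as `output`: one name per data row.
bad_names <- paste('report_', Hospital, '.tex', sep = "")
length(bad_names)

# What the corrected loop builds: a single name on each iteration.
for (hosp in unique(Hospital)) {
  good_name <- paste0('report_', hosp, '.tex')
  print(good_name)
}
```

A length-40 character vector is not a valid file name, which would explain the "invalid 'description' argument" error raised when the output file is opened.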

Following your example, testingloops.Rnw would be

\documentclass[10pt]{article}
\usepackage[margin=1.15 in]{geometry}
<<loaddata, echo=FALSE, message=FALSE>>=
subgroup <- df[ df$Hospital == hosp,]
@

\begin{document}
<<setup, echo=FALSE >>=
  opts_chunk$set(fig.path = paste("test", hosp , sep=""))
@

Some informative text about hospital \Sexpr{hosp}

<<plots, echo=FALSE >>=
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    #     subgroup2 <- subgroup2[ order(subgroup2$Month),]
    savename <- paste(hosp, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
@
\end{document}

and the driver R file would be just

##  make my data
Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)

## knitr loop
library("knitr")
for (hosp in unique(df$Hospital)){
  knit2pdf("testingloops.Rnw", output=paste0('report_', hosp, '.tex'))
}
Whitherward answered 13/3, 2013 at 22:34 Comment(4)
Brian, AND my machine (for some reason) likes knit2pdf a lot more than it did the lapply tools::texi2pdf! Awesome! - Stinky
@Brian Diggs, not sure if you are still monitoring this, but if I wanted to insert descriptive text after the plots of each ward (so within the plotting code loop in the .Rnw file), do you know the best way? I tried inserting cat("This is informative text about \\Sexpr{ward}") right after plot. I also turned on the code chunk option tidy.opts=list(comment=""), but knit2pdf places the text after both plots, not under each plot in the loop as I had intended. Also, during compilation the "\" causes an escape error(?). - Stinky
I got the text to appear after each plot by adding cat("This is informative text about", ward) after the plot. For a single bit of information, that is fine. But if you want to loop over more complicated reporting, look at knit_child and Example 020 at github.com/yihui/knitr-examples. - Whitherward
Has anything changed in the past few years that would make this not work anymore? I'm unable to make this work. I get tex files and pdfs of images in a figure folder, but the final PDF doesn't compile. - Chacma

Great question! This works for me with the other bits you've supplied in your question. Note that I've replaced your hosp with just x. I've called your Rnw file test.rnw

# input data
Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)

# generate the tex files, one for each hospital in df
library(knitr)
lapply(unique(df$Hospital), function(x) 
       knit("C:\\emacs\\test.rnw", 
            output=paste('report_', x, '.tex', sep="")))

# generate PDFs from the tex files, one for each hospital in df
lapply(unique(df$Hospital), function(x)
       tools::texi2pdf(paste0("C:\\emacs\\", paste0('report_', x, '.tex')), 
                       clean = TRUE, quiet = TRUE))

I've replaced your loops with lapply and anonymous functions, which often seem to be considered more R-ish.
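As a small illustration of that equivalence, the sketch below builds the report file names once with a for loop and once with an apply-style call, in base R only. It uses vapply rather than lapply so the result is a plain character vector; the hospitals vector and both variable names are made up for the example.

```r
hospitals <- c("A", "B")

# Loop version: grow a character vector one name at a time.
names_loop <- character(0)
for (h in hospitals) {
  names_loop <- c(names_loop, paste0("report_", h, ".tex"))
}

# Apply version: one anonymous function call per hospital.
names_apply <- vapply(hospitals,
                      function(x) paste0("report_", x, ".tex"),
                      character(1), USE.NAMES = FALSE)

# Both approaches produce the same names.
identical(names_loop, names_apply)
```

When the function is called only for its side effect, as with knit(), lapply still returns a list of the return values and prints it at the console; wrapping the call in invisible() suppresses that.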

Here you can see where I replaced the hosp with x in the rnw file:

\documentclass[10pt]{article}
\usepackage[margin=1.15 in]{geometry}
<<loaddata, echo=FALSE, message=FALSE>>=
  Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)
subgroup <- df[ df$Hospital == x,]
@

\begin{document}
<<setup, echo=FALSE >>=
  opts_chunk$set(fig.path = paste("test", x , sep=""))
@

Some informative text about hospital \Sexpr{x}

<<plots, echo=FALSE >>=
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    #     subgroup2 <- subgroup2[ order(subgroup2$Month),]
    savename <- paste(x, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
@
\end{document}

The result is two tex files (report_A.tex, report_B.tex), four PDFs for the figures (A1, A2, B1, B2) and two PDFs for the reports (report_A.pdf, report_B.pdf), each with their figures in them. Is that what you were after?

Milano answered 13/3, 2013 at 21:59 Comment(3)
Absolutely! I'm having trouble (right now) getting the second lapply chunk to behave as it loops through tools::texi2pdf, but I can figure that out myself. Just having the first lapply knit my *.tex files is fantastic! Many, many THANKS!! - Stinky
Glad that helped, have you tried knit2pdf as in Brian Diggs' answer? You could replace the second lapply with: library(knitr); lapply(unique(df$Hospital), function(x) knit2pdf("C:\\emacs\\test.rnw", output=paste0('report_', x, '.tex'))) - Milano
By 'replace' I mean 'do away with' the second lapply, pardon me. - Milano

In this answer I intend to answer a more general question, "Using loops to produce multiple pdf reports", rather than your specific example, because this thread was quite hard to follow as a noob. I eventually got it working (HTML version), so this is my humble solution. There are probably better ones published here; I just can't fully understand them yet.

  1. Create an RMD file with your design and save it in the working/input directory (in RStudio: File -> New File -> R Markdown). This file should include all the functions you need to make the plots in the report (simply declare them in one of its code chunks). Think of this file as the template for all future reports. Don't worry about passing the data into its environment after processing it earlier; I cover that in (2). The key point to understand is that all calculations are done further down the pipeline (at the moment you render the RMD file).

  2. Create the loop you need in a different control R file. In my case, a loop iterates over all the files in the directory and reads each one into a data frame. I then want to pass those data frames into the RMD, along with other variables, in order to plot them. This is how it's done:

    run_on_all <- function(path_in = "path:\\where\\your\\input\\and\\RMD\\is",
                           path_out = "path:\\where\\your\\output\\will\\be") {
      library(rmarkdown)
      library(knitr)
      setwd(path_in)
      # get the names of the input files
      list_of_file_names <- list.files(path = getwd(), pattern = "\\.csv$")
      for (file_name in list_of_file_names) {
        data <- read.csv(file_name)  # read file into data frame
        # some_variable_name is a placeholder for whatever labels this report
        report_name <- paste(some_variable_name, ".html", sep = "")
        render("your_template.Rmd", output_file = report_name, output_dir = path_out,
               # add all other parameters you want to input into the RMD here;
               # each must be declared under params: in the RMD header and is
               # read inside the RMD as params$data, etc.
               params = list(data = data))
      }
    }

  3. The most important command is the render function call. It lets you pass any parameters you wish into the RMD environment, and it also lets you change the name of the report and the output location. Calling it also generates the report, so you get it all in one line. (Note that if the call to render is within a function, you may find that the variables you input are missing, yet the report will still be published correctly.)

Summary

There are two files you need: an RMD file, which is the template for all the reports, and a control file. The control file gets the data, processes it, and passes the processed parameters into the RMD (via the render function). The RMD gets the data, does some computations, plots it, and publishes it in a new file (also through the render function). I hope I helped.

Cryptogenic answered 1/1, 2018 at 18:51 Comment(0)
