Convert (print) a file to PDF - using R? (in windows)
M

4

14

I wish to convert an HTML file into a PDF file, using R.

Is there a command, or a combination of tools/commands, that can perform this conversion?

Muscular answered 20/9, 2011 at 7:55 Comment(9)
Not really an R question, since there's no way R can do this. You'd have to call an external utility, which is the easy step in doing this from R. The hard step is HTML to PDF conversion - by 'HTML file' do you mean the rendered version of it or the plain HTML text? To render HTML you pretty much need a web browser to handle the images, stylesheets, javascript possibly.Plectognath
Hi spacedman - I mean the plain HTML text. If I understand you correctly, I'd need to have R "run" the print command from my browser. Is that even possible?Muscular
So you want an HTML pretty-printer? Highlighting tags, colouring text, that kind of thing?Plectognath
Yes. I want to create an HTML report in R, and then automatically print it to PDF if I'd need to send it as such a file...Muscular
Still not clear... What's an HTML report? You want to make a file with things like <li>The error was 2.334</li> etc etc and then make a PDF with the formatted output (ie lists with bullets, headings sized correctly and so on)?Plectognath
Thanks Spacedman - indeed - that is what I am afterMuscular
How about a more generic markup language such as asciidoc (package ascii) that can be processed by R to obtain both pdf and html output?Wilds
It sounds like what you're asking is addressed in the thread below. I assume the only reason you'd want to convert is if you have an R Markdown file. #11025623Sackman
I'm doing some web-scraping with R and would like to convert selected HTML files to pdf for storage and later reviewing. Anybody know updates to this since 2013?Waggery
I
7

Update: if you have Pandoc installed (plus a LaTeX engine such as pdflatex, which Pandoc uses to produce PDF output), you can use something like

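html_to_pdf <- function(html_file, pdf_file) {
  # shQuote() keeps paths containing spaces or special characters intact
  cmd <- sprintf("pandoc %s -t latex -o %s", shQuote(html_file), shQuote(pdf_file))
  system(cmd)  # returns the exit status; 0 means success
}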
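
A slightly more defensive variant of the function above might look like the sketch below. It assumes `pandoc` is on the PATH and that a LaTeX engine is available for PDF output; the name `html_to_pdf_checked` is made up for this illustration and is not part of any package.

```r
# Hypothetical helper (not from the answer): convert HTML to PDF with Pandoc,
# failing early with a readable error instead of a silent non-zero exit code.
html_to_pdf_checked <- function(html_file, pdf_file) {
  if (!file.exists(html_file)) {
    stop("Input file not found: ", html_file)
  }
  if (!nzchar(Sys.which("pandoc"))) {
    stop("pandoc was not found on the PATH")
  }
  # system2() takes the arguments as a character vector; shQuote() protects
  # paths that contain spaces.
  status <- system2("pandoc", c(shQuote(html_file), "-o", shQuote(pdf_file)))
  if (status != 0) {
    stop("pandoc exited with status ", status)
  }
  invisible(pdf_file)
}
```

Checking the exit status matters here because `system()` and `system2()` do not raise an R error when the external program fails.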

There are a few web services that do HTML to PDF conversion and have REST APIs, so you can call them with RCurl. A quick internet search gives pdfcrowd.com. They let you upload documents as well as convert URLs, but it's a paid-for service.

Next hit is joliprint, which is free. Try this:

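library(RCurl)
# Pass the URL unescaped: getForm() escapes form values itself
url_to_convert <- "http://lifehacker.com/5706937/dont-make-important-decisions-until-your-decision-time" # or wherever

the_pdf <- getForm(
  "http://eu.joliprint.com/api/rest/url/print",
  url = url_to_convert,
  binary = TRUE,        # the response is a PDF, so keep it as raw bytes
  .checkParams = FALSE  # "url" is also a curl option name; skip that warning
)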
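
Whichever service is used, the response body is binary, so it should be requested as raw bytes (for example with `binary = TRUE` in `getForm()`) and written with `writeBin()` rather than `cat()`. The sketch below assumes the response is already held in a raw vector; `save_pdf_response` is a hypothetical helper name, and the check relies on PDF files starting with the `%PDF` signature.

```r
# Hypothetical helper: write a raw HTTP response body to disk as a PDF.
save_pdf_response <- function(body, pdf_file) {
  stopifnot(is.raw(body))
  # PDF files begin with the bytes "%PDF"; anything else is probably an
  # HTML error page or an empty response.
  signature <- charToRaw("%PDF")
  if (length(body) < 4 || !identical(body[1:4], signature)) {
    stop("Response does not look like a PDF")
  }
  writeBin(body, pdf_file)  # binary-safe, unlike cat()
  invisible(pdf_file)
}

# Stand-in for a real response, just to show the call:
fake_body <- charToRaw("%PDF-1.4\n%%EOF\n")
out <- save_pdf_response(fake_body, tempfile(fileext = ".pdf"))
```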
Interior answered 20/9, 2011 at 10:19 Comment(5)
I appear to be having pesky firewall issues. Can someone unencumbered by a corporate network try my code please.Interior
It gives a warning, and the output file doesn't seem to work, although your direction seems interesting! Warning message: "In testCurlOptionsInFormParameters(.params) : Found possible curl options in form parameters: url". Code used: cat(the_pdf, file = "d:\\temp.pdf")Muscular
Is there any method to convert PDF to HTML using R?Headland
is joliprint.com still around? I received an error when I tried to visit it...Are there any alternative ways to do this?Domineca
@Domineca I too am looking for this... I'm thinking rPython or reticulate using beautifulsoup might be the only answerWaggery
O
6

wkhtmltopdf is a nice cross-platform tool for this. Install it as appropriate for your operating system, then call it from R, e.g.

system("wkhtmltopdf --javascript-delay 1 in.html out.pdf")

I found the javascript delay necessary to avoid the message "Loading [Contrib]/a11y/accessibility-menu.js" being included in the pdf as a result of loading MathJax - which HTML files generated by R markdown will do.
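
As a rough sketch, the call can be wrapped so that a missing installation or a failed conversion is reported. This assumes `wkhtmltopdf` is on the PATH; the helper names `wkhtmltopdf_args` and `html_to_pdf_wk` are invented for this example. The `--javascript-delay` option takes a value in milliseconds.

```r
# Hypothetical helper: build the wkhtmltopdf argument vector separately so it
# can be inspected without running the tool.
wkhtmltopdf_args <- function(html_file, pdf_file, js_delay_ms = NULL) {
  args <- character(0)
  if (!is.null(js_delay_ms)) {
    # --javascript-delay expects a whole number of milliseconds
    args <- c(args, "--javascript-delay", sprintf("%d", as.integer(js_delay_ms)))
  }
  c(args, shQuote(html_file), shQuote(pdf_file))
}

# Hypothetical wrapper: run wkhtmltopdf and report a non-zero exit status.
html_to_pdf_wk <- function(html_file, pdf_file, js_delay_ms = NULL) {
  if (!nzchar(Sys.which("wkhtmltopdf"))) {
    stop("wkhtmltopdf was not found on the PATH")
  }
  status <- system2("wkhtmltopdf",
                    wkhtmltopdf_args(html_file, pdf_file, js_delay_ms))
  if (status != 0) {
    warning("wkhtmltopdf exited with status ", status)
  }
  invisible(status)
}
```

For example, `html_to_pdf_wk("in.html", "out.pdf", js_delay_ms = 1000)` would wait about one second for scripts such as MathJax before printing.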

Owl answered 23/1, 2017 at 14:13 Comment(0)
B
5

To convert from HTML to PDF, you can use:

library(pagedown)
# chrome_print() needs Google Chrome or Chromium installed; both arguments are file paths
chrome_print(path_To_Html, output = path_To_PDF)
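
A self-contained way to try this out might look like the sketch below. The HTML content and file names are illustrative only, and the conversion step is skipped when pagedown is not installed; it also fails gracefully if no Chrome or Chromium browser can be found.

```r
# Write a tiny HTML report to a temporary file (illustrative content only).
html_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<h1>Report</h1>",
  "<ul><li>The error was 2.334</li></ul>",
  "</body></html>"
), html_file)

# Put the PDF next to the HTML file, with a .pdf extension.
pdf_file <- sub("\\.html$", ".pdf", html_file)

# Only attempt the conversion when pagedown is installed; chrome_print()
# itself also needs Chrome or Chromium to be available, hence the try().
if (requireNamespace("pagedown", quietly = TRUE)) {
  try(pagedown::chrome_print(html_file, output = pdf_file))
}
```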
Borreri answered 19/8, 2021 at 19:42 Comment(0)
R
0

You can also knit directly to PDF if your code is in an R Markdown file.
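
Knitting can also be done from code rather than from the RStudio button. The following is a minimal sketch, assuming the rmarkdown package and a LaTeX installation (for example TinyTeX) are available; `knit_to_pdf` is a hypothetical wrapper name.

```r
# Hypothetical helper: render an R Markdown file to PDF without using the
# Knit button in RStudio.
knit_to_pdf <- function(rmd_file) {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is not installed")
  }
  if (!file.exists(rmd_file)) {
    stop("Input file not found: ", rmd_file)
  }
  # render() returns the path of the generated file
  rmarkdown::render(rmd_file, output_format = "pdf_document")
}
```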

Ranunculus answered 13/7, 2024 at 11:5 Comment(0)