I seem to recall looking at this in response to some SO question (can't find it now) and deciding that since the information isn't included in the output of available.packages(), nor in the result of applying readRDS to @CRAN/web/packages/packages.rds (a trick from Jeroen Ooms), I couldn't think of a non-scraping way to do it ...
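For the record, that packages.rds check looks roughly like this (I'm spelling out the main CRAN mirror URL here; the exact set of columns may vary with your R version, but none of them describe vignettes):

con <- gzcon(url("https://cran.r-project.org/web/packages/packages.rds",
                 open = "rb"))
meta <- readRDS(con)  ## matrix of package-level metadata
close(con)
colnames(meta)        ## nothing vignette-related among the fields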
Here's my scraper, applied to the first 100 packages (leading to 44 vignettes):
## first 100 package names from the default repository
## (assumes getOption("repos") points at a single CRAN-style repo)
pkgs <- unname(available.packages()[, 1])[1:100]
## URL of each package's vignette index
vindex_urls <- paste0(getOption("repos"), "/web/packages/", pkgs,
                      "/vignettes/index.rds")
getf <- function(x) {
    ## I think there should be a way to do this directly
    ##  with readRDS(url(...)) but I can't get it to work
    suppressWarnings(
        download.file(x, "tmp.rds", quiet = TRUE))
    readRDS("tmp.rds")
}
library(plyr)
## fetch each index; packages without vignettes give a download error,
## which try() turns into a NULL entry
vv <- llply(vindex_urls,
            .progress = "text",
            function(x) {
                if (inherits(z <- try(getf(x), silent = TRUE),
                             "try-error")) NULL else z
            })
## tag each per-package index with its package name, then stack them
tmpf <- function(x, n) { if (is.null(x)) NULL else data.frame(pkg = n, x) }
vframe <- do.call(rbind, mapply(tmpf, vv, pkgs, SIMPLIFY = FALSE))
rownames(vframe) <- NULL
head(vframe[, c("pkg", "Title")])
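By the way, re: the comment inside getf(): wrapping the connection in gzcon() is one way I'd expect the direct read to work without a temp file (a sketch, not tested against every index file):

getf2 <- function(x) {
    ## gzcon() auto-detects gzip, so this should handle both
    ## compressed and plain .rds index files
    con <- gzcon(url(x, open = "rb"))
    on.exit(close(con))
    readRDS(con)
}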
There may be ways to clean this up/make it more compact, but it seems to work OK. Your scrape-once/update-occasionally strategy seems reasonable. Or, if you wanted, you could scrape daily (or weekly, or whatever seems reasonable), save/post the results somewhere publicly accessible, and then include a function with that URL hard-coded in the package ... or even create a nicely formatted HTML table, with links, that the whole world could use (and then add Viagra ads to the page, and $$PROFIT$$ ...)
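A minimal sketch of that strategy (the file name and hosting URL below are placeholders, not anything that actually exists):

## in the scheduled scraping script: save the assembled table ...
saveRDS(vframe, "cran-vignette-index.rds")
## ... post that file somewhere publicly accessible, then ship a small
## accessor in the package with the (placeholder) URL hard-coded:
get_vignette_index <- function(index_url =
        "https://example.org/cran-vignette-index.rds") {
    con <- gzcon(url(index_url, open = "rb"))
    on.exit(close(con))
    readRDS(con)
}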
edit: wrapped both the download and the readRDS in a function, so I can wrap the whole thing in try