I seem to recall looking at this in response to some SO question (can't find it now) and deciding that since the information isn't included in the output of available.packages(), nor in the result of applying readRDS to @CRAN/web/packages/packages.rds (a trick from Jeroen Ooms), I couldn't think of a non-scraping way to do it ...
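For the record, that packages.rds check looks roughly like this (I'm spelling out the main CRAN mirror URL here; the exact set of columns may vary with your R version, but none of them describe vignettes):

con <- gzcon(url("https://cran.r-project.org/web/packages/packages.rds",
                 open = "rb"))
meta <- readRDS(con)  ## matrix of package-level metadata
close(con)
colnames(meta)        ## nothing vignette-related among the fields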
Here's my scraper, applied to the first 100 packages (leading to 44 vignettes):
## first 100 package names from the default repository
## (assumes getOption("repos") points at a single CRAN-style repo)
pkgs <- unname(available.packages()[, 1])[1:100]
## URL of each package's vignette index
vindex_urls <- paste0(getOption("repos"), "/web/packages/", pkgs,
                      "/vignettes/index.rds")
getf <- function(x) {
    ## I think there should be a way to do this directly
    ##  with readRDS(url(...)) but I can't get it to work
    suppressWarnings(
        download.file(x, "tmp.rds", quiet = TRUE))
    readRDS("tmp.rds")
}
library(plyr)
## fetch each index; packages without vignettes give a download error,
## which try() turns into a NULL entry
vv <- llply(vindex_urls,
            .progress = "text",
            function(x) {
                if (inherits(z <- try(getf(x), silent = TRUE),
                             "try-error")) NULL else z
            })
## tag each per-package index with its package name, then stack them
tmpf <- function(x, n) { if (is.null(x)) NULL else data.frame(pkg = n, x) }
vframe <- do.call(rbind, mapply(tmpf, vv, pkgs, SIMPLIFY = FALSE))
rownames(vframe) <- NULL
head(vframe[, c("pkg", "Title")])
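By the way, re: the comment inside getf(): wrapping the connection in gzcon() is one way I'd expect the direct read to work without a temp file (a sketch, not tested against every index file):

getf2 <- function(x) {
    ## gzcon() auto-detects gzip, so this should handle both
    ## compressed and plain .rds index files
    con <- gzcon(url(x, open = "rb"))
    on.exit(close(con))
    readRDS(con)
}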
There may be ways to clean this up/make it more compact, but it seems to work OK. Your scrape-once/update-occasionally strategy seems reasonable. Or, if you wanted, you could scrape daily (or weekly, or whatever seems reasonable), save/post the results somewhere publicly accessible, and then include a function with that URL hard-coded in the package ... or even create a nicely formatted HTML table, with links, that the whole world could use (and then add Viagra ads to the page, and $$PROFIT$$ ...)
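A minimal sketch of that strategy (the file name and hosting URL below are placeholders, not anything that actually exists):

## in the scheduled scraping script: save the assembled table ...
saveRDS(vframe, "cran-vignette-index.rds")
## ... post that file somewhere publicly accessible, then ship a small
## accessor in the package with the (placeholder) URL hard-coded:
get_vignette_index <- function(index_url =
        "https://example.org/cran-vignette-index.rds") {
    con <- gzcon(url(index_url, open = "rb"))
    on.exit(close(con))
    readRDS(con)
}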
edit: wrapped both the download and the readRDS in a function, so I can wrap the whole thing in try