R help page as object

Asked 18/1, 2012 at 22:57 Answered 24/1, 2012 at 7:31

Is there a nice way to get the R help page from an installed package in the form of an R object (e.g. a list)? I would like to expose help pages in the form of standardized JSON or XML schemas. However, getting the R help info from the DB is harder than I thought.

A while ago I hacked something together to get the HTML of an R help manual page. However, I would rather have a general R object that contains this information, which I can render to JSON/XML/HTML, etc. I looked into the helpr package from Hadley, but this seems to be a bit of overkill for my purpose.

Spurn answered 18/1, 2012 at 22:57 Comment(0)

Below is what I hacked together. However, I have yet to test it on many help files to see if it works in general.

Rd2list <- function(Rd){
    # Name each top-level section after its Rd tag, dropping the leading backslash.
    names(Rd) <- substring(sapply(Rd, attr, "Rd_tag"),2);
    temp_args <- Rd$arguments;

    # Flatten every section except \arguments into a single string.
    Rd$arguments <- NULL;
    myrd <- lapply(Rd, unlist);
    myrd <- lapply(myrd, paste, collapse="");

    # Keep only the \item entries and store each as an (arg, description) pair.
    temp_args <- temp_args[sapply(temp_args , attr, "Rd_tag") == "\\item"];
    temp_args <- lapply(temp_args, lapply, paste, collapse="");
    temp_args <- lapply(temp_args, "names<-", c("arg", "description"));
    myrd$arguments <- temp_args;
    return(myrd);
}

getHelpList <- function(...){
    thefile <- help(...)
    myrd <- utils:::.getHelpFile(thefile);
    Rd2list(myrd);
}

And then you would do something like:

myhelp <- getHelpList("qplot", package="ggplot2");
cat(jsonlite::toJSON(myhelp));
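
To see what Rd2list has to work with, without needing ggplot2 installed, here is a minimal sketch that parses a tiny hand-written Rd file with tools::parse_Rd and inspects the "Rd_tag" attributes. The Rd content, the name myfun and the variable names are all made up for illustration, and I have not checked how this behaves on more exotic help files.

```r
# A tiny hand-written Rd document (hypothetical content, for illustration only).
rd_text <- c(
  "\\name{myfun}",
  "\\title{An example function}",
  "\\description{Does something.}",
  "\\arguments{",
  "  \\item{x}{a numeric vector}",
  "  \\item{y}{a character string}",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_file)

# parse_Rd() returns a nested list; every element carries an "Rd_tag" attribute.
rd <- tools::parse_Rd(rd_file)
tags <- vapply(rd, attr, character(1), "Rd_tag")
print(tags)

# Pull out the \arguments section and keep only its \item entries
# (the whitespace between items shows up as TEXT elements).
args_section <- rd[[which(tags == "\\arguments")]]
items <- Filter(function(el) identical(attr(el, "Rd_tag"), "\\item"), args_section)

# Each \item is a list of two parts: the argument name and its description.
arg_names <- vapply(items, function(it) paste(unlist(it[[1]]), collapse = ""), character(1))
arg_desc  <- vapply(items, function(it) paste(unlist(it[[2]]), collapse = ""), character(1))
print(data.frame(arg = arg_names, description = arg_desc))
```

This is the same structure that Rd2list walks over: top-level tags become list names, and the \item entries become arg/description pairs.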

Spurn answered 24/1, 2012 at 7:31 Comment(2)

One tip: drop the semicolons. Seriously, drop them. That's C code, not R code. In R you don't need them unless you want to put two commands on one line, and I strongly advise you not to do that. – Continence 8/2, 2012 at 12:57

I like em. They often help me debugging when I forget closing brackets. – Spurn 8/2, 2012 at 17:34

Edited with suggestion of Hadley

You can do this a bit more easily with:

getHTMLhelp <- function(...){
    thefile <- help(...)
    capture.output(
      tools:::Rd2HTML(utils:::.getHelpFile(thefile))
    )
}

Using tools:::Rd2txt instead of tools:::Rd2HTML will give you plain text. Just getting the file (without any parsing) gives you the original Rd format, so you can write your custom parsing function to parse it into an object (see the solution of @Jeroen, which does a good job in extracting all info into a list).
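
As a rough sketch of the plain-text variant: the same wrapper, but calling tools::Rd2txt. The name getTextHelp is my own (hypothetical), and the underline_titles option is only there because, as far as I understand it, the default underlines section titles with backspace overstrikes. It assumes the topic resolves to a single help file.

```r
# Hypothetical helper: same arguments as help(), returns the page as plain text.
getTextHelp <- function(...) {
    thefile <- help(...)
    capture.output(
        tools::Rd2txt(
            utils:::.getHelpFile(thefile),   # unexported helper, may change between R versions
            options = list(underline_titles = FALSE)
        )
    )
}

# One character string per line of the rendered help page.
txt <- getTextHelp("mean")
head(txt)
```

Dropping the Rd2txt call and returning utils:::.getHelpFile(thefile) directly gives the parsed Rd object instead, which is what a custom parser would start from.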

This function takes exactly the same arguments as help() and returns a character vector in which every element is one line of the file, e.g.:

> HelpAnova <- getHTMLhelp(anova)
> head(HelpAnova)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"      
[2] "<html><head><title>R: Anova Tables</title>"                             
[3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
[4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"             
[5] "</head><body>"                                                          
[6] ""

Or:

> HelpGam <- getHTMLhelp(gamm,package=mgcv)
> head(HelpGam)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"      
[2] "<html><head><title>R: Generalized Additive Mixed Models</title>"        
[3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
[4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"             
[5] "</head><body>"                                                          
[6] ""

Continence answered 19/1, 2012 at 15:41 Comment(6)

But is there a way to get a non-htmlified object? – Spurn 19/1, 2012 at 22:50

You should have specified that in your question. As you parse the Rd, I thought that was what you wanted. Using Rd2txt will give you plain text. Just getting the file (without any parsing) gives you the original Rd format. If you want to transform this to a list, you'll have to write your own function. – Continence 20/1, 2012 at 7:56

I really hate the use of match.call and subsequent call manipulation. I think it's way better just to work with strings. – Leonaleonanie 20/1, 2012 at 14:43

@Leonaleonanie A matter of style I guess. It's how lm and many other functions work. Plus, it should still work if the R core decides to change the directory structure for example. That's going to be a bit more tricky using string manipulation. – Continence 20/1, 2012 at 17:31

I meant that you could write your function as getHTMLHelp <- function(topic, ...) thefile <- help(...). You haven't gained anything by using match.call except to make the function more complicated. – Leonaleonanie 20/1, 2012 at 22:33

And just because some base R functions do it, doesn't mean it's good practice. – Leonaleonanie 20/1, 2012 at 22:33
