R help page as object
Is there a nice way to get the R help page from an installed package in the form of an R object (e.g. a list)? I would like to expose help pages as standardized JSON or XML schemas. However, getting the R help info from the DB is harder than I thought.

A while ago I hacked something together to get the HTML of an R help manual page. However, I would rather have a general R object containing this information that I can render to JSON, XML, HTML, etc. I looked into Hadley's helpr package, but it seems to be overkill for my purpose.

Spurn answered 18/1, 2012 at 22:57 Comment(0)

Below is what I hacked together. I have yet to test it on many help files to see whether it works in general.

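Rd2list <- function(Rd){
    # Name each top-level section after its Rd tag, minus the leading backslash.
    names(Rd) <- substring(sapply(Rd, attr, "Rd_tag"),2);
    temp_args <- Rd$arguments;

    # Collapse every section except the arguments into a single string.
    Rd$arguments <- NULL;
    myrd <- lapply(Rd, unlist);
    myrd <- lapply(myrd, paste, collapse="");

    # Keep only the \item entries and turn each into an (arg, description) pair.
    temp_args <- temp_args[sapply(temp_args, attr, "Rd_tag") == "\\item"];
    temp_args <- lapply(temp_args, lapply, paste, collapse="");
    temp_args <- lapply(temp_args, "names<-", c("arg", "description"));
    myrd$arguments <- temp_args;
    return(myrd);
}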

getHelpList <- function(...){
    thefile <- help(...)
    myrd <- utils:::.getHelpFile(thefile);
    Rd2list(myrd);
}

And then you would do something like:

myhelp <- getHelpList("qplot", package="ggplot2");
cat(jsonlite::toJSON(myhelp));
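
To see what Rd2list is working with, here is a minimal sketch of the structure of a parsed Rd object. It assumes a tiny made-up Rd file (the `addone` topic is hypothetical and exists only for illustration) and uses the exported tools::parse_Rd instead of looking up an installed help page. It is not a replacement for the function above, just a way to poke at the "Rd_tag" attributes it relies on.

```r
# A tiny, hypothetical Rd file written to a temp file so the sketch is
# self-contained; real help pages come from installed packages instead.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{addone}",
  "\\alias{addone}",
  "\\title{Add One}",
  "\\description{Adds one to a number.}",
  "\\usage{addone(x)}",
  "\\arguments{",
  "  \\item{x}{A numeric vector.}",
  "}"
), rd_file)

# Parse the file and drop the "Rd" class so we work with a plain list.
rd <- unclass(tools::parse_Rd(rd_file))

# Every top-level element carries an "Rd_tag" attribute such as "\\title".
tags <- vapply(rd, attr, character(1), "Rd_tag")

# Keep only real sections (tags starting with a backslash); this drops
# the "TEXT" newlines that sit between sections.
is_section <- grepl("^\\\\", tags)
sections <- rd[is_section]
names(sections) <- sub("^\\\\", "", tags[is_section])

# A simple section collapses to one string.
title <- paste(unlist(sections$title), collapse = "")

# The arguments section holds \item entries, each a pair of
# (argument name, description).
items <- Filter(function(x) identical(attr(x, "Rd_tag"), "\\item"),
                sections$arguments)
args <- lapply(items, function(it) list(
  arg = paste(unlist(it[[1]]), collapse = ""),
  description = paste(unlist(it[[2]]), collapse = "")
))
```

The resulting `title` and `args` are plain strings and nested lists, which is the kind of object a JSON serializer can handle directly.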
Spurn answered 24/1, 2012 at 7:31 Comment(2)
One tip: drop the semicolons. Seriously, drop them. That's C code, not R code. In R you don't need them unless you want to put two commands on one line, and I strongly advise you not to do that. - Continence
I like em. They often help me debugging when I forget closing brackets. - Spurn

Edited following Hadley's suggestion.

You can do this a bit more easily with:

getHTMLhelp <- function(...){
    thefile <- help(...)
    capture.output(
      tools:::Rd2HTML(utils:::.getHelpFile(thefile))
    )
}

Using tools:::Rd2txt instead of tools:::Rd2HTML will give you plain text. Just getting the file (without any parsing) gives you the original Rd format, so you can write your custom parsing function to parse it into an object (see the solution of @Jeroen, which does a good job in extracting all info into a list).

This function takes exactly the same arguments as help() and returns a character vector in which every element is one line of the rendered file, e.g.:

> HelpAnova <- getHTMLhelp(anova)
> head(HelpAnova)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"      
[2] "<html><head><title>R: Anova Tables</title>"                             
[3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
[4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"             
[5] "</head><body>"                                                          
[6] ""           

Or:

> HelpGam <- getHTMLhelp(gamm,package=mgcv)
> head(HelpGam)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"      
[2] "<html><head><title>R: Generalized Additive Mixed Models</title>"        
[3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
[4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"             
[5] "</head><body>"                                                          
[6] ""           
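
As a rough illustration of the HTML versus plain-text choice, the sketch below renders the same parsed Rd object both ways and captures the lines. It assumes a small made-up Rd file (the `addone` topic is hypothetical) so that it runs without any particular package installed, and it calls the renderers as tools::Rd2HTML and tools::Rd2txt, since both are exported from tools.

```r
# Hypothetical Rd source, written to a temp file for a self-contained sketch.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{addone}",
  "\\alias{addone}",
  "\\title{Add One}",
  "\\description{Adds one to a number.}",
  "\\usage{addone(x)}",
  "\\arguments{\\item{x}{A numeric vector.}}"
), rd_file)
rd <- tools::parse_Rd(rd_file)

# Both renderers write to stdout by default, so capture.output() collects
# the result as a character vector with one element per line.
html_lines <- capture.output(tools::Rd2HTML(rd))
text_lines <- capture.output(tools::Rd2txt(rd))
```

Either vector can then be written to a file with writeLines() or joined with paste(collapse = "\n") if a single string is more convenient.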
Continence answered 19/1, 2012 at 15:41 Comment(6)
But is there a way to get a non-htmlified object? - Spurn
Should have specified that in your question. As you parse the Rd I thought that was what you wanted. Using Rd2txt will give you plain text. Just getting the file (without any parsing) gives you the original Rd format. If you want to transform this to a list, you'll have to write your own function. - Continence
I really hate the use of match.call and subsequent call manipulation. I think it's way better just to work with strings. - Leonaleonanie
@Leonaleonanie A matter of style I guess. It's how lm and many other functions work. Plus, it should still work if the R core decides to change the directory structure for example. That's going to be a bit more tricky using string manipulation. - Continence
I meant that you could write your function as getHTMLHelp <- function(topic, ...) thefile <- help(...). You haven't gained anything by using match.call except to make the function more complicated. - Leonaleonanie
And just because some base R functions do it, doesn't mean it's good practice. - Leonaleonanie