Is it possible to use R package data in testthat tests or run_examples()?

2

34

I'm working on developing an R package, using devtools, testthat, and roxygen2. I have a couple of data sets in the data folder (foo.txt and bar.csv).

My file structure looks like this:

/ mypackage
    / data
        * foo.txt, bar.csv
    / inst
        / tests
            * run-all.R, test_1.R
    / man
    / R

I'm pretty sure 'foo' and 'bar' are documented correctly:

    #' Foo data
    #'
    #' Sample foo data
    #'
    #' @name foo
    #' @docType data
    NULL
    #' Bar data
    #'
    #' Sample bar data
    #'
    #' @name bar
    #' @docType data
    NULL

I would like to use the data in 'foo' and 'bar' in my documentation examples and unit tests.

For example, I would like to use these data sets in my testthat tests by calling:

    data(foo)
    data(bar)
    expect_that(foo$col[1], equals(bar$col[1]))

And, I would like the examples in the documentation to look like this:

    #' @examples
    #' data(foo)
    #' functionThatUsesFoo(foo)

If I try to call data(foo) while developing the package, I get the error "data set 'foo' not found". However, if I build the package, install it, and load it - then I can make the tests and examples work.

My current work-arounds are to not run the example:

    #' @examples
    #' \dontrun{data(foo)}
    #' \dontrun{functionThatUsesFoo(foo)}

And in the tests, pre-load the data using a path specific to my local computer:

    foo <- read.delim(pathToFoo, sep="\t", fill = TRUE, comment.char="#")
    bar <- read.delim(pathToBar, sep=";", fill = TRUE, comment.char="#")
    expect_that(foo$col[1], equals(bar$col[1]))

This does not seem ideal - especially since I'm collaborating with others - requiring all the collaborators to have the same full paths to 'foo' and 'bar'. Plus, the examples in the documentation look like they can't be run, even though once the package is installed, they can.

Any suggestions? Thanks much.

Elson answered 17/1, 2012 at 16:49 Comment(3)
Don't use data(). Just rely on lazy loading. - Baro
Sorry about that last comment, I'm still getting used to this formatting. Thanks @hadley. That helped with the testthat tests. I'm still at a loss as to how to make an example in the documentation (using roxygen2) that lets me take advantage of the data set. - Elson
If you convert the data to .Rdata files, then load_all will load it for you. - Baro
23

Importing non-RData files within examples/tests

I found a solution to this problem by peering at the RJSONIO package, which obviously needed to provide some examples of reading files other than those of the .RData variety.

I got this to work in function-level examples, and it satisfies both R CMD check mypackage and testthat::test_package().

(1) Re-organize your package structure so that the example data directory is within inst. At some point R CMD check mypackage told me to move non-RData data files to inst/extdata, so in this new structure the directory is renamed accordingly.

/ mypackage
    / inst
        / tests
            * run-all.R, test_1.R
        / extdata
            * foo.txt, bar.csv
    / man
    / R
    / tests
        * run-testthat-mypackage.R

(2) (Optional) Add a top-level tests directory so that your new testthat tests are now also run during R CMD check mypackage.

The run-testthat-mypackage.R script should have at minimum the following two lines:

    library("testthat")
    test_package("mypackage")

Note that this is the part that allows testthat to be called during R CMD check mypackage; it is not necessary otherwise. You should also add testthat as a "Suggests:" dependency in your DESCRIPTION file.

(3) Finally, the secret-sauce for specifying your within-package path:

    barfile <- system.file("extdata", "bar.csv", package="mypackage")
    bar <- read.csv(barfile)
    # remainder of example/test code here...

If you look at the output of the system.file() command, it is returning the full system path to your package within the R framework. On Mac OS X this looks something like:

    "/Library/Frameworks/R.framework/Versions/2.15/Resources/library/mypackage/extdata/bar.csv"

The reason this seems okay to me is that you don't hard code any path features other than those within your package, so this approach should be robust relative to other R installations on other systems.
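One thing worth knowing about the system.file() call above: when the file does not exist, it returns an empty string rather than failing, so a typo in the file name only surfaces later inside read.csv(). A minimal sketch of guarding against that with the mustWork argument follows. Since "mypackage" is not installed here, a CSV that ships with base R's utils package is used as a stand-in; that substitution is purely for illustration.

```r
# system.file() returns "" for a file that does not exist in the package.
missing_path <- system.file("extdata", "no-such-file.csv", package = "utils")

# mustWork = TRUE turns a missing file into an immediate error instead.
# In the package from this answer the call would be:
#   system.file("extdata", "bar.csv", package = "mypackage", mustWork = TRUE)
# Stand-in (assumption for illustration): a CSV shipped with the utils package.
csv_path <- system.file("misc", "exDIF.csv", package = "utils", mustWork = TRUE)
demo <- read.csv(csv_path)
```

The same pattern should work unchanged inside a testthat test or a roxygen2 @examples block, since it only depends on the installed package layout.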

data() approach

As for the data() semantics, as far as I can tell this is specific to R binary (.RData) files in the top-level data directory. So you can circumvent my example above by pre-importing the data files and saving them with the save() command into your data-directory. However, this assumes you only need to show an example in which the data is already loaded into R, as opposed to also reproducibly demonstrating the upstream process of importing the files.
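The pre-importing step described above might look roughly like the following one-off conversion script. This is a sketch under stated assumptions: a temporary directory stands in for the package root, and the contents written to foo.txt are placeholder values invented so the script runs on its own; only the save() into the data directory comes from the answer.

```r
# Assumption: tempdir() stands in for the real package root.
pkg_root <- file.path(tempdir(), "mypackage")
extdata_dir <- file.path(pkg_root, "inst", "extdata")
data_dir <- file.path(pkg_root, "data")
dir.create(extdata_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Placeholder raw file (made-up values, only so the sketch is runnable).
writeLines(c("col\tother", "1\ta", "2\tb"), file.path(extdata_dir, "foo.txt"))

# Import the raw file once, then store the resulting object in data/.
foo <- read.delim(file.path(extdata_dir, "foo.txt"))
save(foo, file = file.path(data_dir, "foo.RData"))

# Round-trip check: load the saved object into a clean environment.
check_env <- new.env()
load(file.path(data_dir, "foo.RData"), envir = check_env)
```

Keeping the raw file in inst/extdata alongside the saved object means the import step itself stays reproducible, while examples and tests can use the already-loaded data.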

Druce answered 21/6, 2012 at 22:29 Comment(4)
Thanks for the in-depth answer! - Elson
You're welcome. I'm glad it helped. It's become useful for my own package devel, so I wanted to share. - Robinrobina
I've been trying to figure out how to make the "upstream process of importing" reproducible as well. A typical use case I have is that I want to work with a transformation of some shapefiles that isn't trivial; maybe it takes a minute or so. I can include the shapefiles in inst/extdata, but I can never seem to find that path from code executing inside install(). Plus, even document() seems to want to rebuild all the .r files inside data/. I don't want to rebuild them every time I add or change documentation for a function. A data/Makefile might work, but that seems kludgey. Tips appreciated! - Grisly
@holstius Why do you have .r files inside data/? Your data re-build tests should be run by your unit testing code in tests or inst/tests. If investigating this doesn't help, I suggest creating a separate SO question for your comment, with something that approaches a reproducible example. - Robinrobina
2

Per @hadley's comment, the .RData conversion will work well.

As for the broader question of team collaboration with different environments across team members, a common pattern is to agree on a single environment variable, e.g., FOO_PROJECT_ROOT, that everyone on the team will set up appropriately in their environment. From that point on you can use relative paths, including across projects.
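As a rough illustration of the environment-variable pattern, here is a small helper that builds paths from FOO_PROJECT_ROOT. The helper name project_path() and the data/foo.txt layout under the root are hypothetical, and the Sys.setenv() call is there only so the sketch runs on its own; in practice each team member would set the variable outside the code, for example in their shell profile.

```r
# Hypothetical helper: build a path relative to the agreed project root.
project_path <- function(...) {
  root <- Sys.getenv("FOO_PROJECT_ROOT", unset = "")
  if (!nzchar(root)) {
    stop("FOO_PROJECT_ROOT is not set", call. = FALSE)
  }
  file.path(root, ...)
}

# Demo only: point the variable at a temporary directory.
Sys.setenv(FOO_PROJECT_ROOT = tempdir())

# Assumed layout for illustration: <root>/data/foo.txt
foo_path <- project_path("data", "foo.txt")
```

Failing early when the variable is unset gives a collaborator a clear message instead of a confusing "file not found" further down.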

An R-specific approach would be to agree on some data/functions that every team member will set up in their .Rprofile files. That's, for example, how devtools finds packages in non-standard locations.

Last but not least, though it is not optimal, you can actually put developer-specific code in your repository. If @hadley does it, it's not such a bad thing. See, for example, how he activates certain behaviors in testthat in his own environment.

Titty answered 20/6, 2012 at 3:25
