How to use data within a function in an R package?
I am currently writing a function for an R package. The function should (a) take data as input and (b) check one of its columns against a list of acceptable values.

These acceptable values come from another organization, in a .csv file. I would like to load this .csv file and use it as a reference to check whether the user's column contains valid values.

For example, let's say the user has these data:

set.seed(1839)
user <- data.frame(x=sample(letters,10),
                   y=rnorm(10))
user

   x          y
1  v -0.7025836
2  p -1.4586245
3  f  0.1987113
4  y  1.0544690
5  o -0.7112214
6  m  0.2956671
7  b  0.3016737
8  a -0.0945271
9  x -0.2790357
10 c  0.1681388

And the .csv contains many (useful) columns, but I only care about one (z) for the moment:

ref <- data.frame(z=letters[1:4], a=rnorm(4), b=(rnorm(4)))
ref

  z          a          b
1 a -0.3563105  1.4536406
2 b  1.6841862  1.3232985
3 c  1.3073516 -0.6978598
4 d  0.4352904 -0.3971175

The code I would like to run is (note: I am not calling library in the actual function, I am just doing it here for simplicity's sake):

library(dplyr)
valid_values <- ref %>%
  select(z) %>% 
  unname() %>% 
  unlist() %>% 
  as.character()

summary <- user %>% 
  mutate(x_valid = x %in% valid_values)

summary tells me which values of x in user are valid:

   x          y x_valid
1  v -0.7025836   FALSE
2  p -1.4586245   FALSE
3  f  0.1987113   FALSE
4  y  1.0544690   FALSE
5  o -0.7112214   FALSE
6  m  0.2956671   FALSE
7  b  0.3016737    TRUE
8  a -0.0945271    TRUE
9  x -0.2790357   FALSE
10 c  0.1681388    TRUE

Now, what do I replace ref with in my function code? Where should I store this data in my package? How do I load it? And what type of file should I convert it to?

The function should look something like:

x_check <- function(data) {

  # get valid values
  valid_values <- ??? %>%
    select(z) %>% 
    unname() %>% 
    unlist() %>% 
    as.character()

  # compare against valid values
  return(
    data %>% 
    mutate(x_valid = x %in% valid_values)
  )
}

What do I replace the ??? with to get my data? I do not care much whether or not the user is able to see this ref data I wish to load in.


I am using devtools::load_all("directory/for/my/package") to test my package. Relevant session information:

R version 3.4.0 (2017-04-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)

other attached packages:
[1] roxygen2_6.0.1             devtools_1.13.2
Ogle answered 11/7, 2017 at 20:51 Comment(7)
Have you read about How to include data in R packages? - Manille
Generally, you store the data in the data/ folder and load it using data() (if it's not lazy loaded). You can use devtools::use_data() to set that up for you. - Manille
@Gregor Yes, I read through Hadley's chapter on it from that link, specifically. I have stored my data in the data/ folder and tried to use devtools::use_data(admit_source.RData), where admit_source is the name of the file, but I received the error: Error: Could not find package root. - Ogle
@Gregor Note that the DESCRIPTION file also specifies LazyData: true. - Ogle
I think you need to follow the link a little more closely and maybe read ?use_data. You should give use_data an R object; it will take care of creating the RData file. And if you have errors like that, maybe your working directory isn't set to the package folder? It seems like your question would be "why isn't use_data working? How can I avoid this error?" All the stuff about your function seems unrelated. - Manille
@Gregor I'm not necessarily tied to devtools::use_data; I just want to figure out a way to access that data when someone runs the function. I may just be confused, but it seems like Hadley specifically says to give it an .RData file generated using save(). I wasn't sure if use_data was what I wanted anyway, because the documentation asks for an existing object, which is why his example involves creating an object x <- c(1:10). If use_data takes an existing object, how do I actually put the file into an R object? That's what I want. - Ogle
@MarkWhite Sorry to mention you here, but I think this post should interest you. - Lemieux

I figured it out, in case anyone comes across this in the future. I load the data from the package's data/ directory into the function's local environment:

x_check <- function(data) {

  # get reference data
  data("ref", envir=environment())

  # get valid values
  valid_values <- ref %>%
    select(z) %>% 
    unname() %>% 
    unlist() %>% 
    as.character()

  # compare against valid values
  return(
    data %>% 
    mutate(x_valid = x %in% valid_values)
  )
}
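
For anyone who wants to try the data(..., envir = environment()) pattern without building a package first, here is a minimal sketch. It assumes the built-in iris dataset as a stand-in for ref, and the function name species_check is hypothetical. It uses base R only, so the check is just %in%.

```r
# Hypothetical stand-in: 'iris' from the datasets package plays the role of 'ref'.
species_check <- function(data) {
  # Load the dataset into this function's environment, not the global one
  utils::data("iris", package = "datasets", envir = environment())

  # Pull the reference column out as a character vector
  valid_values <- unique(as.character(iris$Species))

  # %in% already returns TRUE/FALSE, so no ifelse() is needed
  data$x_valid <- data$x %in% valid_values
  data
}

user <- data.frame(x = c("setosa", "tulip", "virginica"),
                   stringsAsFactors = FALSE)
result <- species_check(user)
result
```

Because the dataset is loaded into the function's own environment, calling the function does not leave a copy of the reference data in the caller's workspace.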
Ogle answered 12/7, 2017 at 3:21 Comment(0)

See Hadley Wickham's book on writing R packages, where he explains how to store data in a package.

"The most common location for package data is (surprise!) data/. Each file in this directory should be a .RData file created by save() containing a single object (with the same name as the file)."

With LazyData: true in DESCRIPTION, this makes the dataset available to any user of your package as packagename::ref, where ref is the name of the saved object.
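
As a rough sketch of getting from the .csv to a file in data/, the steps could look like the following. The package root, the data-raw/ folder, and the file names are all assumptions for illustration; a temporary directory and a made-up .csv are used so the snippet runs on its own.

```r
# Hypothetical package root; a temporary directory keeps the sketch self-contained
pkg_root <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_root, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_root, "data"), showWarnings = FALSE)

# Stand-in for the .csv supplied by the other organization
csv_path <- file.path(pkg_root, "data-raw", "ref.csv")
write.csv(data.frame(z = letters[1:4], a = 1:4), csv_path, row.names = FALSE)

# Read the file into an R object; the object name becomes the dataset name
ref <- read.csv(csv_path, stringsAsFactors = FALSE)

# Save that single object under data/, with the file named after the object
save(ref, file = file.path(pkg_root, "data", "ref.RData"))

# Sanity check: load the file into a scratch environment and see what it contains
check_env <- new.env()
loaded <- load(file.path(pkg_root, "data", "ref.RData"), envir = check_env)
loaded
```

Per the comments on the question, devtools::use_data() is given the R object (here ref) rather than a file name and takes care of the save() step itself.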

Ugaritic answered 29/1, 2020 at 11:3 Comment(0)
