How can I create a function using variables in a dataframe?

I'm sure the question is a bit dumb (sorry)... I'm trying to create a function that uses several variables I have stored in a data frame. The function looks like this:

mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o){

     Coag = (+0.032690 + 0.090289*Cond_in + 0.003229*Flow_in - 0.021980*pH_in - 0.037486*pH_out 
             +0.016031*Turb_in  -0.026006*nm250_i +0.093138*nm400_i - 0.397858*nm250_o - 0.109392*nm400_o)/0.167304

    return(Coag)
    }

m4_turb <- mlr_turb(dataset)  

The problem arises when I call the function on a data frame whose columns have the same names as the arguments. It doesn't detect my variables and shows this message:

Error in mlr_turb(dataset) : 
  argument "Flow_in" is missing, with no default

But the column is actually there, along with all the other variables.

I think I have misplaced or left out something in the function that would let it take the variables from the dataset. I have searched a lot but have not found an answer...

Pachysandra answered 14/4, 2020 at 13:2 Comment(1)
You're currently telling it that Cond_in = dataset in your function... you need something more like mlr_turb(dataset$Cond_in, dataset$Flow_in ...) Hon

You have hit a standard problem in R: standard evaluation (SE) versus non-standard evaluation (NSE). If you need more detail, you can have a look at this blog post I wrote.

I think the most convenient way to write a function that uses variables is to pass the variable names as arguments of the function.

Let's reuse @Muon's example.

# a simple function that takes x, y and z as arguments 
myFun <- function(x, y, z){
  result <- (x + y)/z
  return(result)
}

The question is where R should find the values behind the names x, y and z. In a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the environment where the function was defined (here the global environment), and then in the attached packages.
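A minimal sketch of that lookup order. The names `scale_by`, `factor_k`, `dat` and `show_flow` are made up for illustration and are not part of the question:

```r
# factor_k is not an argument of scale_by(), so R has to look for it
# outside the function.
factor_k <- 10

scale_by <- function(x) {
  # x is found among the function's arguments,
  # factor_k is found in the global environment
  x * factor_k
}

scale_by(1:3)

# A data frame column is NOT on that search path: Flow_in only exists
# inside dat, so a function that refers to it by bare name fails.
dat <- data.frame(Flow_in = c(1, 2, 3))
show_flow <- function() Flow_in
msg <- tryCatch(show_flow(), error = function(e) conditionMessage(e))
msg  # an "object not found" error message
```

This is why a column name alone is not enough: the function has to be told which data frame to look in, or the column has to be passed in as a vector.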

In myFun, R expects vectors. If you pass a column name instead, you will get an error. What if you want to pass a column name? You must tell R that the name should be looked up within the data frame. For instance, you can do something like this:

# df[[col]] extracts a single column as a vector from its name
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
  result <- (df[[col1]] + df[[col2]])/df[[col3]]
  return(result)
}
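As a rough usage sketch, the column-name version can be called like this. The function is restated so the snippet runs on its own, and the data frame (including the extra column `a`) is only an illustration:

```r
# Column-name version of the function, restated for a self-contained example
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z") {
  (df[[col1]] + df[[col2]]) / df[[col3]]
}

# Illustrative data; the column a is not used by default
myData <- data.frame(x = 1:5,
                     y = (1:5) * pi,
                     z = 11:15,
                     a = 6:10)

# Uses the default column names "x", "y" and "z"
res_default <- myFun(myData)

# Swap in another column by passing its name as a string: (a + y) / z
res_other <- myFun(myData, col1 = "a")
```

Extra columns in the data frame are harmless here, because the function only touches the columns it is asked for.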

You can go much further in this direction with the data.table package. If you start writing functions that need to use variables from a data frame, I recommend having a look at this package.

Creaky answered 14/4, 2020 at 13:47 Comment(0)

No dumb questions!

I think you're looking for do.call. This function allows you to unpack values into a function as arguments. Here's a really simple example.

# a simple function that takes x, y and z as arguments 
myFun <- function(x, y, z){
  result <- (x + y)/z
  return(result)
}

# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
                     y=(1:5)*pi,
                     z=(11:15))

# unpack the values into the function using do.call
do.call('myFun', myData)

Output:

[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
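A small sketch of why this works, assuming the same toy function: a data frame is a list of named columns, and do.call matches those names to the function's arguments, so the column order does not matter.

```r
myFun <- function(x, y, z) {
  (x + y) / z
}

# Columns deliberately listed in a different order than the arguments
myData <- data.frame(z = 11:15,
                     x = 1:5,
                     y = (1:5) * pi)

# do.call passes each column as a named argument ...
via_do_call <- do.call(myFun, myData)

# ... which is equivalent to spelling the call out by hand
explicit <- myFun(x = myData$x, y = myData$y, z = myData$z)

all.equal(via_do_call, explicit)
```

The matching is by name, so the data frame needs one column per argument, named exactly like the argument.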
Amsterdam answered 14/4, 2020 at 13:13 Comment(0)

I like Muon's answer, but I couldn't get it to work when the data.frame has columns that are not arguments of the function. Using the with() function is a simple way to make this work as well...

#Code from Muon:
# a simple function that takes x, y and z as arguments 
myFun <- function(x, y, z){
  result <- (x + y)/z
  return(result)
}

# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
                     y=(1:5)*pi,
                     z=(11:15), 
                     a=6:10)    #adding a var not used in myFun
    
# unpack the values into the function using do.call
do.call('myFun', myData)
#generates an error for the unused "a" column

#using with() function:
with(myData, myFun(x, y, z))
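Another option, sketched here as one possible approach rather than the only one, is to keep do.call but pass only the columns the function actually takes. The argument names can be read from the function itself with formals():

```r
myFun <- function(x, y, z) {
  (x + y) / z
}

myData <- data.frame(x = 1:5,
                     y = (1:5) * pi,
                     z = 11:15,
                     a = 6:10)    # extra column not used in myFun

# Names of the arguments myFun expects: "x" "y" "z"
needed <- names(formals(myFun))

# Subset the data frame to those columns before unpacking it
res_do_call <- do.call(myFun, myData[needed])

# Same result as the with() version
res_with <- with(myData, myFun(x, y, z))
all.equal(res_do_call, res_with)
```

The same pattern would presumably carry over to the function in the question, e.g. do.call(mlr_turb, dataset[names(formals(mlr_turb))]), assuming dataset has a column for every argument.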
Zealand answered 12/1, 2023 at 1:13 Comment(0)
