Function works (boot.stepAIC) but throws an error inside another function - environment issue?

I noticed some strange behavior in my R code today. I tried the package {bootStepAIC}, which provides a bootstrap function for the results of stepwise regression with the AIC. However, I do not think the statistical background is the problem here (I hope not).
I can use the function at the top level of R. This is my example code:

require(MASS)
require(bootStepAIC)  # the package is named bootStepAIC; it provides boot.stepAIC()

n<-100
x<-rnorm(n); y<-rnorm(n,sd=2); z<-rnorm(n,sd=3); res<-x+y+z+rnorm(n,sd=0.1)
dat.test<-as.data.frame(cbind(x,y,z,res))
form.1<-as.formula(res~x+y+z)
boot.stepAIC(lm(form.1, dat.test),dat.test) # should be OK - works for me

However, I wanted to wrap that in a function of my own. I pass the data and the formula to that function, but then I get an error from within boot.stepAIC() saying:

the model fit failed in 100 bootstrap samples
Error in strsplit(nam.vars, ":") : non-character argument

# custom function
fun.boot.lm.stepAIC<-function(dat,form) {
  if(!inherits(form, "formula")) stop("No formula given")
  fit.lm<-lm(formula=form,data=dat)
  return(boot.stepAIC(object=fit.lm,data=dat))
}
fun.boot.lm.stepAIC(dat=dat.test,form=form.1)
# results in an error 

So where is the mistake? I suppose it must have something to do with the local and global environment, doesn't it?

Protection answered 16/4, 2012 at 14:55 Comment(5)
I haven't used boot.stepAIC before, but I suspect this also has to do with how the formula is passed into the function (which is related to the environment issue too). See https://mcmap.net/q/662973/-understanding-lm-and-environment and https://mcmap.net/q/920420/-anova-test-fails-on-lme-fits-created-with-pasted-formula for some ideas. In particular, calling lm or boot.stepAIC via do.call may help, as the arguments are then evaluated before being passed in. You may also want to look into the as.name suggestion in the comments there. These issues are tricky. Good luck! - Venettavenezia
https://mcmap.net/q/1625466/-formula-error-inside-function/210673 also looks to be the same issue. - Venettavenezia
Yep, I read through this already. I suppose the issues are connected. - Protection
But maybe my earlier (utterly confusing) post is related too: stackoverflow.com/questions/9161273 - Protection
Yes, it seems likely that that other post is related too. It is confusing though, and I wasn't able to recreate some of your errors. See comment there. - Venettavenezia

Using do.call, as in the question "anova test fails on lme fits created with pasted formula", provides the answer.

boot.stepAIC doesn't have access to form when it is run within a function. This can be recreated in the global environment as follows: the call stored by lm refers to the formula by its name, form.1, and removing form.1 makes boot.stepAIC fail.

> form.1<-as.formula(res~x+y+z)
> mm <- lm(form.1, dat.test)
> mm$call
lm(formula = form.1, data = dat.test)
> rm(form.1)
> boot.stepAIC(mm,dat.test)
# same error as OP
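
The same kind of failure can be reproduced without boot.stepAIC. The following is a minimal base-R sketch, assuming (as described in the comments below) that boot.stepAIC refits the model through update(). The names refit_on and wrapper are hypothetical stand-ins for the package internals and for the OP's function, not part of any package.

```r
set.seed(1)
dat.test <- data.frame(x = rnorm(50), y = rnorm(50))
dat.test$res <- dat.test$x + dat.test$y + rnorm(50, sd = 0.1)
form.1 <- res ~ x + y

# Hypothetical stand-in for a package function that refits a model.
# update() re-evaluates the stored call, so every name in that call
# must be findable from this function's environment chain.
refit_on <- function(fit, new_data) update(fit, data = new_data)

wrapper <- function(dat, form) {
  fit <- lm(form, data = dat)   # stored call: lm(formula = form, data = dat)
  refit_on(fit, dat)            # `form` exists only in wrapper's frame
}

# The refit is expected to fail because `form` cannot be found.
res.try <- try(wrapper(dat.test, form.1), silent = TRUE)
inherits(res.try, "try-error")
```

The sketch only mirrors the mechanism; the exact error text from boot.stepAIC differs because the package catches the failed refits and reports them in its own way.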

Using do.call does work. Here I use as.name as well; otherwise the mm object carries around the entire dataset instead of just the name of it.

> form.1<-as.formula(res~x+y+z)
> mm <- do.call("lm", list(form.1, data=as.name("dat.test")))
> mm$call
lm(formula = res ~ x + y + z, data = dat.test)
> rm(form.1)
> boot.stepAIC(mm,dat.test)
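
To see what each variant actually stores in the call, here is a small base-R sketch. The objects df.demo and f.demo are made up for illustration only.

```r
df.demo <- data.frame(x = rnorm(20), y = rnorm(20))
f.demo <- y ~ x

m.plain  <- lm(f.demo, data = df.demo)
m.inline <- do.call("lm", list(f.demo, data = df.demo))
m.named  <- do.call("lm", list(f.demo, data = as.name("df.demo")))

# Plain lm(): the call keeps only the name f.demo, not the formula.
is.name(m.plain$call$formula)

# do.call(): the formula object itself is embedded in the call.
inherits(m.inline$call$formula, "formula")

# Without as.name() the whole data frame is embedded in the call too.
is.data.frame(m.inline$call$data)

# With as.name() only the name df.demo is stored.
is.name(m.named$call$data)
```

Printing m.inline$call shows why the as.name variant is more pleasant to work with: the embedded data frame is printed in full as part of the call.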

To apply this to the original problem, I'd do this:

fun.boot.lm.stepAIC<-function(dat,form) {
  if(!inherits(form, "formula")) stop("No formula given")
  mm <- do.call("lm", list(form, data=as.name(dat)))
  do.call("boot.stepAIC", list(mm,data=as.name(dat)))
}    
form.1<-as.formula(res~x+y+z)
fun.boot.lm.stepAIC(dat="dat.test",form=form.1)

This works too, but the entire data set is then included in the final output object, and therefore in the output printed to the console as well.

fun.boot.lm.stepAIC<-function(dat,form) {
  if(!inherits(form, "formula")) stop("No formula given")
  mm <- do.call("lm", list(form, data=dat))
  boot.stepAIC(mm,data=dat)
}    
form.1<-as.formula(res~x+y+z)
fun.boot.lm.stepAIC(dat=dat.test,form=form.1)
Venettavenezia answered 16/4, 2012 at 21:3 Comment(8)
Thanks. With the comprehensive explanation, I see the point. I also read the two related posts. Honestly, I still have a headache with these issues. What is the "use case" for that behavior? I pass two objects to that function, so it should be executed in the context of the calling function. I see no point in R or boot.stepAIC (I don't know which to "blame") redirecting to the global environment. The question is how I can be certain in which context a function looks for objects. My understanding so far is: always use do.call() rather than calling the function directly. Any strategies for that? - Protection
Well, I played around with that a bit and I am still trying to understand the context. In your last example you basically pass the name of the global (or parent) variable and access the global variable dat.test from inside the function. Is it a call by reference? Could it be that the modeling functions sometimes use a call-by-reference strategy even though I assume it's purely call by value? - Protection
1) boot.stepAIC uses update, which reruns the call of the linear model; if the call had the name of a function object (such as form) then that object must be accessible. 2) Each function has an environment (the one it was created in, usually), and a chain of parent environments, that it looks in to find objects. However, running a function within another function does not change this parent chain! At the end of the chain is the global environment, so when form is in the global environment, it can find it. But when form is in the environment of the calling function, it can't. - Venettavenezia
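
The lookup rule in point 2) can be checked with a few lines of base R. This is only an illustrative sketch: lookup.form, caller and form.local are hypothetical names and have nothing to do with boot.stepAIC itself.

```r
# Defined at top level, so its enclosing environment is the one it was
# created in, not the frame of whichever function happens to call it.
lookup.form <- function() exists("form.local")

caller <- function() {
  form.local <- y ~ x   # lives only in caller's frame
  lookup.form()         # searches lookup.form's own chain, not caller's
}

# form.local is not visible from lookup.form's environment chain.
found.from.callee <- caller()

# Once an object of that name exists at top level, the lookup succeeds,
# even though it is a different object than the one inside caller().
form.local <- y ~ x
found.after.global <- caller()
```

The second result is the situation to be wary of: the lookup succeeds, but it picks up the top-level object rather than the one the caller created.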
Well, I think I got that. Still, I don't understand the problem. I create a function f and pass the formula form (from the global environment) to it. Inside the function I create a linear model, so everything should be fine, because form is in f's environment and so is the model. I call stepAIC, which alters the model. So why is there any need to search in the global environment? It should search for the model and the formula inside f's environment, where it was called from. - Protection
The problems I have still exist. I created the function for one reason only: to use it in an apply function. And I am struggling again with the same issues I had before, even with your solution. So I know that I did not get the point and have not understood how this is handled internally. :( - Protection
(meant to send earlier...) 3) This means that your original code works if you set form <- form.1, as boot.stepAIC can then find form in the global environment. However, if you set form and form.1 to different things, you'll get very unexpected results! R is generally a functional language, but these environment chains are perhaps exceptions. It's better (in my opinion) to form the call with do.call so that going up the chain for the formula isn't necessary. - Venettavenezia
You say: "It should search the model and the formula inside of f environment where it was called from". Perhaps. But it doesn't. Perhaps github.com/hadley/devtools/wiki/Scoping and github.com/hadley/devtools/wiki/Evaluation would be useful references. - Venettavenezia
Thanks. I'll try to get it working and will use the resources you mentioned. If I fail, I will open a new question :) Thanks for your efforts. - Protection
