About lexical scoping in R

I am fairly new to R, and while I was reading the manuals I came across a passage about lexical scoping, along with this code example:

 open.account <- function(total) {
   list(
     deposit = function(amount) {
       if(amount <= 0)
         stop("Deposits must be positive!\n")
       total <<- total + amount
       cat(amount, "deposited.  Your balance is", total, "\n\n")
     },
     withdraw = function(amount) {
       if(amount > total)
         stop("You don't have that much money!\n")
       total <<- total - amount
       cat(amount, "withdrawn.  Your balance is", total, "\n\n")
     },
     balance = function() {
       cat("Your balance is", total, "\n\n")
     }
   )
 }

 ross <- open.account(100)
 robert <- open.account(200)

 ross$withdraw(30)
 ross$balance()
 robert$balance()

 ross$deposit(50)
 ross$balance()
 ross$withdraw(500)

So, I understand what the above code does, but I'm still confused about exactly how it works. If you can still access a function's "local" variables after the function has finished executing, isn't it very hard or impossible to predict when a variable is no longer needed? In the code above, if it were used as part of a larger program, would "total" be kept in memory until the entire program was done (essentially becoming a global variable, memory-wise)? If so, wouldn't this cause memory issues?

I've looked at two other questions on this site: "How is Lexical Scoping implemented?" and "Why are lexical scopes prefered by the compilers?". The answers there went right over my head, but they made me wonder: if (as I am guessing) the compiler isn't just making all variables global (memory-wise) and is instead using some technique to predict when certain variables won't be needed anymore and can be deleted, wouldn't doing this work actually make things harder on the compiler rather than easier?

I know that was a lot of different questions, but any help would be nice. Thanks.

Agra asked 28/6, 2013 at 15:24 Comment(6)
Difficulty: total is a global variable in this instance since you use the <<- operator; see ?"<<-". You should be able to inspect total directly in an interactive session. If instead you used <- it would not be available after the function execution, but it would also break this particular function.
Ashcan: R does indeed store all its variables in memory. This can indeed cause memory issues with large datasets, which is why there are packages like bigdata, bigmemory, ff, data.table, etc. to get around some of R's design limitations. Other things you can do are store your datasets in a database and only query them when necessary; see RODBC, DBI, RMySQL, SQLite and so on.
Agra: @Difficulty Umm, I'm not quite sure what you mean, but the R help page says <<- searches parent functions for the variable being assigned and, if it is not found, creates a global variable. Also, just typing total into the interactive session after running the above code gives "Error: object 'total' not found".
Agra: @HongOoi Oh OK, I guess that answers the main thing I was wondering about. Thanks!
Ahumada: @Katana, basically, the environment created to evaluate the function "call" is only destroyed if there are no references to objects therein after the evaluation is complete. See github.com/hadley/devtools/wiki/Environments and related links for more information.
Agra: @Ahumada Oh, I see now. So total continues to exist because the 3 functions returned by open.account still hold a reference to the parent environment? And if that reference is gone, the environment and total will be deleted. OK, now everything makes sense. Thank you :)

OP seems to be looking for clarification about environments.

In R, every function[1] has an enclosing environment. This is the collection of objects that it knows about, in addition to those that are passed in as its arguments, or that it creates in its code.

When you create a function at the prompt, its environment is the global environment. This is just the collection of objects in your workspace, which you can see by typing ls(). For example, if your workspace contains a data frame Df, you could create a function like the following:

showDfRows <- function()
{
    # Df is not an argument; it is found in the function's enclosing (global) environment
    cat("The number of rows in Df is:", nrow(Df), "\n")
    invisible(NULL)
}

Your function knows about Df even though you didn't pass it in as an argument; it exists in the function's environment. Environments can be nested, which is how things like package namespaces work. You can, for example, do lm(y ~ x, data=Df) to fit a regression, even though your workspace doesn't contain any object called lm. This is because the global environment's chain of parents includes the stats package, which is where the lm function lives.[2]
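A small sketch of both lookups described above. It assumes a fresh session in which you have not defined your own lm; the data frame Df is made up purely for illustration.

```r
# A made-up data frame standing in for "Df in your workspace"
Df <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.1, 9.8))

showDfRows <- function() {
  # Df is not an argument: it is looked up in the enclosing (global) environment
  cat("The number of rows in Df is:", nrow(Df), "\n")
  invisible(NULL)
}

showDfRows()   # prints: The number of rows in Df is: 5

# A function created at the prompt (or at the top level of a script)
# has the global environment as its enclosing environment
identical(environment(showDfRows), globalenv())

# lm is not an object in the workspace itself ...
exists("lm", envir = globalenv(), inherits = FALSE)
# ... but it is found by walking up the chain of parent environments
exists("lm")
# ... and its own enclosing environment is the stats namespace
environmentName(environment(lm))

fit <- lm(y ~ x, data = Df)
```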

When functions are created inside another function, their enclosing environment is the evaluation frame of their parent function. This means that the child function can access all the objects known to the parent. For example:

f <- function(x)
{
    g <- function()
    {
        cat("The value of x is", x, "\n")
    }
    # g is actually called here; f(42) prints: The value of x is 42
    g()
}

Notice that g doesn't contain any object called x, nor are any of its arguments named x. However, it all still works, because it will retrieve x from the evaluation frame of its parent f.
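The "lexical" part is that x is looked up where g was defined, not where g is called. A minimal sketch to check this, using the hypothetical helpers make_g and call_g: g keeps finding the x from the frame it was created in, even when the function calling it has its own x.

```r
x <- "global x"

make_g <- function() {
  x <- "x from make_g's frame"
  # g is created here, so its enclosing environment is this call's frame
  function() x
}

g <- make_g()

call_g <- function() {
  x <- "x from the caller"
  # Lexical scoping: lookup starts in g's enclosing environment,
  # not in the frame of whoever calls g, so this x is never seen
  g()
}

call_g()   # "x from make_g's frame"
```

Under dynamic scoping the same call would have returned "x from the caller"; R does not work that way.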

This is the trick that the code up above is using. When you run open.account, it creates an evaluation frame in which to execute its code. open.account then creates three functions: deposit, withdraw and balance. Each of these three has as its enclosing environment the evaluation frame of open.account. In this evaluation frame there is a variable called total, whose value was passed in by you, and which will be manipulated by deposit, withdraw and balance.
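To see the shared frame directly, here is a sketch using a trimmed-down copy of open.account (only deposit and balance, without the cat() messages; this simplification is mine, not the manual's). environment() applied to any of the returned functions gives back the same evaluation frame, and total lives there.

```r
# Trimmed-down copy of open.account from the question
open.account <- function(total) {
  list(
    deposit = function(amount) {
      if (amount <= 0) stop("Deposits must be positive!")
      total <<- total + amount
      invisible(total)
    },
    balance = function() total
  )
}

ross <- open.account(100)

# Both functions share one enclosing environment:
# the evaluation frame of the open.account(100) call
frame <- environment(ross$deposit)
identical(frame, environment(ross$balance))   # TRUE

# total lives in that frame, not in the global environment
ls(frame)                                     # "total"
ross$deposit(50)
get("total", envir = frame)                   # 150
```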

When open.account completes, it returns a list. If this were a regular function, its evaluation frame would now be unreachable, and R's garbage collector would eventually reclaim it. In this case, however, the returned list contains functions whose enclosing environment is that evaluation frame, so as long as the list (or any of those functions) is still referenced, the frame stays in existence.
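One way to watch this happen is to attach a finalizer to the frame with reg.finalizer(). This is a sketch, assuming a cut-down open.account with only a balance function; note_collected and collected are hypothetical names. Calling gc() by hand forces a full collection; in normal use you never need to.

```r
collected <- FALSE

# Defined at top level so the finalizer does not itself capture the account's frame
note_collected <- function(env) {
  cat("account frame collected\n")
  collected <<- TRUE
}

open.account <- function(total) {
  # Ask R to call note_collected() once this evaluation frame is garbage-collected
  reg.finalizer(environment(), note_collected)
  list(balance = function() total)
}

ross <- open.account(100)
invisible(gc())
collected   # FALSE: ross$balance still refers to the frame

rm(ross)
invisible(gc())
collected   # normally TRUE now: nothing refers to the frame any more
```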

So, why don't Ross's and Robert's accounts clash with each other? Every time you execute open.account, R creates a new evaluation frame. The frames from opening Ross's and Robert's accounts are completely separate, just as running lm(y ~ x, data=Df) uses a different frame from running lm(y ~ x, data=Df2). Each time open.account returns, it brings with it a new environment in which to store the balance just created. (Each call also creates new copies of the deposit, withdraw and balance functions, but generally we can afford to ignore the memory used for this.)
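A quick check of that independence, again with a trimmed-down open.account (withdraw and balance only, no messages; the trimming is mine): changing Ross's balance leaves Robert's untouched because the two accounts close over different frames.

```r
open.account <- function(total) {
  list(
    withdraw = function(amount) {
      if (amount > total) stop("You don't have that much money!")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

ross <- open.account(100)
robert <- open.account(200)

ross$withdraw(30)
ross$balance()     # 70
robert$balance()   # 200: Robert's frame was never touched

# Each call to open.account() produced its own frame
identical(environment(ross$balance), environment(robert$balance))   # FALSE
```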

[1] technically every closure, but let's not muddy things

[2] again, there's a technical distinction between namespaces and environments but it isn't important here

Ashcan answered 28/6, 2013 at 16:22 Comment(1)
Agra: Ah, yes, this is indeed what I was looking for. Thank you! And thanks to Ferdinand.kraft as well (the commented link was also helpful).

You should think of open.account as a generator function that makes named individual instances of "accounts". Each individual account has a local "total" and a set of functions that operate on that particular total. (R is interpreted, so there is no compiler analysing lifetimes ahead of time; unreachable objects are reclaimed at run time by the garbage collector.) The local variable 'total' takes up space until the object that holds it is removed. I don't think "global" is a good way to talk about this (despite the language in the help page). If you were at the command line (i.e. "looking" at the .GlobalEnv) and you executed an ls() call, you would not see any of the open.account 'totals'.
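A sketch of the ls() point, using a minimal open.account that only has a balance function (a simplification for illustration): the global environment holds the names you assigned, and each total only shows up when you list the frame captured by that account's functions.

```r
open.account <- function(total) {
  list(balance = function() total)
}

ross <- open.account(100)
robert <- open.account(200)

ls()                                   # "open.account" "robert" "ross" (no "total")
"total" %in% ls()                      # FALSE
ls(environment(ross$balance))          # "total"
ls(environment(robert$balance))        # "total", in a different frame
```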

If you want to create a code-inspectable version, the strategy suggested by G. Grothendieck on R-help in 2011 might be interesting:

open.account <- function(total) {
   this <- environment()  # the evaluation frame of this open.account() call
   list(this,
     deposit = function(amount) {
       if(amount <= 0)
         stop("Deposits must be positive!\n")
       total <<- total + amount
       cat(amount, "deposited.  Your balance is", total, "\n\n")
     },
     withdraw = function(amount) {
       if(amount > total)
         stop("You don't have that much money!\n")
       total <<- total - amount
       cat(amount, "withdrawn.  Your balance is", total, "\n\n")
     },
     balance = function() {
       cat("Your balance is", total, "\n\n")
     }
   )
 }

> ross <- open.account(100)
> ross$deposit(200)
200 deposited.  Your balance is 300 

> ross[[1]]$total
[1] 300

If you named that first list element 'this' (i.e. list(this = this, ...)), you could do:

> ross$deposit(200)
200 deposited.  Your balance is 300 

> ross$this$total
[1] 300

I had problems with the word 'lexical' for quite a while. I could not for the longest time figure out why it was called by that name. Eventually I came around to a dictionary analogy, perhaps with different versions of dictionaries. A word gets meaning from its definition within a collection of other words at one particular time of publication. The meaning might change if a new dictionary is published, but a person researching material that was published contemporaneous with the first version should be interpreting it in light of the earlier dictionary version rather than later ones.

Losing answered 28/6, 2013 at 15:40 Comment(7)
Agra: Well, my main confusion is that it seems neither the abstract "accounts" nor the individual instances of "total" actually have names. The names refer to the three functions that open.account returns. There doesn't seem to be any "object" that holds total. But what I gather from what you're saying is that even though "total" has no references visible to a human, the R interpreter still knows when it will no longer be needed? If, say, ross were set to something else unrelated to open.account, its instance of total would be removed from memory?
Losing: Yes. If 'ross' is either removed from the workspace or that name is assigned to something else, the memory that held the 'total' and the local functions would all be garbage-collectable.
Losing: However, being garbage-collectable does not guarantee that memory fragmentation won't persist. Sometimes the "available memory" will be large, but the largest contiguous space may be quite a bit smaller than what is "available" in toto.
Agra: OK, I think I understand now, thanks. In response to your "lexical" analogy: my own interpretation was that, since lexical means "related to words", lexical scoping does things in a way that is directly related to the text and words in the source code (as opposed to the events that happen at execution). For example, a declared variable has scope precisely within the text of whatever function it was declared in, and a function that uses <<- (like in this case) looks for matching variables starting from the text it was defined in (as opposed to the environment it was called from), etc.
Losing: I think of <<- as popping a value outside a membrane (in the biological sense). The difficulty I often have is that there may be more organelles in a cell (or adjacent cells, or there may be different cell types in a body organ), so keeping track of exactly which membrane is enclosing an operation can be difficult. The use of <<- in this instance is popping the recalculated value outside the particular called local function into (the cytoplasm of?) the next level "up".
Ceremony: This discussion brings nice parallels with classes in OOP (e.g. using R6). Your function environment works like a class for all its "global" variables (i.e. those declared using <<-). Each object created from such a class will know its (and only its) "global" variables. NB1: I also would not call these variables global (they are "global" for their class/environment only). NB2: Coming to R from C++, I also feel uncomfortable using the word "lexicon" for what is OOP terminology.
Losing: I'm not sure. The result of environment() might not be where the objects created with <<- end up. Reference classes are often described as providing a close analogue of OOP. See ?ReferenceClasses and the main help pages for the R6 package.
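The point about where <<- lands can be checked directly. <<- searches the enclosing environments, starting from the parent of the current frame, for an existing binding of the name; only if none is found does it assign in the global environment. A sketch with two hypothetical functions, make_counter and leaky:

```r
make_counter <- function() {
  count <- 0
  function() {
    # count exists in the enclosing frame, so <<- updates it there
    count <<- count + 1
    count
  }
}

leaky <- function() {
  # no enclosing frame defines not_declared_here, so <<- creates it globally
  not_declared_here <<- "oops"
}

counter <- make_counter()
counter()   # 1
counter()   # 2
exists("count", envir = globalenv(), inherits = FALSE)              # FALSE

leaky()
exists("not_declared_here", envir = globalenv(), inherits = FALSE)  # TRUE
```

This is why total in open.account never becomes a global variable: the first frame searched, the open.account frame, already has a binding for total.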
