Return list vs environment from an R function

What are the advantages and disadvantages of using one over the other in the following two cases? Case I returns its output as an environment and Case II returns its output as a list.

Case I:

function(x) {
  ret <- new.env()
  ret$x <- x
  ret$y <- x^2
  return(ret)
}

Case II:

function(x) {
  ret <- list()
  ret$x <- x
  ret$y <- x^2
  return(ret)
}
Basalt answered 5/2, 2019 at 14:51 Comment(5)
To name a list element with an expression such as x^2, you have to quote the name (e.g. ret$'x^2' <- x^2). - Flection
You can index a list with both names and numerical indices. An environment can only be indexed by name. In this case I don't see any benefit in using an environment. - Premonition
I think the deciding factor is what you plan to do with the return value after you call the function. It's not clear from your description what your ultimate goal is. It is a lot more common in R to return a list than an environment, though. - Scandura
Oops, I should have given a proper name! I have corrected it now. - Basalt
@Scandura I have also seen that people usually return a list, but since environments exist I wondered why people are not using them. Often we don't need the result to be indexed by position; accessing the objects by name suffices. - Basalt

Although they are similar, there are differences between returning a list and returning an environment. From Advanced R:

Generally, an environment is similar to a list, with four important exceptions:

  • Every name in an environment is unique.

  • The names in an environment are not ordered (i.e., it doesn’t make sense to ask what the first element of an environment is).

  • An environment has a parent.

  • Environments have reference semantics.

More technically, an environment is made up of two components, the frame, which contains the name-object bindings (and behaves much like a named list), and the parent environment. Unfortunately “frame” is used inconsistently in R. For example, parent.frame() doesn’t give you the parent frame of an environment. Instead, it gives you the calling environment. This is discussed in more detail in calling environments.
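The reference-semantics and ordering points above can be illustrated with a minimal sketch. The helper names set_flag_env and set_flag_list are hypothetical and exist only for this illustration:

```r
# Hypothetical helpers: each tries to add a `flag` element to its argument.
set_flag_env <- function(e) {
  e$flag <- TRUE       # environments are modified in place
  invisible(NULL)
}
set_flag_list <- function(l) {
  l$flag <- TRUE       # lists are copied; only the local copy changes
  invisible(NULL)
}

e <- new.env()
l <- list()
set_flag_env(e)
set_flag_list(l)

env_changed  <- exists("flag", envir = e, inherits = FALSE)  # TRUE
list_changed <- !is.null(l$flag)                             # FALSE

# Lists can be indexed by position; environments cannot.
l2 <- list(x = 2, y = 4)
first <- l2[[1]]                                             # 2
pos_ok <- tryCatch({ e[[1]]; TRUE }, error = function(err) FALSE)
pos_ok                                                       # FALSE
```

So a function that receives an environment can change it for the caller, whereas a list has to be returned and reassigned.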

From the help:

help(new.env)

Environments consist of a frame, or collection of named objects, and a pointer to an enclosing environment. The most common example is the frame of variables local to a function call; its enclosure is the environment where the function was defined (unless changed subsequently). The enclosing environment is distinguished from the parent frame: the latter (returned by parent.frame) refers to the environment of the caller of a function. Since confusion is so easy, it is best never to use ‘parent’ in connection with an environment (despite the presence of the function parent.env).
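The distinction between the enclosing environment and the parent frame can be checked with a small sketch. The names make_reporter, reporter and call_it are made up for this example:

```r
make_reporter <- function() {
  # The closure returned here is defined inside this call's frame.
  function() {
    enclosure <- parent.env(environment())  # where the function was defined
    caller    <- parent.frame()             # where the function was called from
    list(enclosure = enclosure, caller = caller)
  }
}

reporter <- make_reporter()

call_it <- function() {
  here <- environment()
  out <- reporter()
  list(
    caller_is_here          = identical(out$caller, here),
    enclosure_is_definition = identical(out$enclosure, environment(reporter))
  )
}

check <- call_it()
check$caller_is_here           # TRUE
check$enclosure_is_definition  # TRUE
```

The caller is the frame of call_it, while the enclosure is the frame of the make_reporter call in which the closure was created.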

From the examples in the function's documentation:

e1 <- new.env(parent = baseenv())  # this one has enclosure package:base.
e2 <- new.env(parent = e1)
assign("a", 3, envir = e1)
ls(e1)
#[1] "a"

However, ls() called without arguments lists the objects in the current environment, which here are the environments just created:

ls()
#[1] "e1" "e2"

And you can access the objects in an environment by name, just as with a list:

e1$a
#[1] 3

Playing with your functions:

f1 <- function(x) {
   ret <- new.env()
   ret$x <- x
   ret$y <- x^2
   return(ret)
}

res <- f1(2)
res
#<environment: 0x0000021d55a8a3e8>

res$y
#[1] 4

f2 <- function(x) {
   ret <- list()
   ret$x <- x
   ret$y <- x^2
   return(ret)
}

res2 <- f2(2)
res2
#$x
#[1] 2

#$y
#[1] 4

res2$y
#[1] 4

A microbenchmark of the two definitions shows no meaningful difference, although note that the expressions below only create the two function objects without calling them, so the timings reflect closure creation rather than building an environment versus a list:

microbenchmark::microbenchmark(
   function(x) {
      ret <- new.env()
      ret$x <- x
      ret$y <- x^2
      return(ret)
   },
   function(x) {
      ret <- list()
      ret$x <- x
      ret$y <- x^2
      return(ret)
   },
   times = 500L
)

#Unit: nanoseconds
#                                                                               expr
# function(x) {     ret <- new.env()     ret$x <- x     ret$y <- x^2     return(ret) }
#    function(x) {     ret <- list()     ret$x <- x     ret$y <- x^2     return(ret) }
# min lq   mean median  uq  max neval
#   0  1 31.802      1 100  801   500
#   0  1 37.802      1 100 2902   500
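A variant that times the calls themselves rather than the definitions might look like the sketch below. It assumes the microbenchmark package is installed; f1 and f2 are the two functions from the question, and the labels env_version and list_version are arbitrary. No timings are shown because they depend on the machine and R version.

```r
f1 <- function(x) {
  ret <- new.env()
  ret$x <- x
  ret$y <- x^2
  ret
}

f2 <- function(x) {
  ret <- list()
  ret$x <- x
  ret$y <- x^2
  ret
}

# Benchmark only if the package is available, so the script still runs without it.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    env_version  = f1(2),   # actually calls the function
    list_version = f2(2),
    times = 500L
  ))
}
```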

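They also report the same size under object.size(), although object.size() does not count the objects bound inside an environment, so this should not be read as a reliable memory comparison: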

object.size(res)
#464 bytes

object.size(res2)
#464 bytes

You can always create an environment from a list (list2env) and a list from an environment (as.list):

L <- list(a = 1, b = 2:4, p = pi, ff = gl(3, 4, labels = LETTERS[1:3]))
e <- list2env(L)
e$ff
# [1] A A A A B B B B C C C C
#Levels: A B C

as.list(e)
#$ff
# [1] A A A A B B B B C C C C
#Levels: A B C
#
#$p
#[1] 3.141593
#
#$b
#[1] 2 3 4
#
#$a
#[1] 1
Flection answered 5/2, 2019 at 15:20 Comment(3)
Thank you for the answer. My question is not how to do this but what differences there are between them in terms of performance, ease of use, use cases and so on. - Basalt
@Basalt take a look at the performance part. Let me know if this answers your question. - Flection
It would be nice if you added another good example of environments: modifying the calling environment. Suppose I have three functions, a, b, and c, which all have the same input (say) x. All of them run the same pre-processing (say sqrt(x)). Then I define my function: pre_proc <- function() { env <- parent.frame(); env$x <- sqrt(env$x) }. Now I can add pre_proc() at the start of a, b and c without passing inputs (or outputs). This can be useful when one function with several inputs and outputs is called from several other functions, rather than passing lists back and forth. - Argueta
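A minimal sketch of the pattern described in the comment above, under the assumption that the caller has a variable named x. The bodies of a and b are invented for illustration:

```r
pre_proc <- function() {
  env <- parent.frame()   # the frame of whichever function called pre_proc()
  env$x <- sqrt(env$x)    # overwrite the caller's x in place
  invisible(NULL)
}

# Illustrative callers: each shares the same pre-processing step.
a <- function(x) {
  pre_proc()
  x + 1
}

b <- function(x) {
  pre_proc()
  x * 2
}

a(9)   # 4
b(16)  # 8
```

This works because environments have reference semantics, but it also hides a side effect from the reader of a and b, which is one reason returning a list is the more common choice.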
