A comprehensive survey of the types of things in R; 'mode' and 'class' and 'typeof' are insufficient

The language R confuses me. Entities have modes and classes, but even this is insufficient to fully describe the entity.

This answer says

In R every 'object' has a mode and a class.

So I did these experiments:

> class(3)
[1] "numeric"
> mode(3)
[1] "numeric"
> typeof(3)
[1] "double"

Fair enough so far, but then I passed in a vector instead:

> mode(c(1,2))
[1] "numeric"
> class(c(1,2))
[1] "numeric"
> typeof(c(1,2))
[1] "double"

That doesn't make sense. Surely a vector of integers should have a different class, or different mode, than a single integer? My questions are:

  • Does everything in R have (exactly one) class ?
  • Does everything in R have (exactly one) mode ?
  • What, if anything, does 'typeof' tell us?
  • What other information is needed to fully describe an entity? (Where is the 'vectorness' stored, for example?)

Update: Apparently, a literal 3 is just a vector of length 1. There are no scalars. OK But... I tried mode("string") and got "character", leading me to think that a string was a vector of characters. But if that was true, then this should be true, but it's not! c('h','i') == "hi"

Chauvin answered 13/1, 2012 at 18:47 Comment(10)
You haven't passed a list in your example, but a vector. A list would be list(1,2) which indeed has mode, class and typeof of "list". If you want to determine if an object has more than one element, you would probably use length.Oliviero
Also, there are no scalars in R. 3 is actually a vector of length one. So there's no surprise that it's the same as your c(1,2) example. You might find this part of the R language manual helpful.Destefano
Maybe my question is "How would you write a function serializeme(x) in R which fully describes everything about x"? Would it start off with a test, for the sake of arguments, as to whether x is a list or a vector or function name or whatever? What's the 'top-level' test that would be done first?Chauvin
Re: your update. A mode of "character" does not mean a single character, but a character variable; other languages might call this a string. Additionally, just like 3, "string" is a length 1 vector of type character. c("stringA","stringB") is a length 2 vector of type character.Nevermore
I would probably use str or dputOliviero
There's no "string" class/mode. Check the link @Destefano gave you. "hi" is a 1-length character vector. If you want a serializeme function, could you use... serialize?Katabatic
@JoshuaUlrich is right about serialize(), though it's fairly low level. Depending what you need it for, save() might be more to your liking.Mode
@JoshuaUlrich, this question is for curiosity, in case I ever try (again) to do some real R work. I don't need a serialize, I just suggested it as an analogy for the sort of understanding I would like.Chauvin
@AaronMcDaid: sure, but you could look at the source code for serialize to see exactly how it works, which would answer your question, right?Katabatic
Prior to your comment, I wasn't aware such a function existed :-) I'll look now!Chauvin
E
63

I agree that the type system in R is rather weird. The reason for it being that way is that it has evolved over (a long) time...

Note that you missed one more type-like function, storage.mode, and one more class-like function, oldClass.

So, mode and storage.mode are the old-style types (where storage.mode is more accurate), and typeof is the newer, even more accurate version.

mode(3L)                  # numeric
storage.mode(3L)          # integer
storage.mode(`identical`) # function
storage.mode(`if`)        # function
typeof(`identical`)       # closure
typeof(`if`)              # special

Then class is a whole different story. class is mostly just the class attribute of an object (that's exactly what oldClass returns). But when the class attribute is not set, the class function makes up a class from the object type and the dim attribute.

oldClass(3L) # NULL
class(3L) # integer
class(structure(3L, dim=1)) # array
class(structure(3L, dim=c(1,1))) # matrix (R >= 4.0 returns "matrix" "array")
class(list()) # list
class(structure(list(1), dim=1)) # array
class(structure(list(1), dim=c(1,1))) # matrix (R >= 4.0 returns "matrix" "array")
class(structure(list(1), dim=1, class='foo')) # foo
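The "made-up" class can be pictured with a small helper. This is only a sketch: implicit_class is a hypothetical name, not a base R function, it ignores language objects such as calls, and the two-dimensional branch assumes R >= 4.0.

```r
# Hypothetical helper: approximates how class() picks a class when
# no class attribute is set. Language objects are not handled here.
implicit_class <- function(x) {
  if (!is.null(oldClass(x))) return(oldClass(x))     # explicit class attribute wins
  d <- dim(x)
  if (length(d) == 2L) return(c("matrix", "array"))  # R >= 4.0; older R gives "matrix"
  if (length(d) > 0L)  return("array")
  switch(typeof(x),
    double  = "numeric",                             # the one renamed vector type
    closure = , special = , builtin = "function",
    symbol  = "name",
    typeof(x)                                        # most other types map to themselves
  )
}

implicit_class(3)                            # "numeric"
implicit_class(3L)                           # "integer"
implicit_class(sum)                          # "function"
implicit_class(structure(list(1), dim = 1))  # "array"
```

So a double gets class "numeric" simply because class() renames that one type; typeof still reports "double".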

Finally, the class can return more than one string, but only if the class attribute is like that. The first string value is then kind of the main class, and the following ones are what it inherits from. The made-up classes are always of length 1.

# Here "A" inherits from "B", which inherits from "C"
class(structure(1, class=LETTERS[1:3])) # "A" "B" "C"

# an ordered factor:
class(ordered(3:1)) # "ordered" "factor"
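
To see what "inherits from" means in practice, here is a minimal sketch of S3 dispatch over a class vector. The generic kind and its methods are made-up names used only for illustration.

```r
# Hypothetical generic and methods, for illustration only
kind <- function(x, ...) UseMethod("kind")
kind.default <- function(x, ...) "default"
kind.B <- function(x, ...) paste("B, then", NextMethod())
kind.C <- function(x, ...) "C"

obj <- structure(1, class = c("A", "B", "C"))

kind(obj)                                   # "B, then C": no kind.A exists, so kind.B runs first
inherits(obj, "C")                          # TRUE
inherits(obj, c("C", "A"), which = TRUE)    # 3 1: positions within the class vector
```

Dispatch walks the class vector from left to right, which is why the order of the strings matters.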
Enucleate answered 13/1, 2012 at 21:22 Comment(8)
What an excellent, lucid explanation. You've cleared up many mysteries for me with this one answer. Thanks!Mode
@JoshO'Brien - Glad you found it useful!Enucleate
Thanks. I have another question. Does everything have 'attributes'? It appears that everything does, I was able to do class(structure(c(1,2), class="list")) and now it thinks the vector's class is "list"!Chauvin
@AaronMcDaid - Yes, all objects can have attributes. And setting the class attribute to something wrong (like you setting class of a numeric vector to "list"), can lead to errors. But is.list would still return FALSE because it uses the type information, not the class.Enucleate
@Enucleate you say that "class makes up a function from the object type". So why typeof double corresponds to class numeric?Espinosa
Could you elaborate a little on "The made-up classes are always of length 1"? In the example you gave, length(c("B","C")) is 2!Gondi
Upvote for this sentence: mode [is] the old-style type ... and typeof is the newer, even more accurate version. This is what I am looking for.Overblouse
If something is weird and broken, it's time to deprecate it.An
N
21

Here's some code to determine what the four type functions, class, mode, typeof, and storage.mode return for each of the kinds of R object.

library(methods)
library(tibble)
library(purrr)
library(xml2)
library(knitr) # provides kable()

setClass("dummy", representation(x="numeric", y="numeric"))

types <- list(
  "logical vector" = logical(),
  "integer vector" = integer(),
  "numeric vector" = numeric(),
  "complex vector" = complex(),
  "character vector" = character(),
  "raw vector" = raw(),
  factor = factor(),
  "logical matrix" = matrix(logical()),
  "numeric matrix" = matrix(numeric()),
  "logical array" = array(logical(8), c(2, 2, 2)),
  "numeric array" = array(numeric(8), c(2, 2, 2)),
  list = list(),
  pairlist = .Options,
  "data frame" = data.frame(),
  "closure function" = identity,
  "builtin function" = `+`,
  "special function" = `if`,
  environment = new.env(),
  null = NULL,
  formula = y ~ x,
  expression = expression(),
  call = call("identity"),
  name = as.name("x"),
  "paren in expression" = expression((1))[[1]],
  "brace in expression" = expression({1})[[1]],
  "S3 lm object" = lm(dist ~ speed, cars),
  "S4 dummy object" = new("dummy", x = 1:10, y = rnorm(10)),
  "external pointer" = read_xml("<foo><bar /></foo>")$node
)

type_info <- imap_dfr(
  types,
  function(x, nm)
  {
    tibble(
      "spoken type" = nm,
      class = class(x), 
      typeof = typeof(x),
      mode  = mode(x),
      storage.mode = storage.mode(x)
    )
  }
)

knitr::kable(type_info)

Here's the output:

|spoken type         |class       |typeof      |mode        |storage.mode |
|:-------------------|:-----------|:-----------|:-----------|:------------|
|logical vector      |logical     |logical     |logical     |logical      |
|integer vector      |integer     |integer     |numeric     |integer      |
|numeric vector      |numeric     |double      |numeric     |double       |
|complex vector      |complex     |complex     |complex     |complex      |
|character vector    |character   |character   |character   |character    |
|raw vector          |raw         |raw         |raw         |raw          |
|factor              |factor      |integer     |numeric     |integer      |
|logical matrix      |matrix      |logical     |logical     |logical      |
|logical matrix      |array       |logical     |logical     |logical      |
|numeric matrix      |matrix      |double      |numeric     |double       |
|numeric matrix      |array       |double      |numeric     |double       |
|logical array       |array       |logical     |logical     |logical      |
|numeric array       |array       |double      |numeric     |double       |
|list                |list        |list        |list        |list         |
|pairlist            |pairlist    |pairlist    |pairlist    |pairlist     |
|data frame          |data.frame  |list        |list        |list         |
|closure function    |function    |closure     |function    |function     |
|builtin function    |function    |builtin     |function    |function     |
|special function    |function    |special     |function    |function     |
|environment         |environment |environment |environment |environment  |
|null                |NULL        |NULL        |NULL        |NULL         |
|formula             |formula     |language    |call        |language     |
|expression          |expression  |expression  |expression  |expression   |
|call                |call        |language    |call        |language     |
|name                |name        |symbol      |name        |symbol       |
|paren in expression |(           |language    |(           |language     |
|brace in expression |{           |language    |call        |language     |
|S3 lm object        |lm          |list        |list        |list         |
|S4 dummy object     |dummy       |S4          |S4          |S4           |
|external pointer    |externalptr |externalptr |externalptr |externalptr  |
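
Because class(x) can have length greater than one, the code above emits one row per class string, which is why the matrices appear twice. A base R sketch that keeps one row per object by collapsing the class vector might look like this; describe is a hypothetical helper name and only a few of the objects are included.

```r
# Hypothetical helper: one row per object, with multi-valued classes collapsed
describe <- function(x) {
  data.frame(
    class        = paste(class(x), collapse = ", "),
    typeof       = typeof(x),
    mode         = mode(x),
    storage.mode = storage.mode(x),
    stringsAsFactors = FALSE
  )
}

examples <- list(
  "integer vector" = 1:3,
  "numeric matrix" = matrix(numeric()),
  "data frame"     = data.frame(),
  "ordered factor" = ordered(3:1)
)

info <- do.call(rbind, lapply(examples, describe))
info
```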

The types of objects available in R are discussed in the R Language Definition manual. There are a few types not mentioned here: you can't test for objects of type "promise", "...", and "ANY", and "bytecode" and "weakref" are only available at the C-level.

The table of available types in the R source is here.

Noni answered 21/10, 2016 at 8:12 Comment(9)
The first column of the last table is interesting. Can we say that every object is exactly one of those types? My knowledge of R has improved a lot since I first asked this, but I'm still confused about the fundamentals. For example, you gave list and data.frame as two different items, but now I feel that a data.frame is really just a list with a certain classChauvin
@AaronMcDaid Yes, a data frame is just a list, but with extra checking that each element is the same length, and a row.names attribute. And yes, all objects have exactly one of the 20-something possible values of typeof.Noni
Wow... this is like gold here! It was also educational to me how you were able to create particular syntactic structures. E.g., expression({1})[[1]] is how we can recreate a brace in an expression.Autosome
Why did Hadley Wickham state that "all those answers are wrong" with reference to this table? twitter.com/richierocks/status/789380495033376768Chemist
@Chemist Because at this point, mode and storage.mode are legacy features left over from S. You should only ever need to care about class() and typeof().Noni
Shouldn't the spoken type "primitive function" be "builtin function"? I have an objection there, because is.primitive returns TRUE for both special and builtin functions. I propose changing the spoken type "primitive function" in the above table to "builtin function" to reveal the distinction between builtin and special functions.Gondi
My second objection: spoken type "character vector" should be "string vector". The word character wrongly suggests a "1-character string". However, objects with many characters can be in this style: class(c("ac3","b")) # character. Note the "ac3". For class(c("ac3","b")), I have a vector whose components are "strings", not single characters.Gondi
@ErdoganCEVHER I've changed "primitive" to "builtin" as suggested. I don't agree with your second point: nobody says "string vector". "character vector" is in all the documentation; "string" is more informal.Noni
I get a different output. In the table provided, the class of every object is of length one. This is not true for many objects, e.g. class(matrix(logical())) returns "matrix" and "array". That is how I see the table when running the code, but in the answer it is missing. In the answer it is as if the code were ... class = class(x)[1]....Iselaisenberg
F
13

Does everything in R have (exactly one) class ?

Exactly one is definitely not right:

> x <- 3
> class(x) <- c("hi","low")
> class(x)
[1] "hi"  "low"

Everything has (at least one) class.

Does everything in R have (exactly one) mode ?

Not certain but I suspect so.

What, if anything, does 'typeof' tell us?

typeof gives the internal type of an object. Possible values according to ?typeof are:

The vector types "logical", "integer", "double", "complex", "character", "raw" and "list", "NULL", "closure" (function), "special" and "builtin" (basic functions and operators), "environment", "S4" (some S4 objects) and others that are unlikely to be seen at user level ("symbol", "pairlist", "promise", "language", "char", "...", "any", "expression", "externalptr", "bytecode" and "weakref").

mode relies on typeof. From ?mode:

Modes have the same set of names as types (see typeof) except that types "integer" and "double" are returned as "numeric". types "special" and "builtin" are returned as "function". type "symbol" is called mode "name". type "language" is returned as "(" or "call".
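
The mapping quoted above can be checked directly. This is a small sketch; the sample objects are arbitrary choices.

```r
# A few objects whose mode differs from their type
samples <- list(1L, 2.5, sum, `if`, quote(x), quote(f(x)), quote((1)))

tm <- data.frame(
  typeof = vapply(samples, typeof, character(1)),
  mode   = vapply(samples, mode, character(1)),
  stringsAsFactors = FALSE
)
tm
# typeof: integer, double, builtin, special, symbol, language, language
# mode:   numeric, numeric, function, function, name, call, (
```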

What other information is needed to fully describe an entity? (Where is the 'listness' stored, for example?)

A list has class list:

> y <- list(3)
> class(y)
[1] "list"

Do you mean vectorization? length should be sufficient for most purposes:

> z <- 3
> class(z)
[1] "numeric"
> length(z)
[1] 1

Think of 3 as a numeric vector of length 1, rather than as some primitive numeric type.
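
The same reasoning covers the string puzzle in the question's update: "hi" is one element of a character vector, not two characters. A short illustration:

```r
length(3)                                   # 1
identical(3, c(3))                          # TRUE: c() of one value changes nothing
length("hi")                                # 1: one string, not two characters
nchar("hi")                                 # 2
c("h", "i") == "hi"                         # FALSE FALSE: elementwise comparison
strsplit("hi", "")[[1]]                     # "h" "i"
paste(c("h", "i"), collapse = "") == "hi"   # TRUE
```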

Conclusion

You can get by just fine with class and length. By the time you need the other stuff, you likely won't have to ask what they're for :-)

Fosdick answered 13/1, 2012 at 19:9 Comment(3)
attributes may be handy too.Vilayet
As I show in my answer, a list with a dim attribute is not of class "list".Enucleate
I didn't realize you could set dim on a list. Oddness.Fosdick
A
13

Adding to one of your sub-questions :

  • What other information is needed to fully describe an entity?

In addition to class, mode, typeof, attributes, str, and so on, is() is also worth noting.

is(1)
[1] "numeric" "vector"

While useful, it is also unsatisfactory. In this example, 1 is more than just that; it is also atomic, finite, and a double. The following function should show all that an object is according to all available is.(...) functions:

what.is <- function(x, show.all=FALSE) {

  # set the warn option to -1 to temporarily ignore warnings
  op <- options("warn")
  options(warn = -1)
  on.exit(options(op))

  list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
  result <- data.frame(test=character(), value=character(), 
                       warning=character(), stringsAsFactors = FALSE)

  # loop over all "is.(...)" functions and store the results
  for(fun in list.fun) {
    res <- try(eval(call(fun,x)),silent=TRUE)
    if(inherits(res, "try-error")) { # safe even when class(res) has length > 1
      next # ignore tests that yield an error
    } else if (length(res)>1) {
      warn <- "*Applies only to the first element of the provided object"
      value <- paste(res,"*",sep="")
    } else {
      warn <- ""
      value <- res
    }
    result[nrow(result)+1,] <- list(fun, value, warn)
  }

  # sort the results
  result <- result[order(result$value,decreasing = TRUE),]
  rownames(result) <- NULL

  if(show.all)
    return(result)
  else
    return(result[which(result$value=="TRUE"),])
}

So now we get a more complete picture:

> what.is(1)
        test value warning
1  is.atomic  TRUE        
2  is.double  TRUE        
3  is.finite  TRUE        
4 is.numeric  TRUE        
5  is.vector  TRUE 

> what.is(CO2)
           test value warning
1 is.data.frame  TRUE        
2       is.list  TRUE        
3     is.object  TRUE        
4  is.recursive  TRUE 

You also get more information with the argument show.all=TRUE. I am not pasting any example here as the results are over 50 lines long.

Finally, this is meant as a complementary source of information, not as a replacement for any of the other functions mentioned earlier.

EDIT

To include even more "is" functions, as per @Erdogan's comment, you could add this bit to the function:

  # right after 
  # list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
  list.fun.2 <- character()

  packs <- c('base', 'utils', 'methods') # include more packages if needed

  for (pkg in packs) {
    library(pkg, character.only = TRUE)
    objects <- grep("^is.+\\w$", ls(envir = as.environment(paste('package', pkg, sep = ':'))),
                    value = TRUE)
    objects <- grep("<-", objects, invert = TRUE, value = TRUE)
    if (length(objects) > 0) 
      list.fun.2 <- append(list.fun.2, objects[sapply(objects, function(x) class(eval(parse(text = x))) == "function")])
  }

  list.fun <- union(list.fun, list.fun.2)  # list.fun is the vector built just above

  # ...and continue with the rest
  result <- data.frame(test=character(), value=character(), 
                       warning=character(), stringsAsFactors = FALSE)
  # and so on...
Albinus answered 18/10, 2014 at 3:22 Comment(3)
I included this function (with a few added features) in my package summarytools. Make sure to use devtools::install_github("dcomtois/summarytools") to get the most up-to-date version. The function also uses the functions mentioned in the other answers (mode, class, typeof, attributes, among others) to summarize as much as possible what an object really is.Albinus
Very nice approach. There are 55 is... functions in my R version. That said, the icing on your code would be adding the other is... functions. For example, isS4 is not among the 55 found by grep(methods(is), ...) because those 55 functions are named isPOINTname, whereas isS4 has no "." in its name; i.e. it is not is.S4. Dominic, can you catch the is... functions that violate the isPOINTname naming?Gondi
@ErdoganCEVHER Pls see my Edit at the bottom... Including all packages would be much more complicated, but having only a few makes good sense!Albinus
I
0

This question deals with the confusion around all kinds of "types" in R. After reading this and quite a few other sites, in my opinion one comment by @RichieCotton best solves the confusion. It took me quite a while to encounter this comment and I want to highlight it by adding it as an answer.

[...] at this point, mode and storage.mode are legacy features left over from S. You should only ever need to care about class() and typeof().

So while reading other answers keep this in mind and focus on typeof and class. Unfortunately, in other posts it does not become obvious that mode and storage.mode should not be used. It would have saved me a lot of time reading this comment first.
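
A small sketch of why the pair is enough: typeof reports how the object is stored, class reports how it is treated, and the two can disagree. The objects below are arbitrary examples.

```r
f <- factor(c("a", "b"))
typeof(f)             # "integer": how the factor is stored
class(f)              # "factor": how it is dispatched on

df <- data.frame(a = 1)
typeof(df)            # "list"
class(df)             # "data.frame"

# A mislabelled object: the class attribute claims "list", the type says otherwise
x <- structure(c(1, 2), class = "list")
class(x)              # "list"
typeof(x)             # "double"
is.list(x)            # FALSE: is.list looks at the type
inherits(x, "list")   # TRUE: inherits looks at the class attribute
```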

Iselaisenberg answered 8/2, 2022 at 8:38 Comment(0)
