Creating a matrix of indicator variables

I would like to create a matrix of indicator variables. My initial thought was to use model.matrix, which was also suggested here: Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level

However, model.matrix does not seem to work if a factor has only one level.
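
The failure can be reproduced with a minimal sketch like the one below. The data frame name one.region is made up for illustration, and the exact error text may differ between R versions, so the code only checks that an error is raised rather than matching the message.

```r
# Hypothetical one-level data, mirroring the situation described above
one.region <- data.frame(region = factor(rep(1, 13)))

# tryCatch() captures the error object instead of stopping the script
result <- tryCatch(
  model.matrix(~ -1 + region, one.region),
  error = function(e) e
)

# Report the message only if an error was actually raised
if (inherits(result, "error")) {
  message("model.matrix failed: ", conditionMessage(result))
}
```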

Here is the indicator matrix I want for an example in which the factor 'region' has three levels:

dat = read.table(text = "
    reg1    reg2    reg3   
      1       0       0
      1       0       0
      1       0       0
      1       0       0
      1       0       0
      1       0       0
      0       1       0
      0       1       0
      0       1       0
      0       0       1
      0       0       1
      0       0       1
      0       0       1
", sep = "", header = TRUE)

# model.matrix works if there are multiple regions:

region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)

df.region <- as.data.frame(region)

df.region$region <- as.factor(df.region$region)

my.matrix <- as.data.frame(model.matrix(~ -1 + df.region$region, df.region))
my.matrix


# The following for-loop works even if there is only one level to the factor
# (one region):

# region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)

my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))

for(i in 1:length(region)) {my.matrix[i,region[i]]=1}
my.matrix

The for-loop is effective and seems simple enough. However, I have been struggling to come up with a solution that does not involve loops. I can use the loop above, but have been trying hard to wean myself off of them. Is there a better way?

Overslaugh asked 22/12, 2012 at 2:24

I would use matrix indexing. From ?"[":

A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector.
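
To make the quoted behaviour concrete, here is a small sketch on a toy matrix. The names m, idx and picked are made up for illustration; each row of idx is a (row, column) pair.

```r
# Toy 2 x 3 matrix, filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# Index matrix: one column per dimension, one row per element to select
idx <- cbind(c(1, 2), c(3, 1))

# Selects m[1, 3] and m[2, 1] and returns them as a vector
picked <- m[idx]

# The same index matrix also works on the left-hand side of an assignment
m[idx] <- 0L
```

Reading and assigning through the same index matrix is what lets a whole set of (row, column) cells be set in one step, with no loop.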

Making use of that nice feature:

my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))
my.matrix[cbind(seq_along(region), region)] <- 1

#       [,1] [,2] [,3]
#  [1,]    1    0    0
#  [2,]    1    0    0
#  [3,]    1    0    0
#  [4,]    1    0    0
#  [5,]    1    0    0
#  [6,]    1    0    0
#  [7,]    0    1    0
#  [8,]    0    1    0
#  [9,]    0    1    0
# [10,]    0    0    1
# [11,]    0    0    1
# [12,]    0    0    1
# [13,]    0    0    1
Sourdine answered 22/12, 2012 at 2:35

Comments (4):
Cyano: +1 for anybody anytime who uses the little-known but very cool matrix indexing feature. It's a favorite of mine.
Cyano: I think though that length(unique(region)) should be instead nlevels(region); if a level is missing, the matrix won't be wide enough.
Sourdine: @Aaron, the first line, I copied from the OP. Look how region is defined; it is not a factor so I think length(unique(region)) is appropriate.
Cyano: Ah yes, I see. I'd still prefer something like max, but if it's always defined as a sequence of increasing integers, with none missing, then of course either way is just fine.
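
The concern raised in the comments (missing levels, or region codes that are not 1, 2, 3, ...) could be handled by deriving the column index from factor levels instead of from the raw values. The helper below is a hypothetical sketch, not part of the answer above; the name indicator_matrix is made up, and it assumes the input contains no NA values.

```r
# Hypothetical helper: builds a 0/1 indicator matrix with one column
# per level of x. Assumes x has no NA values.
indicator_matrix <- function(x) {
  # An existing factor keeps its declared levels, even unused ones
  f <- if (is.factor(x)) x else factor(x)
  out <- matrix(0, nrow = length(f), ncol = nlevels(f),
                dimnames = list(NULL, levels(f)))
  # as.integer(f) is the position of each value among the levels, so
  # non-consecutive codes such as 1, 3, 7 still map to columns 1, 2, 3
  out[cbind(seq_along(f), as.integer(f))] <- 1
  out
}

indicator_matrix(c(1, 1, 3, 7))                   # columns named "1", "3", "7"
indicator_matrix(factor(c(1, 1), levels = 1:3))   # unused levels give all-zero columns
indicator_matrix(rep(1, 13))                      # a single level gives a single column
```

Going through a factor costs a little extra work, but the column count no longer depends on the codes being consecutive integers starting at 1.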

I came up with this solution by modifying an answer to a similar question here:

Reshaping a column from a data frame into several columns using R

# Three regions:
region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)
site <- seq_along(region)            # one site id per observation
df <- data.frame(site, region)
ind <- xtabs( ~ site + region, df)   # cross-tabulate site by region: a 0/1 table
ind

# A single region also works:
region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)
site <- seq_along(region)
df <- data.frame(site, region)
ind <- xtabs( ~ site + region, df)
ind

EDIT:

The line below converts ind into a data frame of indicator variables:

ind.matrix <- as.data.frame.matrix(ind)
Overslaugh answered 25/12, 2012 at 13:41
