Reshape multiple categorical variables to binary response variables

I am trying to convert the following format:

mydata <- data.frame(movie = c("Titanic", "Departed"), 
                     actor1 = c("Leo", "Jack"), 
                     actor2 = c("Kate", "Leo"))

     movie actor1 actor2
1  Titanic    Leo   Kate
2 Departed   Jack    Leo

to binary response variables:

     movie Leo Kate Jack
1  Titanic   1    1    0
2 Departed   1    0    1

I tried the solution described in Convert row data to binary columns, but I could only get it to work for two variables, not three.

I would really appreciate a clean way to do this.

Bari answered 27/8, 2013 at 20:32

How much spice is too much? Here is a solution via tidyr:

library(dplyr)
library(tidyr)

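mydata %>%
  # stack the actor columns into key/value pairs
  gather(actor, name, starts_with("actor")) %>%
  mutate(present = 1) %>%   # flag every movie/actor pair that occurs
  select(-actor) %>%
  # one column per actor; pairs that never occur become 0
  spread(name, present, fill = 0)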

       movie Jack Kate Leo
 1 Departed    1    0   1
 2  Titanic    0    1   1
Sallysallyann answered 23/6, 2014 at 19:52

One way to reshape your data.frame is with the reshape2 package, using melt and dcast. For example:

library(reshape2)
long.mydata <- melt(mydata, id.vars = "movie")
wide.mydata <- dcast(long.mydata, movie ~ value,
                     fun.aggregate = function(x) 1, fill = 0)

Pay attention to the fun.aggregate and fill parameters in dcast: fun.aggregate computes the value for each movie/actor combination that occurs in the data, and fill supplies the value for combinations that do not occur.
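
To make the role of fun.aggregate more concrete, here is a minimal sketch that compares a constant function with length. The third row ("Solo", with the same actor listed twice) is a hypothetical addition that is not part of the question's data; it is only there to show where the two choices differ. stringsAsFactors = FALSE is also an assumption, used so that melt does not have to combine factors with different levels.

```r
library(reshape2)

# The question's data plus one HYPOTHETICAL row ("Solo") in which the
# same actor appears in both actor columns.
mydata2 <- data.frame(movie  = c("Titanic", "Departed", "Solo"),
                      actor1 = c("Leo", "Jack", "Leo"),
                      actor2 = c("Kate", "Leo", "Leo"),
                      stringsAsFactors = FALSE)

long.mydata2 <- melt(mydata2, id.vars = "movie")

# A constant function yields 1 for every combination that occurs,
# however many times it occurs; fill = 0 covers the rest.
wide.flag <- dcast(long.mydata2, movie ~ value,
                   fun.aggregate = function(x) 1, fill = 0)

# length counts occurrences instead, so a repeated actor gives 2.
wide.count <- dcast(long.mydata2, movie ~ value,
                    fun.aggregate = length, fill = 0)
```

If the goal is a strict 0/1 indicator, the constant function is the safer choice; length is fine as long as no actor is listed twice for the same movie.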

Borders answered 27/8, 2013 at 23:1

Since they say variety is the spice of life, here's an approach in base R using table:

table(cbind(mydata[1], 
            actor = unlist(mydata[-1], use.names=FALSE)))
#           actor
# movie      Jack Leo Kate
#   Departed    1   1    0
#   Titanic     0   1    1

The above output is a matrix of class table. To get a data.frame, use as.data.frame.matrix; the movie names then become the row names.

as.data.frame.matrix(table(
  cbind(mydata[1], actor = unlist(mydata[-1], use.names=FALSE))))
#          Jack Leo Kate
# Departed    1   1    0
# Titanic     0   1    1
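
If the movie names are wanted as a regular column, as in the layout the question asks for, the row names can be copied over afterwards. The following is only a sketch: the names long, wide and result are made up for illustration, rep() is used to spell out the recycling of the movie column, and stringsAsFactors = FALSE is assumed, so the actor columns come out in alphabetical order.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# One row per (movie, actor) pair: the movie column is repeated once
# for each actor column, matching the order produced by unlist().
long <- data.frame(movie = rep(mydata$movie, times = ncol(mydata) - 1),
                   actor = unlist(mydata[-1], use.names = FALSE))

# Cross-tabulate and convert the table to a data.frame;
# the movie names end up as row names.
wide <- as.data.frame.matrix(table(long))

# Copy the row names into a 'movie' column and drop them as row names.
result <- cbind(movie = rownames(wide), wide)
rownames(result) <- NULL
```

Note that table counts occurrences, so an actor listed twice for the same movie would produce a 2 rather than a 1.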
Multiply answered 3/9, 2013 at 4:18

The reshape2 package also has the recast function, which melts and casts in a single step.

The code:

library(reshape2)
recast(mydata, id.var = 'movie', movie ~ value, fun.aggregate = length)

The result:

     movie Jack Kate Leo
1 Departed    1    0   1
2  Titanic    0    1   1
Luge answered 26/11, 2017 at 9:1

An updated tidyr-based option is to convert to long shape, use complete to fill in the missing combinations of movies and actors, turn an is.na test into a numeric indicator, and then reshape back to wide.

library(tidyr)

mydata %>%
  pivot_longer(starts_with("actor"), names_to = "acted") %>%
  complete(movie, value) %>%
  dplyr::mutate(acted = as.numeric(!is.na(acted))) %>%
  pivot_wider(names_from = value, values_from = acted)
#> # A tibble: 2 x 4
#>   movie     Jack   Leo  Kate
#>   <fct>    <dbl> <dbl> <dbl>
#> 1 Departed     1     1     0
#> 2 Titanic      0     1     1
Narva answered 9/11, 2019 at 19:52
