stacking columns into 1 column in R [duplicate]
Asked Answered
A

4

12

I have a data frame that looks like:

ID Time U1 U2 U3 U4 ...
1  20    1  2 3  5 .. 
2  20    2  5 9  4 ..
3  20    2  5 6  4 ..
.
.

And I would need to keep it like: 

ID Time  U
1  20    1
1  20    2
1  20    3
1  20    5
2  20    2
2  20    5
2  20    9
2  20    4
3  20    2
3  20    5
3  20    6
3  20    4

I tried with:

X <- read.table("mydata.txt", header=TRUE, sep=",")
X_D <- as.data.frame(X)
X_new <- stack(X_D, select = -c(ID, Time))

But I haven't managed to get the data into that form. Honestly, I have little experience with stacking/transposing, so any help is greatly appreciated!

Authentic answered 9/4, 2013 at 19:44 Comment(4)
This is commonly called conversion from wide format to long format. However, the way you've defined it, you lose information about which column a piece of data came from. Incidentally, the package reshape2 covers this sort of transformation.Epictetus
Additionally, you're more likely to get the answer you really want if you provide a really easy way for answerers to get example data into R to work with. Providing what's in the file isn't simple, but providing dput() output of the original example dataframe object IS!Epictetus
There's also a solution with tidyr::gather()Roussel
@Roussel - can you enumerate it? I'm looking for the tidyr equivalent of the JMP "stack columns" platform.Individually
A
16

Here's the stack approach:

dat2a <- data.frame(dat[1:2], stack(dat[3:ncol(dat)]))
dat2a
#    ID Time values ind
# 1   1   20      1  U1
# 2   2   20      2  U1
# 3   3   20      2  U1
# 4   1   20      2  U2
# 5   2   20      5  U2
# 6   3   20      5  U2
# 7   1   20      3  U3
# 8   2   20      9  U3
# 9   3   20      6  U3
# 10  1   20      5  U4
# 11  2   20      4  U4
# 12  3   20      4  U4

This is very similar to melt from "reshape2":

library(reshape2)
dat2b <- melt(dat, id.vars=1:2)
dat2b
#    ID Time variable value
# 1   1   20       U1     1
# 2   2   20       U1     2
# 3   3   20       U1     2
# 4   1   20       U2     2
# 5   2   20       U2     5
# 6   3   20       U2     5
# 7   1   20       U3     3
# 8   2   20       U3     9
# 9   3   20       U3     6
# 10  1   20       U4     5
# 11  2   20       U4     4
# 12  3   20       U4     4

And, very similar to @TylerRinker's answer, but not dropping the "times", is to just use sep = "" to help R guess time and variable names.

dat3 <- reshape(dat, direction = "long", idvar=1:2, 
                varying=3:ncol(dat), sep = "", timevar="Measure")
dat3
#        ID Time Measure U
# 1.20.1  1   20       1 1
# 2.20.1  2   20       1 2
# 3.20.1  3   20       1 2
# 1.20.2  1   20       2 2
# 2.20.2  2   20       2 5
# 3.20.2  3   20       2 5
# 1.20.3  1   20       3 3
# 2.20.3  2   20       3 9
# 3.20.3  3   20       3 6
# 1.20.4  1   20       4 5
# 2.20.4  2   20       4 4
# 3.20.4  3   20       4 4

In all three of those, you end up with four columns, not three, like you describe in your desired output. However, as @ndoogan points out, by doing so, you're loosing information about your data. If you're fine with that, you can always drop that column from the resulting data.frame quite easily (for example, dat2a <- dat2a[-4].

Albur answered 10/4, 2013 at 5:41 Comment(0)
P
2

With base reshape:

dat <- read.table(text="ID Time U1 U2 U3 U4
1  20    1  2 3  5
2  20    2  5 9  4
3  20    2  5 6  4", header=TRUE)


colnames(dat) <- gsub("([a-zA-Z]*)([0-9])", "\\1.\\2", colnames(dat))
reshape(dat, varying=3:ncol(dat), v.names="U", direction ="long", timevar = "Time", 
    idvar = "ID")
Porfirioporgy answered 9/4, 2013 at 20:8 Comment(4)
(+1) It could be worth specifying what "base reshape" means in the context of another reshape answer that depends on reshape2 package.Epictetus
This should do it for anyone wondering: ?reshapePorfirioporgy
+1 for matching the output exactly, but I do also worry about loosing information this way.Albur
@AnandaMahto Agreed. In this case I didn't see it as a problem so I dropped it to match the user's wants.Porfirioporgy
T
1

You can also use melt():

library(reshape2)

new_data <- melt(old_data, id.vars=c("ID","Time"),
    value.name = "U")

Then remove the 'variable' column:

new_data$variable <- NULL
Thomasina answered 18/6, 2017 at 20:14 Comment(1)
Oops, this will actually order the U rows by rows in the old_dataThomasina
R
0

Try this:

do.call(rbind, lapply(1:4, function(i)structure(dat[,c("ID", "Time", paste0("U",i))], names=c("ID", "Time", "U"))))

Where dat is your data.frame...

Runlet answered 9/4, 2013 at 19:49 Comment(2)
Thanks! It gives me an error though: 'undefined columns selected' here '[.data.frame'(dat, ,[,c("ID", "Time",Authentic
Hi @user2263330, it works for me with dat <- data.frame(ID=1:3, Time=20, U1=1:3, U2=4:6, U3=7:9, U4=10:12). What are the names of your data.frame?Runlet

© 2022 - 2024 — McMap. All rights reserved.