stacking columns into 1 column in R [duplicate]

Asked 9/4, 2013 at 19:44 Answered 18/6, 2017 at 20:14

I have a data frame that looks like:

ID Time U1 U2 U3 U4 ...
1  20    1  2 3  5 .. 
2  20    2  5 9  4 ..
3  20    2  5 6  4 ..
.
.

And I would need to keep it like: 

ID Time  U
1  20    1
1  20    2
1  20    3
1  20    5
2  20    2
2  20    5
2  20    9
2  20    4
3  20    2
3  20    5
3  20    6
3  20    4

I tried with:

X <- read.table("mydata.txt", header=TRUE, sep=",")
X_D <- as.data.frame(X)
X_new <- stack(X_D, select = -c(ID, Time))

But I haven't managed to get the data into that form. Honestly, I have little experience with stacking/transposing, so any help is greatly appreciated!

Authentic answered 9/4, 2013 at 19:44 Comment(4)

This is commonly called conversion from wide format to long format. However, the way you've defined it, you lose information about which column a piece of data came from. Incidentally, the package reshape2 covers this sort of transformation. – Epictetus 9/4, 2013 at 19:54

Additionally, you're more likely to get the answer you really want if you provide a really easy way for answerers to get example data into R to work with. Providing what's in the file isn't simple, but providing dput() output of the original example dataframe object IS! – Epictetus 9/4, 2013 at 20:0

There's also a solution with tidyr::gather() – Roussel 19/10, 2015 at 13:27

@Roussel - can you enumerate it? I'm looking for the tidyr equivalent of the JMP "stack columns" platform. – Individually 14/10, 2017 at 17:23

Here's the stack approach:

dat2a <- data.frame(dat[1:2], stack(dat[3:ncol(dat)]))
dat2a
#    ID Time values ind
# 1   1   20      1  U1
# 2   2   20      2  U1
# 3   3   20      2  U1
# 4   1   20      2  U2
# 5   2   20      5  U2
# 6   3   20      5  U2
# 7   1   20      3  U3
# 8   2   20      9  U3
# 9   3   20      6  U3
# 10  1   20      5  U4
# 11  2   20      4  U4
# 12  3   20      4  U4

This is very similar to melt from "reshape2":

library(reshape2)
dat2b <- melt(dat, id.vars=1:2)
dat2b
#    ID Time variable value
# 1   1   20       U1     1
# 2   2   20       U1     2
# 3   3   20       U1     2
# 4   1   20       U2     2
# 5   2   20       U2     5
# 6   3   20       U2     5
# 7   1   20       U3     3
# 8   2   20       U3     9
# 9   3   20       U3     6
# 10  1   20       U4     5
# 11  2   20       U4     4
# 12  3   20       U4     4

And, very similar to @TylerRinker's answer, but not dropping the "times", is to just use sep = "" to help R guess time and variable names.

dat3 <- reshape(dat, direction = "long", idvar=1:2, 
                varying=3:ncol(dat), sep = "", timevar="Measure")
dat3
#        ID Time Measure U
# 1.20.1  1   20       1 1
# 2.20.1  2   20       1 2
# 3.20.1  3   20       1 2
# 1.20.2  1   20       2 2
# 2.20.2  2   20       2 5
# 3.20.2  3   20       2 5
# 1.20.3  1   20       3 3
# 2.20.3  2   20       3 9
# 3.20.3  3   20       3 6
# 1.20.4  1   20       4 5
# 2.20.4  2   20       4 4
# 3.20.4  3   20       4 4

In all three of those, you end up with four columns, not three, like you describe in your desired output. However, as @ndoogan points out, by doing so, you're loosing information about your data. If you're fine with that, you can always drop that column from the resulting data.frame quite easily (for example, dat2a <- dat2a[-4].

Albur answered 10/4, 2013 at 5:41 Comment(0)

With base reshape:

dat <- read.table(text="ID Time U1 U2 U3 U4
1  20    1  2 3  5
2  20    2  5 9  4
3  20    2  5 6  4", header=TRUE)


colnames(dat) <- gsub("([a-zA-Z]*)([0-9])", "\\1.\\2", colnames(dat))
reshape(dat, varying=3:ncol(dat), v.names="U", direction ="long", timevar = "Time", 
    idvar = "ID")

Porfirioporgy answered 9/4, 2013 at 20:8 Comment(4)

(+1) It could be worth specifying what "base reshape" means in the context of another reshape answer that depends on reshape2 package. – Epictetus 9/4, 2013 at 20:10

This should do it for anyone wondering: ?reshape – Porfirioporgy 9/4, 2013 at 20:15

+1 for matching the output exactly, but I do also worry about loosing information this way. – Albur 10/4, 2013 at 5:41

@AnandaMahto Agreed. In this case I didn't see it as a problem so I dropped it to match the user's wants. – Porfirioporgy 10/4, 2013 at 6:1

You can also use melt():

library(reshape2)

new_data <- melt(old_data, id.vars=c("ID","Time"),
    value.name = "U")

Then remove the 'variable' column:

new_data$variable <- NULL

Thomasina answered 18/6, 2017 at 20:14 Comment(1)

Oops, this will actually order the U rows by rows in the old_data – Thomasina 18/6, 2017 at 20:18

Try this:

do.call(rbind, lapply(1:4, function(i)structure(dat[,c("ID", "Time", paste0("U",i))], names=c("ID", "Time", "U"))))

Where dat is your data.frame...

Runlet answered 9/4, 2013 at 19:49 Comment(2)

Thanks! It gives me an error though: 'undefined columns selected' here '[.data.frame'(dat, ,[,c("ID", "Time", – Authentic 9/4, 2013 at 20:3

Hi @user2263330, it works for me with dat <- data.frame(ID=1:3, Time=20, U1=1:3, U2=4:6, U3=7:9, U4=10:12). What are the names of your data.frame? – Runlet 9/4, 2013 at 20:8

Recommended topics

Hot tags