How to convert multiple columns to individual rows in R

Asked 9/12, 2013 at 5:59 Answered 9/12, 2013 at 6:45

I have a data frame in R that has many rows (over 3000) with F0 (fundamental frequency) tracks of an utterance in it. The rows have the following information in them: speaker ID, group #, repetition #, accent type, sex, and then 50 columns of F0 points. The data looks like this:

Speaker Sex Group Repetition Accent    Word         1         2         3        4
    105   M     1          1      N AILMENT 102.31030 102.31030 102.31030 102.31127 
    105   M     1          1      N COLLEGE 111.80641 111.80313 111.68612 111.36020
    105   M     1          1      N  FATHER 124.06655 124.06655 124.06655 124.06655

But instead of only going to X4, it has 50 points per row, so I have a 3562x56 data frame. I want to change it so each column of data in the F0 track (so after word, from 1:50) gets its own column, with the associated column number as another row. I want to keep all of the information in the first six columns with each data point as well, so it would look like this:

Speaker Sex Group Repetition Accent    Word       Num        F0
    105   M     1          1      N AILMENT         1 102.31030
    105   M     1          1      N AILMENT         2 102.31030
    105   M     1          1      N AILMENT         3 102.31030
    105   M     1          1      N AILMENT         4 102.31127
    ...
    105   M     1          1      N COLLEGE         1 111.80641 
    105   M     1          1      N COLLEGE         1 111.80313 
    105   M     1          1      N COLLEGE         1 111.68612 
    105   M     1          1      N COLLEGE         1 111.36020 
    ...

The code I tried to use, while tedious, is as follows:

x = 1
for (i in 1:dim(normrangef0)[1]) {
     for (j in 1:50) {
             norm.all$Speaker[x] <- normrangef0$Speaker[i]
             norm.all$Sex[x] <- normrangef0$Sex[i]
             norm.all$Group[x] <- normrangef0$Group[i]
             norm.all$Repetition[x] <- normrangef0$Repetition[i]
             norm.all$Word[x] <- normrangef0$Word[i]
             norm.all$Accent[x] <- normrangef0$Accent[i]
             norm.all$Time[x] <- j
             norm.all$F0[x] <- normrangef0[i,j+6]
             x = x+1    
    }
}

However, when I do this with norm.all as a NULL object (just defined by norm.all = c() ), I end up with a list of over 200k items, many of which are NAs. When I define norm.all as a data frame (either an empty one or one of all 0s, in the 178100x8 data frame, I get an error:

Error in $<-.data.frame(*tmp*, "Speaker", value = 105L) : replacement has 1 row, data has 0

Is my code just totally off? Is there another way to do this?

Intravenous answered 9/12, 2013 at 5:59 Comment(2)

what language is this supposed to be? C? – Ballot 9/12, 2013 at 6:31

Sorry about that! It's in R. – Intravenous 9/12, 2013 at 6:32

Use melt from "reshape2"

library(reshape2)
melt(mydf, id.vars=c("Speaker", "Sex", "Group", "Repetition", "Accent", "Word"))
#    Speaker Sex Group Repetition Accent    Word variable    value
# 1      105   M     1          1      N AILMENT        1 102.3103
# 2      105   M     1          1      N COLLEGE        1 111.8064
# 3      105   M     1          1      N  FATHER        1 124.0666
# 4      105   M     1          1      N AILMENT        2 102.3103
# 5      105   M     1          1      N COLLEGE        2 111.8031
# 6      105   M     1          1      N  FATHER        2 124.0666
# 7      105   M     1          1      N AILMENT        3 102.3103
# 8      105   M     1          1      N COLLEGE        3 111.6861
# 9      105   M     1          1      N  FATHER        3 124.0666
# 10     105   M     1          1      N AILMENT        4 102.3113
# 11     105   M     1          1      N COLLEGE        4 111.3602
# 12     105   M     1          1      N  FATHER        4 124.0666

In base R, you can also use stack to stack the columns named 1 through 4, and cbind that with the first group of columns. Alternatively, unlist will also do this.

You may also want to look into the "data.table" package to get a bit of a speed boost.

Conjugate answered 9/12, 2013 at 6:42 Comment(4)

What does this message indicate? Warning message: attributes are not identical across measure variables; they will be dropped . I checked my results and they look fine - but would like to confirm the meaning of the warning. – Paddle 18/2, 2015 at 1:2

ok the data is not looking fine. I am losing info in the resulting data.frame. instead of picking all the values from the variable, I am getting only the first value for each row! – Paddle 18/2, 2015 at 1:13

@vagabond, that warning is usually when you are melting columns of different types (numeric and integer together, for example, or character and factor together). I'm not sure how to interpret your second comment without some sample data to work with. – Conjugate 18/2, 2015 at 3:46

I'll post it as a question. I did try to convert all measure variables to the same class but it didn't work - same warning message. Will post question. – Paddle 18/2, 2015 at 20:46

With reshape:

x <- read.table(header=T, text="Speaker Sex Group Repetition Accent    Word         1         2         3        4
105   M     1          1      N AILMENT 102.31030 102.31030 102.31030 102.31127
105   M     1          1      N COLLEGE 111.80641 111.80313 111.68612 111.36020
105   M     1          1      N  FATHER 124.06655 124.06655 124.06655 124.06655")

reshape(x, direction="long", sep='', varying=paste0('X', 1:4))
##     Speaker Sex Group Repetition Accent    Word time        X id
## 1.1     105   M     1          1      N AILMENT    1 102.3103  1
## 2.1     105   M     1          1      N COLLEGE    1 111.8064  2
## 3.1     105   M     1          1      N  FATHER    1 124.0666  3
## 1.2     105   M     1          1      N AILMENT    2 102.3103  1
## 2.2     105   M     1          1      N COLLEGE    2 111.8031  2
## 3.2     105   M     1          1      N  FATHER    2 124.0666  3
## 1.3     105   M     1          1      N AILMENT    3 102.3103  1
## 2.3     105   M     1          1      N COLLEGE    3 111.6861  2
## 3.3     105   M     1          1      N  FATHER    3 124.0666  3
## 1.4     105   M     1          1      N AILMENT    4 102.3113  1
## 2.4     105   M     1          1      N COLLEGE    4 111.3602  2
## 3.4     105   M     1          1      N  FATHER    4 124.0666  3

Abacist answered 9/12, 2013 at 6:45 Comment(0)

Recommended topics

Hot tags