Select first 80 observations for each level in R

Asked 23/5, 2013 at 19:41 Answered 23/5, 2013 at 20:14

I have a data set that looks like this:

structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
    GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
    0, 0, 0, 0, 0), TID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("t1", 
    "t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17", "t18", 
    "t19", "t2", "t20", "t21", "t22", "t23", "t24", "t25", "t3", 
    "t4", "t5", "t6", "t7", "t8", "t9"), class = "factor")), .Names = c("A", 
"T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
6L), class = "data.frame")

I want to select the first 80 observations of all variables for each TID. So far, I can do this with the first TID only using the code:

sub.data1<-NM[1:80, ]

How can I do it for all my other TIDs?

Thanks!

Quadrennium answered 23/5, 2013 at 19:41 Comment(0)

Using the function ddply() from plyr, you can split the data by TID, select the first 80 rows of each piece with head(), and then combine everything back into one data frame:

library(plyr)
ddply(NM, .(TID), head, n = 80)

Boeotia answered 23/5, 2013 at 19:48 Comment(1)

+1! Probably there is no need for the lambda function, ddply(NM, .(TID), head, n = 80) should work. – Knawel 23/5, 2013 at 19:51

I would do:

lapply(split(NM, NM$TID), head, 80)

It returns a list of data.frames, each with 80 (or fewer) rows and all the original columns. If instead you want everything in one data.frame:

do.call(rbind, lapply(split(NM, NM$TID), head, 80))
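To make the split/head/rbind idea easy to try, here is a minimal sketch on a toy data frame. NM_toy and n are hypothetical names standing in for the asker's NM and the 80 from the question; the toy data has only two columns, but any other columns would be carried along the same way.

```r
# Toy data: hypothetical stand-in for NM, with groups of different sizes
NM_toy <- data.frame(
  TID = factor(c("t1", "t1", "t1", "t2", "t2", "t3")),
  T   = c(0.04, 0.08, 0.12, 0.04, 0.08, 0.04)
)

n <- 2  # rows to keep per group (80 in the question)

# Split by TID, keep the first n rows of each piece, then stack the pieces
first_n <- do.call(rbind, lapply(split(NM_toy, NM_toy$TID), head, n))

# rbind() on a named list builds row names from the group labels;
# reset them if plain 1..k row numbers are preferred
rownames(first_n) <- NULL

print(first_n)
```

Groups with fewer than n rows (t3 here) are simply returned whole, so no special handling is needed for short groups.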

Dingdong answered 23/5, 2013 at 19:46 Comment(1)

Sorry, I forgot to mention that I want to retain all the other variables too. – Quadrennium 23/5, 2013 at 19:51

Using data.table: I made a shorter example with just TIDs t1 and t2 that returns the first 2 rows of each. To adapt it to your data, replace the 2 in head(.SD, 2) with 80.

library(data.table)
data<-structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
                "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
                "25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
                0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
                        418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
                GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
                        0, 0, 0, 0, 0), TID = c("t1","t1","t1","t2","t2","t2")), .Names = c("A", 
                "T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
                6L), class = "data.frame")
dt<-data.table(data)
dt[,head(.SD,2),by=TID]

This results in:

   TID A    T     X     Y V GD ND ND2
1:  t1 1 0.04 464.4 418.5 0  0 NA  NA
2:  t1 1 0.08 464.4 418.5 0  0  0   0
3:  t2 1 0.16 464.4 418.5 0  0  0   0
4:  t2 1 0.20 464.4 418.5 0  0  0   0

and can be changed back to a data frame if desired by changing the last line to

as.data.frame(dt[,head(.SD,2),by=TID])

Dennet answered 23/5, 2013 at 20:5 Comment(0)

Here is another solution in base:

do.call(rbind, by(NM, NM$TID, head, 80))
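Both split() and by() return the rows regrouped by TID. If the rows for different TIDs are interleaved and the original row order should be kept, one possible variation (not from the answers above, just a sketch in base R) is to number the rows within each group with ave() and filter on that counter. NM_toy, n, and idx are hypothetical names used only for illustration.

```r
# Toy data with interleaved groups: hypothetical stand-in for NM
NM_toy <- data.frame(
  TID = factor(c("t1", "t2", "t1", "t2", "t1")),
  X   = 1:5
)

n <- 2  # rows to keep per group (80 in the question)

# Within-group running index: 1, 2, 3, ... restarting for each TID
idx <- ave(seq_len(nrow(NM_toy)), NM_toy$TID, FUN = seq_along)

# Keep rows whose within-group index is at most n; row order is unchanged
kept <- NM_toy[idx <= n, ]

print(kept)
```

This avoids splitting and re-binding the data frame, and all columns are retained because only rows are filtered.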

Coolish answered 23/5, 2013 at 20:14 Comment(0)
