Working with dataframes in a list: Drop variables, add new ones

Asked 18/6, 2011 at 21:31 Answered 18/6, 2011 at 23:19

Solved r lapply data-manipulation data-management

Define a list dats with two dataframes, df1 and df2:

dats <- list( df1 = data.frame(a=sample(1:3), b = sample(11:13)),
    df2 = data.frame(a=sample(1:3), b = sample(11:13)))

> dats
$df1
  a  b
1 2 12
2 3 11
3 1 13

$df2
  a  b
1 3 13
2 2 11
3 1 12

I would like to drop variable a in each data frame. Next I would like to add a variable with the id of each dataframe from an external dataframe, like:

ids <- data.frame(id=c("id1","id2"),df=c("df1","df2"))
> ids
  id  df
1 id1 df1
2 id2 df2

To drop unnecessary vars I tried this without luck:

> dats <- lapply(dats, function(x) assign(x, x[,c("b")]))  
> Error in assign(x, x[, c("b")]) : invalid first argument

Not sure how to add the id either.
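
On the first error: assign() expects a variable name (a character string) as its first argument, whereas here x is the data frame itself. Inside lapply() no assignment is needed; the function only has to return the modified data frame. A minimal sketch, using fixed values instead of sample() so the result is reproducible (the name dats_b is only for illustration):

```r
# Fixed values instead of sample() so the result is reproducible.
dats <- list(df1 = data.frame(a = 1:3, b = 11:13),
             df2 = data.frame(a = 3:1, b = 13:11))

# Return the reduced data frame from the function; lapply() collects the results.
# Indexing with a single argument, x["b"], keeps a data frame,
# whereas x[, "b"] would drop to a plain vector.
dats_b <- lapply(dats, function(x) x["b"])

str(dats_b$df1)
```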

I also tried, perhaps more appropriately:

> temp <- lapply(dats, function(x) subset(x[1], select=x[[1]]$b))
Error in x[[1]]$b : $ operator is invalid for atomic vectors

What I find confusing is that str(dats[1]) shows a list, while str(dats[[1]]) shows a dataframe. I think that may have something to do with it.
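
The difference comes from the indexing operators: on a list, `[` returns a sub-list, while `[[` extracts the element itself. A small illustration on a list shaped like dats (the fixed values are an assumption, chosen for reproducibility):

```r
dats <- list(df1 = data.frame(a = 1:3, b = 11:13),
             df2 = data.frame(a = 3:1, b = 13:11))

class(dats[1])     # "list": a list of length one that still contains df1
class(dats[[1]])   # "data.frame": the element itself

# lapply() passes each element to the function as if extracted with [[ ]],
# so inside function(x) the argument x is already a data frame.
classes <- sapply(dats, class)
classes
```

This also accounts for the second error: inside the function, x[[1]] is the first column of the data frame, an atomic vector, so $b cannot be applied to it.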

Junko answered 18/6, 2011 at 21:31 Comment(1)

feel free to rollback if you disagree w/ my edit. I like the question. – Argeliaargent 18/6, 2011 at 21:35

Or try this: Extract your ids into a named vector that maps the data-frame name to the id:

df2id <- ids$id
names(df2id) <- ids$df

> df2id
df1 df2 
id1 id2 
Levels: id1 id2

Then use mapply to both (a) drop the a column from each data-frame, and (b) add the id column:

> mapply( function(d,x) cbind( subset(d, select = -a),
+                              id = x),
+         dats, df2id[ names(dats) ] ,
+         SIMPLIFY=FALSE)
$df1
   b  id
1 12 id1
2 11 id1
3 13 id1

$df2
   b  id
1 12 id2
2 11 id2
3 13 id2

Note that we are passing df2id[ names(dats) ] to mapply -- this ensures that the ids in df2id are "aligned" with the data-frames in dats, whatever order the rows of ids happen to be in.
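
To illustrate why indexing by names(dats) matters, here is a small sketch in which the rows of ids are deliberately listed in the opposite order to the list. The values are made up for the illustration, and as.character() is added so the result does not depend on whether the columns of ids are stored as factors:

```r
dats <- list(df1 = data.frame(a = 1:3, b = 11:13),
             df2 = data.frame(a = 3:1, b = 13:11))

# ids listed in the opposite order to dats (assumption for this illustration)
ids <- data.frame(id = c("id2", "id1"), df = c("df2", "df1"))

# Named lookup vector: data-frame name -> id
df2id <- setNames(as.character(ids$id), as.character(ids$df))

# Indexing by names(dats) reorders the ids to match the list
aligned <- df2id[names(dats)]

res <- mapply(function(d, x) cbind(subset(d, select = -a), id = x),
              dats, aligned, SIMPLIFY = FALSE)
res
```

Without the indexing step, mapply would pair the elements purely by position, and df1 would be given id2.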

Whodunit answered 18/6, 2011 at 23:19 Comment(0)

Is this OK?

dats <- list( df1 = data.frame(a=sample(1:3), b = sample(11:13)),
    df2 = data.frame(a=sample(1:3), b = sample(11:13)))

ids <- data.frame(id=c("id1","id2"),df=c("df1","df2"))

# remove variable a
dats2 <- lapply(dats, function(x) x[,!names(x) == "a"])

# add id
for(i in 1:length(dats2)) {
  dats2[[i]] <- merge(dats2[[i]], ids$id[ids$df == names(dats2)[i]])
}

dats2

  $df1
     x   y
  1 11 id1
  2 12 id1
  3 13 id1

  $df2
     x   y
  1 11 id2
  2 12 id2
  3 13 id2

Gape answered 18/6, 2011 at 22:39 Comment(2)

+1 Very concise. In the real application I was trying to sort the ids data frame so it had the same order as dats2, then loop over one of the dataframes. The names(dats2)[i] was an eye opener here: it allows using merge and lets R sort on the fly. – Junko 18/6, 2011 at 22:55

Is there a way to specify a name for the column being appended inside the same line? I see the default in my application is to call it y. Could it be specified as id? – Junko 18/6, 2011 at 23:0
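
One possible way to get a column named id directly, sketched under the assumption that each data frame needs exactly one constant id, is to skip merge() and assign the column by name. Using drop = FALSE also preserves the b column name, which is otherwise lost when a single column is extracted as a vector:

```r
dats <- list(df1 = data.frame(a = 1:3, b = 11:13),
             df2 = data.frame(a = 3:1, b = 13:11))
ids <- data.frame(id = c("id1", "id2"), df = c("df1", "df2"))

# drop = FALSE keeps a one-column data frame (and the column name b)
dats2 <- lapply(dats, function(x) x[, names(x) != "a", drop = FALSE])

for (nm in names(dats2)) {
  # Look up the id whose df entry matches this list element's name;
  # the single value is recycled down the rows of the data frame
  dats2[[nm]]$id <- as.character(ids$id[ids$df == nm])
}

dats2
```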
