Removing specific rows from a dataframe
Asked Answered
B

4

26

I have a data frame e.g.:

sub   day
1      1
1      2
1      3
1      4
2      1
2      2
2      3
2      4
3      1
3      2
3      3
3      4

and I would like to remove specific rows that can be identified by the combination of sub and day. For example say I wanted to remove rows where sub='1' and day='2' and sub=3 and day='4'. How could I do this? I realise that I could specify the row numbers, but this needs to be applied to a huge dataframe which would be tedious to go through and ID each row.

Baird answered 18/8, 2011 at 19:16 Comment(0)
S
36
DF[ ! ( ( DF$sub ==1 & DF$day==2) | ( DF$sub ==3 & DF$day==4) ) , ]   # note the ! (negation)

Or if sub is a factor as suggested by your use of quotes:

DF[ ! paste(sub,day,sep="_") %in% c("1_2", "3_4"), ]

Could also use subset:

subset(DF,  ! paste(sub,day,sep="_") %in% c("1_2", "3_4") )

(And I endorse the use of which in Dirk's answer when using "[" even though some claim it is not needed.)

Stopoff answered 18/8, 2011 at 19:35 Comment(5)
Thats the ticket. Thanks. What does %in% mean?Baird
It is an example of an infix operator and it returns a logical vector telling you which of the elements in the first argument are contained in the second argument. See ?match where it is defined.Stopoff
@BondedDust and here as wellAgouti
@42: I don't understand why this is the answer to the question when it does not consider day at all.Filamentous
@U.Windl I (speaking for BondedDust and myself) agree. Edited.Stopoff
B
16

This boils down to two distinct steps:

  1. Figure out when your condition is true, and hence compute a vector of booleans, or, as I prefer, their indices by wrapping it into which()
  2. Create an updated data.frame by excluding the indices from the previous step.

Here is an example:

R> set.seed(42)
R> DF <- data.frame(sub=rep(1:4, each=4), day=sample(1:4, 16, replace=TRUE))
R> DF
   sub day
1    1   4
2    1   4
3    1   2
4    1   4
5    2   3
6    2   3
7    2   3
8    2   1
9    3   3
10   3   3
11   3   2
12   3   3
13   4   4
14   4   2
15   4   2
16   4   4
R> ind <- which(with( DF, sub==2 & day==3 ))
R> ind
[1] 5 6 7
R> DF <- DF[ -ind, ]
R> table(DF)
   day
sub 1 2 3 4
  1 0 1 0 3
  2 1 0 0 0
  3 0 1 3 0
  4 0 2 0 2
R> 

And we see that sub==2 has only one entry remaining with day==1.

Edit The compound condition can be done with an 'or' as follows:

ind <- which(with( DF, (sub==1 & day==2) | (sub=3 & day=4) ))

and here is a new full example

R> set.seed(1)
R> DF <- data.frame(sub=rep(1:4, each=5), day=sample(1:4, 20, replace=TRUE))
R> table(DF)
   day
sub 1 2 3 4
  1 1 2 1 1
  2 1 0 2 2
  3 2 1 1 1
  4 0 2 1 2
R> ind <- which(with( DF, (sub==1 & day==2) | (sub==3 & day==4) ))
R> ind
[1]  1  2 15
R> DF <- DF[-ind, ]
R> table(DF)
   day
sub 1 2 3 4
  1 1 0 1 1
  2 1 0 2 2
  3 2 1 1 0
  4 0 2 1 2
R> 
Branks answered 18/8, 2011 at 19:23 Comment(5)
ok, I think that will work with a bit of extra help... i need to identify multiple days so I tried your code slightly modified: ind <-which(with(Licor, day=c('1','16','30','37','51','52','57','58'))) but get an error message. Any ideas?Baird
Work on the expression to compute the indies, you may find help(match) useful.Branks
This answer has an explanation on the strategy needed, how to apply the code, and what code to apply, with examples. This was helpful.Rasla
There seems to be a problem if ind is empty: The result is the empty subset instead of the full set (DF).Filamentous
See also bugs.r-project.org/bugzilla3/show_bug.cgi?id=17282 for an explanation.Filamentous
H
11

Here's a solution to your problem using dplyr's filter function.

Although you can pass your data frame as the first argument to any dplyr function, I've used its %>% operator, which pipes your data frame to one or more dplyr functions (just filter in this case).

Once you are somewhat familiar with dplyr, the cheat sheet is very handy.

> print(df <- data.frame(sub=rep(1:3, each=4), day=1:4))
   sub day
1    1   1
2    1   2
3    1   3
4    1   4
5    2   1
6    2   2
7    2   3
8    2   4
9    3   1
10   3   2
11   3   3
12   3   4
> print(df <- df %>% filter(!((sub==1 & day==2) | (sub==3 & day==4))))
   sub day
1    1   1
2    1   3
3    1   4
4    2   1
5    2   2
6    2   3
7    2   4
8    3   1
9    3   2
10   3   3
Herringbone answered 9/8, 2015 at 13:7 Comment(0)
A
2

One simple solution:

cond1 <- df$sub == 1 & df$day == 2

cond2 <- df$sub == 3 & df$day == 4

df <- df[!(cond1 | cond2),]

Arkansas answered 13/3, 2016 at 20:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.