Removing specific rows from a dataframe

Asked 18/8, 2011 at 19:16 Answered 13/3, 2016 at 20:13

I have a data frame e.g.:

and I would like to remove specific rows that can be identified by the combination of sub and day. For example, say I wanted to remove the rows where sub='1' and day='2', and also the rows where sub='3' and day='4'. How could I do this? I realise that I could specify the row numbers, but this needs to be applied to a huge data frame, and identifying each row by hand would be tedious.

Baird answered 18/8, 2011 at 19:16 Comment(0)

DF[ ! ( ( DF$sub ==1 & DF$day==2) | ( DF$sub ==3 & DF$day==4) ) , ]   # note the ! (negation)

Or if sub is a factor as suggested by your use of quotes:

DF[ ! paste(DF$sub, DF$day, sep="_") %in% c("1_2", "3_4"), ]   # columns must be qualified with DF$ inside "["

Could also use subset:

subset(DF,  ! paste(sub,day,sep="_") %in% c("1_2", "3_4") )

(And I endorse the use of which in Dirk's answer when using "[" even though some claim it is not needed.)
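One case where which() makes a visible difference is missing values. The following is a minimal sketch, assuming a made-up data frame with an NA in day (not the asker's data): a logical index containing NA produces an all-NA row, while which() silently drops the NA positions so the original row is kept.

```r
# Hypothetical data: the second row has a missing 'day' value.
DF <- data.frame(sub = c(1, 1, 3, 3), day = c(2, NA, 4, 1))

# Rows to drop: (sub 1, day 2) or (sub 3, day 4).
# For row 2 the comparison with NA yields NA, not FALSE.
cond <- (DF$sub == 1 & DF$day == 2) | (DF$sub == 3 & DF$day == 4)

# Logical indexing: the NA in !cond turns row 2 into a row of NAs.
by_logical <- DF[!cond, ]

# which() returns only the positions where cond is TRUE,
# so row 2 survives with its original values.
by_which <- DF[-which(cond), ]

print(by_logical)
print(by_which)
```

Which behaviour is wanted depends on how rows with a missing day should be treated, so this is a point to check rather than a fixed rule.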

Stopoff answered 18/8, 2011 at 19:35 Comment(5)

That's the ticket. Thanks. What does %in% mean? – Baird 18/8, 2011 at 19:46

It is an example of an infix operator and it returns a logical vector telling you which of the elements in the first argument are contained in the second argument. See ?match where it is defined. – Stopoff 18/8, 2011 at 19:49
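To make that concrete, here is a small sketch with made-up key strings in the same "sub_day" form used in the answer. The vector names x and keys are only illustrative.

```r
# Illustrative values: pasted "sub_day" keys and the keys to look for.
x    <- c("1_2", "2_2", "3_4", "4_1")
keys <- c("1_2", "3_4")

# One TRUE/FALSE per element of x: is it found anywhere in keys?
is_member <- x %in% keys
print(is_member)

# The same result written out with match(), which is how ?match describes %in%.
same <- match(x, keys, nomatch = 0) > 0
print(identical(is_member, same))
```

Negating the result with ! gives the rows to keep, which is what the answer does inside "[".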

@BondedDust and here as well – Agouti 24/3, 2015 at 19:38

@42: I don't understand why this is the answer to the question when it does not consider day at all. – Filamentous 30/5, 2017 at 11:34

@U.Windl I (speaking for BondedDust and myself) agree. Edited. – Stopoff 30/5, 2017 at 19:42

This boils down to two distinct steps:

1. Figure out when your condition is true, and hence compute a vector of booleans, or, as I prefer, their indices by wrapping it into which().
2. Create an updated data.frame by excluding the indices from the previous step.

Here is an example:

R> set.seed(42)
R> DF <- data.frame(sub=rep(1:4, each=4), day=sample(1:4, 16, replace=TRUE))
R> DF
   sub day
1    1   4
2    1   4
3    1   2
4    1   4
5    2   3
6    2   3
7    2   3
8    2   1
9    3   3
10   3   3
11   3   2
12   3   3
13   4   4
14   4   2
15   4   2
16   4   4
R> ind <- which(with( DF, sub==2 & day==3 ))
R> ind
[1] 5 6 7
R> DF <- DF[ -ind, ]
R> table(DF)
   day
sub 1 2 3 4
  1 0 1 0 3
  2 1 0 0 0
  3 0 1 3 0
  4 0 2 0 2
R>

And we see that sub==2 has only one entry remaining with day==1.

Edit: The compound condition can be done with an 'or' as follows:

ind <- which(with( DF, (sub==1 & day==2) | (sub==3 & day==4) ))

and here is a new full example

R> set.seed(1)
R> DF <- data.frame(sub=rep(1:4, each=5), day=sample(1:4, 20, replace=TRUE))
R> table(DF)
   day
sub 1 2 3 4
  1 1 2 1 1
  2 1 0 2 2
  3 2 1 1 1
  4 0 2 1 2
R> ind <- which(with( DF, (sub==1 & day==2) | (sub==3 & day==4) ))
R> ind
[1]  1  2 15
R> DF <- DF[-ind, ]
R> table(DF)
   day
sub 1 2 3 4
  1 1 0 1 1
  2 1 0 2 2
  3 2 1 1 0
  4 0 2 1 2
R>

Branks answered 18/8, 2011 at 19:23 Comment(5)

ok, I think that will work with a bit of extra help... i need to identify multiple days so I tried your code slightly modified: ind <-which(with(Licor, day=c('1','16','30','37','51','52','57','58'))) but get an error message. Any ideas? – Baird 18/8, 2011 at 19:33

Work on the expression to compute the indices; you may find help(match) useful. – Branks 18/8, 2011 at 19:38
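A sketch of where that hint leads, assuming a small stand-in for the Licor data frame (its real columns are not shown in the thread, so the flux column and the values here are invented). The comment's day=c(...) is an assignment-style argument rather than a comparison; %in% tests membership in a set of days instead.

```r
# Hypothetical stand-in for the Licor data frame from the comment above.
Licor <- data.frame(day  = c(1, 2, 16, 30, 31, 37, 51, 52, 57, 58, 60),
                    flux = seq_len(11))

# The days to remove, taken from the comment.
drop_days <- c(1, 16, 30, 37, 51, 52, 57, 58)

# Indices of the rows whose day is one of drop_days.
ind <- which(Licor$day %in% drop_days)
print(ind)

# Keep everything else; negating the logical vector also works
# when no row matches.
Licor_kept <- Licor[!(Licor$day %in% drop_days), ]
print(Licor_kept)
```

If day is stored as character or as a factor, %in% should still work, since match() compares such values as text.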

This answer has an explanation on the strategy needed, how to apply the code, and what code to apply, with examples. This was helpful. – Rasla 2/4, 2014 at 23:3

There seems to be a problem if ind is empty: The result is the empty subset instead of the full set (DF). – Filamentous 30/5, 2017 at 11:37

See also bugs.r-project.org/bugzilla3/show_bug.cgi?id=17282 for an explanation. – Filamentous 31/5, 2017 at 6:40
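The empty-index case from the two comments above can be reproduced with a small made-up data frame. This is only a sketch; the guard shown is one possible workaround, not something proposed in the answer.

```r
# Small illustrative data frame.
DF <- data.frame(sub = rep(1:2, each = 2), day = c(1, 2, 1, 2))

# No row matches, so which() returns integer(0).
ind <- which(DF$sub == 9 & DF$day == 9)

# -integer(0) is still integer(0), so this selects zero rows
# instead of leaving DF untouched.
wrong <- DF[-ind, ]
print(nrow(wrong))

# Possible guard: only drop rows when there is something to drop.
safe <- if (length(ind) > 0) DF[-ind, ] else DF
print(nrow(safe))

# Alternative: negate the logical condition instead of using negative indices.
also_safe <- DF[!(DF$sub == 9 & DF$day == 9), ]
print(nrow(also_safe))
```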

Here's a solution to your problem using dplyr's filter function.

Although you can pass your data frame as the first argument to any dplyr function, I've used its %>% operator, which pipes your data frame to one or more dplyr functions (just filter in this case).

Once you are somewhat familiar with dplyr, the cheat sheet is very handy.

> library(dplyr)   # provides filter() and %>%
> print(df <- data.frame(sub=rep(1:3, each=4), day=1:4))
   sub day
1    1   1
2    1   2
3    1   3
4    1   4
5    2   1
6    2   2
7    2   3
8    2   4
9    3   1
10   3   2
11   3   3
12   3   4
> print(df <- df %>% filter(!((sub==1 & day==2) | (sub==3 & day==4))))
   sub day
1    1   1
2    1   3
3    1   4
4    2   1
5    2   2
6    2   3
7    2   4
8    3   1
9    3   2
10   3   3

Herringbone answered 9/8, 2015 at 13:7 Comment(0)

One simple solution:

cond1 <- df$sub == 1 & df$day == 2
cond2 <- df$sub == 3 & df$day == 4
df <- df[!(cond1 | cond2),]

Arkansas answered 13/3, 2016 at 20:13 Comment(0)
