Identify only non-duplicated rows

I have a dataset with many duplicated rows, and I would like to isolate only the non-duplicated ones. My df looks something like this:

df <- data.frame("group" = c("A", "A", "A","A","A","B","B","B"), 
                    "id" = c("id1", "id2", "id3", "id1", "id2","id1","id2","id1"), 
                    "Val" = c(10,10,10,10,10,12,12,12))

What I would like to extract are only the rows that do not have a duplicate, i.e. my final dataset should look like this:

final <- data.frame("group" = c("A","B"), 
                 "id" = c("id3","id2"), 
                 "Val" = c(10,12))

Note that I am not interested in finding unique values, but rather non-duplicated ones. I know how to find unique values; for instance, df %>% distinct() does the job. It is identifying the non-duplicated rows that I am struggling with.
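To make the distinction concrete, here is a minimal base R sketch (an illustration, not part of the question; it uses base unique() in place of dplyr's distinct(), and the names distinct_rows, is_dup and non_dup_rows are hypothetical). unique() keeps one copy of every distinct row, whereas the wanted rows are those that occur exactly once.

```r
# Example data from the question
df <- data.frame(group = c("A", "A", "A", "A", "A", "B", "B", "B"),
                 id    = c("id1", "id2", "id3", "id1", "id2", "id1", "id2", "id1"),
                 Val   = c(10, 10, 10, 10, 10, 12, 12, 12))

# unique() keeps one copy of each distinct row: 5 rows here
distinct_rows <- unique(df)
nrow(distinct_rows)  # 5

# Flag a row if it repeats an earlier row OR a later row;
# rows with neither flag occur exactly once in the data
is_dup <- duplicated(df) | duplicated(df, fromLast = TRUE)
non_dup_rows <- df[!is_dup, ]
non_dup_rows
```

On the question's data this keeps only the id3 row of group A and the id2 row of group B, matching the desired final data frame.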

Pinard asked 27/9, 2019 at 15:48

Here is one option.

library(dplyr)
df %>%
  group_by(group) %>%
  # within each group, drop every id that occurs more than once
  filter(!(duplicated(id) | duplicated(id, fromLast = TRUE)))
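Both duplicated() calls are needed because duplicated() on its own flags only the later copies of a value, so the first copy of a repeated id would survive the filter. A minimal base R sketch on the id values of group A (the vector x and the names fwd, bwd and keep are illustrative, not from the answer):

```r
# id values of group "A" from the question
x <- c("id1", "id2", "id3", "id1", "id2")

# duplicated() marks every occurrence after the first ...
fwd <- duplicated(x)                    # FALSE FALSE FALSE TRUE TRUE
# ... and fromLast = TRUE marks every occurrence before the last
bwd <- duplicated(x, fromLast = TRUE)   # TRUE TRUE FALSE FALSE FALSE

# A value occurs exactly once only when neither scan flags it
keep <- !(fwd | bwd)                    # FALSE FALSE TRUE FALSE FALSE
x[keep]                                 # "id3"
```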

Or with dplyr verbs alone:

df %>%
  group_by_all() %>%   # group by every column
  filter(n() == 1)     # keep rows whose full combination occurs once

Or, in newer versions of dplyr (1.0.0 and later), where across() supersedes the scoped verbs (suggested by @Pål Bjartan):

df %>%
  group_by(across(everything())) %>%
  filter(n() == 1)
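The group-and-count idea behind the two dplyr versions can also be sketched in base R. This is an illustration rather than part of the answer: the names row_key, row_count and singletons are hypothetical, and it assumes the carriage-return separator "\r" never appears inside the data, so that pasted keys cannot collide.

```r
df <- data.frame(group = c("A", "A", "A", "A", "A", "B", "B", "B"),
                 id    = c("id1", "id2", "id3", "id1", "id2", "id1", "id2", "id1"),
                 Val   = c(10, 10, 10, 10, 10, 12, 12, 12))

# One key per row built from every column (the analogue of grouping by all columns);
# assumes "\r" does not occur in any value
row_key <- do.call(paste, c(df, sep = "\r"))

# Size of each row's group (the analogue of n()), then keep groups of size 1
row_count <- ave(seq_along(row_key), row_key, FUN = length)
singletons <- df[row_count == 1, ]
singletons
```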

Or using base R (comparing only the group and id columns, df[1:2]):

df[!(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)), ]
Jaime answered 27/9, 2019 at 15:50
@d.b. Yes, I thought the OP wanted only a selected subset of columns used for grouping. – Jaime
Thanks for your solution. Most helpful. =) Regarding your dplyr solution, scoped verbs (_if, _at, _all) have been superseded by across() in an existing verb. I suggest you update your solution to reflect this: df %>% group_by(across(everything())) %>% filter(n() == 1) – Nedanedda
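As the comment thread points out, the first dplyr version and the base R version compare only group and id, while the group_by_all() and across(everything()) versions compare every column. A sketch of a hypothetical helper, keep_non_duplicated() (not part of base R or dplyr), that makes this choice explicit through a cols argument, using the same duplicated()/fromLast idea:

```r
# Hypothetical helper: keep rows whose values in `cols` occur exactly once.
# By default every column is compared.
keep_non_duplicated <- function(data, cols = names(data)) {
  key <- data[cols]
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[!dup, , drop = FALSE]
}

df <- data.frame(group = c("A", "A", "A", "A", "A", "B", "B", "B"),
                 id    = c("id1", "id2", "id3", "id1", "id2", "id1", "id2", "id1"),
                 Val   = c(10, 10, 10, 10, 10, 12, 12, 12))

all_cols <- keep_non_duplicated(df)                           # compares group, id and Val
two_cols <- keep_non_duplicated(df, cols = c("group", "id"))  # like df[1:2] above
identical(all_cols, two_cols)
```

On the question's data both calls return the same two rows, because Val is constant within each group; on data where Val varies for the same group and id, the two choices could give different results.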
