Pandas drop duplicates where condition
Asked Answered
D

3

5

I would like to df.drop_duplicates() based off a subset, but also ignore if a column has a specific value.

For example...

                 v1      v2     v3      
ID                                                          
148         8751704.0    G      dog   
123         9082007.0    G      dog  
123         9082007.0    G      dog 
123         9082007.0    G      cat   

I would like to drop duplicate [ID, v1] but ignore if v3 is equal to cat so something like this:

full_df.drop_duplicates([ID, v1], inplace=True, conditional=exclude v3 = cat)

Hope that makes sense

Davao answered 30/3, 2020 at 9:56 Comment(0)
N
7

Use boolean indexing with Series.duplicated and pd.Index.duplicated:

df[~(df['v1'].duplicated() & df.index.duplicated()) | df['v3'].eq('cat')]

Output

            v1 v2   v3
ID                    
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat

if IDis not the index:

df[~df[['ID', 'v1']].duplicated() | df['v3'].eq('cat')]
Nadaha answered 30/3, 2020 at 9:58 Comment(3)
When I try to run your second solution I get: KeyError: ('ID', 'v1')Davao
Ahh yes, for speed, what would be the fastest solution in your opinion ?Davao
if ID is the index I think this solution with pd.Index.duplicated is faster because you don't have to use reset_index. But I don't know because I haven't tested it with various sizes of DataFrame. anyway i think both solutions are goodNadaha
A
6

You could use chain another condition using a bitwise and, to ensure that v3 is not cat:

df[~(df.reset_index().duplicated(['ID', 'v1']) & df.v3.ne('cat').values).values]

        v1     v2   v3
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat
Accrue answered 30/3, 2020 at 9:59 Comment(0)
E
0

how about if you wanted to apply a keep first/last to the above situation eg:

so you kept the 2nd dog:

orig:

    v1     v2   v3

148 8751704.0 G dog 123 9082007.0 G dog 123 9082007.0 G cat

after keep last applied

    v1     v2   v3

123 9082007.0 G dog 123 9082007.0 G cat

Easterling answered 19/10 at 10:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.