Pandas drop duplicates where condition

About

Asked 30/3, 2020 at 9:56 Answered 19/10, 2024 at 10:5

I would like to df.drop_duplicates() based off a subset, but also ignore if a column has a specific value.

For example...

                 v1      v2     v3      
ID                                                          
148         8751704.0    G      dog   
123         9082007.0    G      dog  
123         9082007.0    G      dog 
123         9082007.0    G      cat

I would like to drop duplicate [ID, v1] but ignore if v3 is equal to cat so something like this:

full_df.drop_duplicates([ID, v1], inplace=True, conditional=exclude v3 = cat)

Hope that makes sense

Davao answered 30/3, 2020 at 9:56 Comment(0)

Use boolean indexing with Series.duplicated and pd.Index.duplicated:

df[~(df['v1'].duplicated() & df.index.duplicated()) | df['v3'].eq('cat')]

Output

            v1 v2   v3
ID                    
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat

if IDis not the index:

df[~df[['ID', 'v1']].duplicated() | df['v3'].eq('cat')]

Nadaha answered 30/3, 2020 at 9:58 Comment(3)

When I try to run your second solution I get: KeyError: ('ID', 'v1') – Davao 30/3, 2020 at 10:9

Ahh yes, for speed, what would be the fastest solution in your opinion ? – Davao 30/3, 2020 at 10:10

if ID is the index I think this solution with pd.Index.duplicated is faster because you don't have to use reset_index. But I don't know because I haven't tested it with various sizes of DataFrame. anyway i think both solutions are good – Nadaha 30/3, 2020 at 10:12

You could use chain another condition using a bitwise and, to ensure that v3 is not cat:

df[~(df.reset_index().duplicated(['ID', 'v1']) & df.v3.ne('cat').values).values]

        v1     v2   v3
148  8751704.0  G  dog
123  9082007.0  G  dog
123  9082007.0  G  cat

Accrue answered 30/3, 2020 at 9:59 Comment(0)

how about if you wanted to apply a keep first/last to the above situation eg:

so you kept the 2nd dog:

orig:

    v1     v2   v3

148 8751704.0 G dog 123 9082007.0 G dog 123 9082007.0 G cat

after keep last applied

    v1     v2   v3

123 9082007.0 G dog 123 9082007.0 G cat

Easterling answered 19/10, 2024 at 10:5 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags