If dataframe column value matches dictionary key, check if different column matches dictionary value

Asked 24/5, 2020 at 2:20 Answered 9/2 at 15:22

I have a dataframe with 2 columns of interest. Both are full of strings. I also have a dictionary of mapped key value pairs which are also strings. I'm using the dictionary's keys to filter the dataframe by the first column for only those keys that are in dictionary.

The ultimate goal is to then lookup the first column of the dataframe match it to the key in the dictionary and then confirm column 2's value matches the value to the dictionary.

The filtered dataframe on the keys of interest is working as expected, so I'm left with a dataframe of two columns that have only column keys that are present in the dictionary. The filtered down dataframe can be anywhere from a few rows to thousands of rows but the dictionary is static in length.

A final output should be a dataframe whose contents show rows of the filtered dataframe where the values of the second column don't match the values of the dictionary.

pairs = {'red': 'apple', 'blue': 'blueberry', 'yellow':'banana'}
filtered_data = {'Color':['red', 'blue'], 'Fruit':['appl','blueberry']}
filtered_df = pd.DataFrame(filtered_data)

#so the filtered_df would resemble
Color     Fruit
red       appl
blue      blueberry

for row in filtered_df.iterrows():
   for k,v in pairs.items():
       #Here's where I'd like to check the value of column 1, find it in the dict then if the 
       #values dont match between col 2 in the df and the dict, append the mismatched row to a 
       #new df.
       if row['Color'] == k:
          new_df.append(row).where(row['Fruit'] != v)

I'm sure I need an index with the row in the first for loop but I'm not sure how to format the rest of the nested loop structure.

Ideally, when I export my new_df dataframe in this scenario it would have 1 row with the Color column of red and the Fruit column of appl since it doesn't match the dictionary similar to the below.

Color   Fruit
red     appl

Kermitkermy answered 24/5, 2020 at 2:20 Comment(4)

can you provide the expected data frame? – Halitosis 24/5, 2020 at 2:32

I'm not quite sure but can you try df[df['Color'].map(pairs) != df['Fruit']] and see if that works for you? If not - can you explain what it should give you? – Individuate 24/5, 2020 at 2:51

Just updated what I would hope the resultant df would look like – Kermitkermy 24/5, 2020 at 2:57

@JonClements solution works fine – Boigie 24/5, 2020 at 4:50

color_fruit = pd.Series((tuple(x) for x in filtered_df.values), index=filtered_df.index)
result = filtered_df[~color_fruit.isin(pairs.items())]

The explanation:

In the first row we create a series of tuples (pairs) from columns of original dataframe, with the same index.

Then we use it to filter original dataframe, selecting only rows which do not (~) satisfy the condition being a member (.isin()) of pairs (key, value) from the pairs dictionary.

Maddeu answered 24/5, 2020 at 5:5 Comment(1)

this was also helpful in solving the above problem in addition to the suggestion provided in the comments – Kermitkermy 24/5, 2020 at 19:14

Personally, I would create a data frame from your pairs dictionary and do a left anti join, that will leave us only with matches in your left data frame that don't match the pairs dictionary.

df1 = pd.DataFrame.from_dict(pairs, orient="index", columns=["Fruit"])\
                        .rename_axis("Color")\
                        .reset_index() 

final = pd.merge(filtered_df,df1,on=['Fruit','Color'],how='outer',indicator=True)\
                    .query("_merge == 'left_only'").drop('_merge',axis=1)

print(final)

 Color Fruit
0   red  appl

Contentment answered 24/5, 2020 at 4:8 Comment(1)

Smart, seems so much simplier to convert the dict to a df and utilize merge logic. I'll try that. – Kermitkermy 24/5, 2020 at 4:15

The following one liner does the job

filtered_df[filtered_df['Fruit'] != filtered_df['Color'].map(pairs)]

pandas.Series.map accepts a dictionary as an argument and returns the values that the dictionary maps to when the series values are input.

Gallonage answered 9/2 at 15:22 Comment(2)

OP provided an MRE . Please use the input to validate your claim – Isoline 11/2 at 12:22

@Isoline this actually is a reasonable answer. – Millipede 13/2 at 11:30

Recommended topics

Hot tags