I have a dataframe with two columns of interest, both full of strings. I also have a dictionary of key-value pairs, which are also strings. I'm using the dictionary's keys to filter the dataframe on the first column, keeping only the rows whose values appear as keys in the dictionary.
The ultimate goal is to look up each row's first-column value as a key in the dictionary and then confirm that column 2's value matches the dictionary's value for that key.
The filtering on the keys of interest works as expected, so I'm left with a two-column dataframe whose first column contains only values that are present as keys in the dictionary. The filtered-down dataframe can be anywhere from a few rows to thousands of rows, but the dictionary is static in length.
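For context, the filtering step presumably looks something like the sketch below; the name df for the original, unfiltered frame and its extra row are assumptions for illustration, since the question only shows the already-filtered result.

import pandas as pd

pairs = {'red': 'apple', 'blue': 'blueberry', 'yellow': 'banana'}

# df stands in for the original, unfiltered dataframe (contents assumed).
df = pd.DataFrame({'Color': ['red', 'blue', 'green'],
                   'Fruit': ['appl', 'blueberry', 'grape']})

# Keep only the rows whose Color appears as a key in the dictionary.
filtered_df = df[df['Color'].isin(pairs.keys())]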
The final output should be a dataframe containing the rows of the filtered dataframe where the second column's value doesn't match the dictionary's value for that row's key.
import pandas as pd

pairs = {'red': 'apple', 'blue': 'blueberry', 'yellow': 'banana'}
filtered_data = {'Color': ['red', 'blue'], 'Fruit': ['appl', 'blueberry']}
filtered_df = pd.DataFrame(filtered_data)
# so the filtered_df would resemble:
Color   Fruit
red     appl
blue    blueberry
new_rows = []
for idx, row in filtered_df.iterrows():
    for k, v in pairs.items():
        # Check the value of column 1 against the dict key; if the values
        # don't match between column 2 in the df and the dict, collect the
        # mismatched row for a new dataframe.
        if row['Color'] == k and row['Fruit'] != v:
            new_rows.append(row)
new_df = pd.DataFrame(new_rows)
I'm fairly sure the index needs to be unpacked along with the row in the first for loop, but I'm not sure this nested loop is the best way to structure the comparison.
Ideally, when I export my new_df dataframe in this scenario, it would have one row with a Color of red and a Fruit of appl, since that row doesn't match the dictionary, similar to the below:
Color   Fruit
red     appl
Can you try df[df['Color'].map(pairs) != df['Fruit']] and see if that works for you? If not, can you explain what it should give you? – Individuate
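For reference, here is a minimal, self-contained sketch of the map-based suggestion from the comment, applied to the example data from the question (variable names taken from the question):

import pandas as pd

pairs = {'red': 'apple', 'blue': 'blueberry', 'yellow': 'banana'}
filtered_df = pd.DataFrame({'Color': ['red', 'blue'],
                            'Fruit': ['appl', 'blueberry']})

# map() translates each Color into the fruit the dict expects; the rows where
# the actual Fruit differs are the mismatched rows to keep.
new_df = filtered_df[filtered_df['Color'].map(pairs) != filtered_df['Fruit']]

print(new_df)
#   Color Fruit
# 0   red  appl

One caveat if the one-liner is applied to an unfiltered frame: any Color that isn't a key in the dict maps to NaN, and NaN != Fruit evaluates to True, so those rows would also appear as mismatches. Filtering first with isin (or keeping the comparison on filtered_df) restricts the result to the keys of interest.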