How to fix the flake8 error "E712 comparison to False should be 'if cond is False:' or 'if not cond:'" in a pandas DataFrame
I am getting the flake8 error E712 at the line "added_parts = new_part_set[(new_part_set["duplicate"] == False) & (new_part_set["version"] == "target")]".

Following is a snippet of the code we use for spreadsheet comparison:

source_df = pd.read_excel(self.source, sheet).fillna('NA')
target_df = pd.read_excel(self.target, sheet).fillna('NA')
file_path = os.path.dirname(self.source)

column_list = source_df.columns.tolist()

source_df['version'] = "source"
target_df['version'] = "target"

source_df = source_df.sort_values(by=unique_col)
source_df = source_df.reindex()
target_df = target_df.sort_values(by=unique_col)
target_df = target_df.reindex()

# full_set = pd.concat([source_df, target_df], ignore_index=True)
diff_panel = pd.concat([source_df, target_df],
                       axis='columns', keys=['df1', 'df2'], join='outer', sort=False)
diff_output = diff_panel.apply(self.__report_diff, axis=0)
diff_output['has_change'] = diff_output.apply(self.__has_change)

full_set = pd.concat([source_df, target_df], ignore_index=True)
changes = full_set.drop_duplicates(subset=column_list, keep='last')
dupe_records = changes.set_index(unique_col).index.unique()

changes['duplicate'] = changes[unique_col].isin(dupe_records)
removed_parts = changes[(changes["duplicate"] == False) & (changes["version"] == "source")]
new_part_set = full_set.drop_duplicates(subset=column_list, keep='last')
new_part_set['duplicate'] = new_part_set[unique_col].isin(dupe_records)
added_parts = new_part_set[(new_part_set["duplicate"] == False) & (new_part_set["version"] == "target")]

diff_file = os.path.join(file_path, "file_diff.xlsx")
if os.path.exists(diff_file):
    os.remove(diff_file)
writer = pd.ExcelWriter(diff_file)
diff_output.to_excel(writer, "changed")
removed_parts.to_excel(writer, "removed", index=False, columns=column_list)
added_parts.to_excel(writer, "added", index=False, columns=column_list)
writer.save()

Are there any other ways this can be avoided? I'm unsure how to proceed further.

Substation answered 1/2, 2019 at 6:29 Comment(0)

In your DataFrame masks you have (changes["duplicate"] == False) and (new_part_set["duplicate"] == False), and flake8 is suggesting that you change these. The reason it's complaining is that in Python it's considered bad practice to compare to boolean values using the == operator; rather, you should write if my_bool: ..., if not my_bool: ..., etc. In pandas, if you have a boolean Series, you can take its negation using the ~ operator, so your new masks would be written:

~changes["duplicate"] # & ... blah blah
~new_part_set["duplicate"] # & ... blah blah
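Applied to the question's filtering step, a minimal runnable sketch (the toy data and part column here are illustrative, not from the question):

```python
import pandas as pd

# Toy frame mirroring the question's new_part_set (illustrative values only)
new_part_set = pd.DataFrame({
    "part": ["A", "B", "C", "D"],
    "duplicate": [True, False, True, False],
    "version": ["source", "target", "target", "target"],
})

# E712 version flagged by flake8:
#   new_part_set[(new_part_set["duplicate"] == False) & (new_part_set["version"] == "target")]
# Idiomatic version: negate the boolean column with ~ instead of comparing to False
added_parts = new_part_set[~new_part_set["duplicate"] & (new_part_set["version"] == "target")]

print(added_parts["part"].tolist())  # → ['B', 'D']
```

Note that ~ binds more tightly than &, so ~new_part_set["duplicate"] needs no extra parentheses here, and both versions select exactly the same rows.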
Kaleb answered 1/2, 2019 at 6:35 Comment(6)
Thank you Harris for the reply. Small correction: instead of the ! operator, we should use ~. Substation
@KomeraObanna you're absolutely right; I have updated my answer. Using R is destroying my brain cells. Kaleb
What if the column is not of type boolean? If the possible values are True, False and None, this approach will give TypeError: bad operand type for unary ~: 'NoneType'. Kettering
@Kettering you can use fillna to set None values to True/False, depending on how you want to handle them. E.g.: ~pd.Series([True, False, None]).fillna(False) Conditioner
@Conditioner Thank you. In my case the possible values are True, False and None, with None meaning some evaluation with a boolean output has not been applied. Kettering
I got "bad operand type for unary ~: 'float'" even though all values were True/False; their type wasn't bool, but numpy.bool_. However, I could use ~my_series.astype(bool) to avoid the error and get valid results. Nitrate
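Putting the comments above together, a minimal sketch for columns that are not a clean bool dtype (the choice of filling None with False is an assumption about how missing values should be treated):

```python
import pandas as pd

# An object-dtype column holding True/False/None: applying ~ directly raises
# "TypeError: bad operand type for unary ~: 'NoneType'"
s = pd.Series([True, False, None])

# Fill the missing values first (here: treat None as False), then cast to a
# real bool dtype so ~ is well-defined
mask = ~s.fillna(False).astype(bool)

print(mask.tolist())  # → [False, True, True]
```

The same astype(bool) cast also handles the case from the last comment, where the values are truthy/falsy but the dtype is not bool.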

© 2022 - 2024 — McMap. All rights reserved.