In pandas, I'd like to create a computed column that's a boolean operation on two other columns.
In pandas, it's easy to add two numerical columns together. I'd like to do something similar with the logical AND operator. Here's my first try:
In [1]: d = pandas.DataFrame([{'foo':True, 'bar':True}, {'foo':True, 'bar':False}, {'foo':False, 'bar':False}])
In [2]: d
Out[2]:
bar foo
0 True True
1 False True
2 False False
In [3]: d.bar and d.foo ## can't
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I guess logical operators don't work quite the same way as numeric operators in pandas. I tried doing what the error message suggests and using bool():
In [258]: d.bar.bool() and d.foo.bool() ## spoiler: this doesn't work either
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I found a way that works by casting the boolean columns to int, adding them together and evaluating the result as a boolean.
In [4]: (d.bar.apply(int) + d.foo.apply(int)) > 0 ## Logical OR
Out[4]:
0 True
1 True
2 False
dtype: bool
In [5]: (d.bar.apply(int) + d.foo.apply(int)) > 1 ## Logical AND
Out[5]:
0 True
1 False
2 False
dtype: bool
This is convoluted. Is there a better way?
& and | in the boolean indexing section – Horton
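For reference, here is a minimal sketch of what the element-wise operators mentioned in that comment might look like on the example frame from the question. The column name bar_and_foo is made up for illustration, and this assumes both columns have a plain bool dtype with no missing values:

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame([
    {'foo': True, 'bar': True},
    {'foo': True, 'bar': False},
    {'foo': False, 'bar': False},
])

# & and | operate element-wise on boolean Series, whereas Python's
# `and` / `or` try to reduce each whole Series to a single truth value.
both = d.bar & d.foo      # element-wise logical AND
either = d.bar | d.foo    # element-wise logical OR

# Store the result as a computed column (hypothetical column name).
d['bar_and_foo'] = d['bar'] & d['foo']
```

When combining comparisons, each one probably needs its own parentheses, e.g. (d.a > 0) & (d.b > 0), because & binds more tightly than the comparison operators.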