Pandas Boolean Operation in a Python List

I understand that a pandas DataFrame lets you test a logical condition against its values.

Here's the code:

import pandas as pd
# DataFrame.append was removed in pandas 2.0, so the rows are passed to the constructor.
# 0o24204 is the Python 3 spelling of the octal literal 024204; it equals 10372.
data = pd.DataFrame([
    {'a': 'I have data', 'b': 'no more complexe', 'c': 0o24204},
    {'a': 'audoausd', 'b': '2048rafaf', 'c': 29313},
    {'a': 'koplak ente gan', 'b': 'ente g bisa koplak', 'c': 29313},
], columns=['a', 'b', 'c'])

Now we have the following DataFrame:

                 a                   b      c
0      I have data    no more complexe  10372
1         audoausd           2048rafaf  29313
2  koplak ente gan  ente g bisa koplak  29313

Test a condition on column c and save the result to a variable:

c = data.c > 20000

This sets c to the following value:

0    False
1     True
2     True
Name: c, dtype: bool

Test a condition on column b and save the result to a variable:

b = data.b.str.contains('koplak')

The value of b:

0    False
1    False
2     True
Name: b, dtype: bool

And the same for column a:

a = data.a.str.contains('koplak')

The value of a:

0    False
1    False
2     True
Name: a, dtype: bool

When I combine all of these values with a & b & c, the result is:

0    False
1    False
2     True
dtype: bool

Hard-coding the expression is not practical when many columns are involved, so I tried putting all of the column conditions into a list:

logic = [a, b, c]

How do I combine all of the items automatically to get the same result as a & b & c?

Interviewee answered 11/1, 2014 at 3:37 Comment(0)

a & b & c is equivalent to

import functools
print(functools.reduce(lambda x,y: x & y, [a, b, c]))

which yields

0    False
1    False
2     True
dtype: bool

Unlike my original answer below (suggesting np.logical_and.reduce), I am confident functools.reduce(lambda x,y: x & y, [a, b, c]) will faithfully return the same Series as a & b & c.

(In Python 2.7, reduce is a builtin function, and functools.reduce is the same function. In Python 3, reduce was removed from the builtins and only functools.reduce remains. So to future-proof your code, use functools.reduce.)
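As a minimal sketch of the same idea, the lambda can be swapped for operator.and_, which is the function form of the & operator. The three Series below are stand-ins for the a, b and c masks from the question, not the original data.

```python
import functools
import operator

import pandas as pd

# Stand-in boolean masks with the same values as a, b and c in the question.
a = pd.Series([False, False, True])
b = pd.Series([False, False, True])
c = pd.Series([False, True, True])
logic = [a, b, c]

# operator.and_(x, y) is equivalent to x & y, so this folds the list
# into ((a & b) & c) without naming each mask explicitly.
combined = functools.reduce(operator.and_, logic)
print(combined)
```

Because the list is folded with the Series' own & operator, the result is still a Series and keeps the index, however many masks the list holds.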


Edit: Using np.logical_and.reduce(logic) may not work in all situations. Here is a counterexample:

import pandas as pd
import numpy as np
x = pd.Series([True,True,False,False], index=[1,2,3,4]) 
y = pd.Series([True,True,False,False], index=[1,2,3,4]) 
print(x & y)

prints

1     True
2     True
3    False
4    False
dtype: bool

but np.logical_and.reduce([x,y]) raises a ValueError:

    print(np.logical_and.reduce([x,y]))
  File "/data1/unutbu/.virtualenvs/dev/local/lib/python2.7/site-packages/pandas-0.13.0_98_gd9b0c1f-py2.7-linux-i686.egg/pandas/core/generic.py", line 665, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Narcotize answered 11/1, 2014 at 3:51 Comment(2)
This is pretty useful; can you do a PR to add it to the cookbook? You can use this link with a nice title/description. - Guyer
I had the same problem but with a logical OR (|), and I came up with sum(my_list_of_serieses).astype(bool). - Negrete
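The logical OR case from the comment above can be sketched the same way. The masks here are made-up illustration data; the snippet compares a reduce over operator.or_ with the sum-then-astype(bool) approach from the comment.

```python
import functools
import operator

import pandas as pd

# Hypothetical boolean masks, for illustration only.
masks = [
    pd.Series([True, False, False]),
    pd.Series([False, False, True]),
    pd.Series([True, False, True]),
]

# Fold the list with |, the counterpart of the & reduction.
any_true = functools.reduce(operator.or_, masks)

# sum() starts from 0, so the booleans are added as integers;
# any non-zero count becomes True after astype(bool).
via_sum = sum(masks).astype(bool)

print(any_true.equals(via_sum))
```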

I would use np.all()

import pandas as pd
import numpy as np

# DataFrame.append was removed in pandas 2.0, so the rows are passed to the constructor.
# 0o24204 is the Python 3 spelling of the octal literal 024204; it equals 10372.
data = pd.DataFrame([
    {'a': 'I have data', 'b': 'no more complexe', 'c': 0o24204},
    {'a': 'audoausd', 'b': '2048rafaf', 'c': 29313},
    {'a': 'koplak ente gan', 'b': 'ente g bisa koplak', 'c': 29313},
], columns=['a', 'b', 'c'])

a = data.a.str.contains('koplak')
b = data.b.str.contains('koplak')
c = data.c > 20000

logic = [a, b, c]

result = np.all(logic, axis=0)
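
One thing to keep in mind: with a list of Series, np.all(logic, axis=0) should give back a plain NumPy array rather than a Series, so the index is dropped. The sketch below uses stand-in masks instead of the DataFrame above and shows a pandas-only alternative (pd.concat followed by all), which is a different technique from the one in this answer and keeps the index.

```python
import numpy as np
import pandas as pd

# Stand-in boolean masks with the same values as a, b and c above.
a = pd.Series([False, False, True])
b = pd.Series([False, False, True])
c = pd.Series([False, True, True])
logic = [a, b, c]

# NumPy stacks the masks into a 2-D array and reduces along axis 0;
# the result is an ndarray, so the Series index is lost.
as_array = np.all(logic, axis=0)

# Alternative: put the masks side by side as columns and reduce row-wise,
# which returns a Series aligned on the original index.
as_series = pd.concat(logic, axis=1).all(axis=1)

print(type(as_array).__name__, type(as_series).__name__)
```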
Deep answered 10/6 at 0:46 Comment(0)
