Element-wise XOR in pandas

Asked 26/8, 2015 at 22:33 Answered 3/8, 2023 at 0:52

I know that element-wise logical AND is & and logical OR is | on a pandas Series, but I'm looking for an element-wise logical XOR. I suppose I could express it in terms of AND and OR, but I'd prefer to use an XOR if one is available.

Thank you!
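For reference, the AND/OR formulation the question mentions can be written like this. It is a minimal sketch: `s1` and `s2` are made-up example Series.

```python
import pandas as pd

# Made-up inputs covering all four True/False combinations.
s1 = pd.Series([True, True, False, False])
s2 = pd.Series([True, False, True, False])

# XOR from OR, AND and NOT: true where at least one side is true,
# but not where both are.
xor_and_or = (s1 | s2) & ~(s1 & s2)
print(xor_and_or.tolist())  # [False, True, True, False]
```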

Blancheblanchette asked 26/8, 2015 at 22:33

Python XOR: a ^ b

Numpy logical XOR: np.logical_xor(a,b)
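Applied to pandas Series, both spellings give the same element-wise result. A short sketch with made-up data (`s1` and `s2` are illustrative names):

```python
import numpy as np
import pandas as pd

s1 = pd.Series([True, True, False, False])
s2 = pd.Series([True, False, True, False])

via_operator = s1 ^ s2              # pandas element-wise XOR operator
via_numpy = np.logical_xor(s1, s2)  # NumPy ufunc; pandas hands back a Series

print(via_operator.tolist())  # [False, True, True, False]
print(via_numpy.tolist())     # [False, True, True, False]
```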

Testing performance (the results are equal):

1. Sequence of random booleans with size 10000

In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)

In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

2. Sequence of random booleans with size 1000

In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)

In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

3. Sequence of random booleans with size 100

In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)

In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop

In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop

4. Sequence of random booleans with size 10

In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)

In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop

In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
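The sessions above use IPython's %timeit. A rough standalone equivalent using the standard-library timeit module, which also checks that both spellings agree, might look like the sketch below. The seed, the `number` of calls and the `benchmark` helper are arbitrary choices made for illustration, and absolute timings will depend on the machine and NumPy version.

```python
import timeit

import numpy as np


def benchmark(size, number=1_000, seed=0):
    """Time a ^ b and np.logical_xor(a, b) on random boolean arrays.

    Returns (results_equal, seconds_per_call_xor_op, seconds_per_call_logical_xor).
    """
    rng = np.random.default_rng(seed)  # seeded so the inputs are repeatable
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)

    # Both spellings should agree element for element on boolean input.
    equal = bool(np.array_equal(a ^ b, np.logical_xor(a, b)))

    # Best of 3 repeats, each running the statement `number` times.
    t_op = min(timeit.repeat(lambda: a ^ b, number=number, repeat=3)) / number
    t_np = min(timeit.repeat(lambda: np.logical_xor(a, b), number=number, repeat=3)) / number
    return equal, t_op, t_np


if __name__ == "__main__":
    for size in (10_000, 1_000, 100, 10):
        equal, t_op, t_np = benchmark(size)
        print(f"size={size:>6}  equal={equal}  "
              f"a ^ b: {t_op * 1e6:.2f} us  logical_xor: {t_np * 1e6:.2f} us")
```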

Dancer answered 27/8, 2015 at 7:37

The Python XOR operator ^ is overloaded by NumPy: on arrays it dispatches to np.bitwise_xor, which on boolean arrays gives the same result as np.logical_xor. So readers should note that the performance testing results are equal because, for boolean input, these are effectively the same operation. – Espalier 23/11, 2020 at 18:22
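A small check of the point in this comment, using made-up arrays: on boolean arrays `^` (which NumPy maps to `np.bitwise_xor`) and `np.logical_xor` agree, but on integer arrays they do different things.

```python
import numpy as np

flags_a = np.array([True, True, False, False])
flags_b = np.array([True, False, True, False])

# On bool arrays, ^ and np.logical_xor produce identical results.
print(np.array_equal(flags_a ^ flags_b, np.logical_xor(flags_a, flags_b)))  # True

ints_a = np.array([1, 2])
ints_b = np.array([3, 2])

# On integers, ^ is bitwise: 1 ^ 3 == 2 and 2 ^ 2 == 0.
print(ints_a ^ ints_b)                 # [2 0]
# logical_xor treats any nonzero value as True, so both pairs are (True, True).
print(np.logical_xor(ints_a, ints_b))  # [False False]
```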

I found a case where a^b and np.logical_xor(a,b) are not equivalent. It really tripped me up, but the fix turned out to be simple. Hopefully this saves someone else the headache.

I recently upgraded from Pandas 0.25.3 to 2.0.3 (and numpy from 1.19.0 to 1.24.4), which is when the issue surfaced.

Let a be a DataFrame of bools with duplicate labels in its index. Let b be a Series, also of bools, where b.index == a.columns.

My intent was to broadcast b across a, taking the element-wise XOR of each row of a with b, with any duplicate labels in a.index simply passed through to the output.

This code worked on my old setup...

np.logical_xor(a,b.to_frame().T)

...but failed on my new setup:

TypeError: '<' not supported between instances of 'Timestamp' and 'int'

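I believe this is because, with two DataFrames as inputs, pandas aligns them before applying the ufunc: it tries to combine the index of b.to_frame().T (a meaningless [0]) with the index of a (Timestamps), and sorting the combined index to make it monotonic fails when it has to compare a Timestamp with an int.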
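A minimal sketch of the situation, using hypothetical data shaped like the description (`idx`, `a` and `b` are illustrative, with duplicate Timestamps in `a.index` and `b.index` equal to `a.columns`). It shows that the one-row frame passed to `np.logical_xor` is labelled `0`, and then, as an alternative workaround that is not from this answer, passes plain NumPy arrays so that no label alignment happens at all. It does not try to reproduce the TypeError, which depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: duplicate Timestamps in a.index, and b.index == a.columns.
idx = pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"])
a = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]}, index=idx)
b = pd.Series([True, False], index=a.columns)

# The one-row frame is labelled 0, so its row labels share nothing with a.index.
b_row = b.to_frame().T
print(list(b_row.index))  # [0]

# Alternative sketch: give NumPy bare arrays ((3, 2) broadcast with (2,)),
# then re-attach a's own labels, duplicates included.
values = np.logical_xor(a.to_numpy(), b.to_numpy())
result = pd.DataFrame(values, index=a.index, columns=a.columns)
print(result)
```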

The solution was, as this OP led me to consider🙏:

a^b

The aggravating/wonderful thing is that this also appears to work on my old pandas/numpy "production" setup. Coincidentally, this was the first time I ever used "git blame". Answer: "Initial commit", 3 years ago 🤣, so either a^b didn't work in an even older version of Pandas or I just didn't know about it.
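A minimal sketch of the a^b fix with hypothetical data of the shape described in this answer (`idx`, `a` and `b` are illustrative). For a DataFrame ^ Series operation, pandas matches the Series labels against the DataFrame's columns and broadcasts down the rows, so the duplicate index labels are carried straight through to the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a boolean DataFrame with a duplicated Timestamp in its index,
# and a boolean Series whose index equals a.columns.
idx = pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"])
a = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]}, index=idx)
b = pd.Series([True, False], index=a.columns)

# b is aligned on a's columns and applied to every row; a.index is left as-is.
result = a ^ b
print(result)

# Same values as plain NumPy broadcasting of (3, 2) against (2,).
print(np.array_equal(result.to_numpy(), a.to_numpy() ^ b.to_numpy()))  # True
```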

Guido answered 3/8, 2023 at 0:52
