I know that logical AND is &, and logical OR is | in a Pandas Series, but I was looking for an element-wise logical XOR. I could express it in terms of AND and OR, I suppose, but I'd prefer to use an XOR if one is available.
Thank you!
Python XOR: a ^ b
Numpy logical XOR: np.logical_xor(a,b)
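Since the question is about a Pandas Series, here is a minimal sketch of both forms applied to two small boolean Series. The sample values and variable names are made up for illustration, and the two Series are assumed to share the same index.

```python
import numpy as np
import pandas as pd

# Illustrative boolean Series covering all four input combinations
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

# Element-wise XOR with the Python operator
xor_op = a ^ b

# Element-wise XOR with the NumPy ufunc
xor_np = np.logical_xor(a, b)

print(xor_op.tolist())  # [False, True, True, False]
print(xor_np.tolist())  # [False, True, True, False]
```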
Performance test (the results are essentially equal):
1. Sequence of random booleans with size 10000
In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)
In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
2. Sequence of random booleans with size 1000
In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)
In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
3. Sequence of random booleans with size 100
In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)
In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop
In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop
4. Sequence of random booleans with size 10
In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)
In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop
In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
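The session above compares only timings. As a rough sketch, the same comparison can be run outside IPython with the standard `timeit` module, while also checking that both forms return identical arrays. The seed, the repetition count, and the `results` dictionary are arbitrary choices for illustration, and the timings will differ by machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
results = {}

for size in (10, 100, 1000, 10000):
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)

    # Check that both forms produce the same boolean array
    results[size] = np.array_equal(a ^ b, np.logical_xor(a, b))

    # Total time in seconds for 1000 calls of each form
    t_op = timeit.timeit(lambda: a ^ b, number=1000)
    t_np = timeit.timeit(lambda: np.logical_xor(a, b), number=1000)
    print(f"size={size}: equal={results[size]}, ^: {t_op:.6f}s, logical_xor: {t_np:.6f}s")
```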
I found a way that a^b and np.logical_xor(a,b) are not equivalent. It really tripped me up, but it was a simple fix in the end. Hopefully this saves someone else the headache.
I recently upgraded from Pandas 0.25.3 to 2.0.3 (and numpy from 1.19.0 to 1.24.4) which raised the issue.
Let a be a DataFrame of bool that has duplicates on the Index. Let b be a Series, also of bool, where b.index == a.columns.
My intent was to broadcast b to a and take the element-wise xor of every row of a with b, where any duplicates on a.index should just be passed on to the output.
This code worked on my old setup...
np.logical_xor(a,b.to_frame().T)
...but failed on my new setup:
TypeError: '<' not supported between instances of 'Timestamp' and 'int'
I believe this is because the broadcasting attempted to concat the transposed b (whose index is a meaningless [0]) onto a (whose index holds Timestamps), and then, I believe, tried to sort the combined index to make it monotonic.
The solution was, as this OP led me to consider🙏:
a^b
The aggravating/wonderful thing is that this also appears to work on my old pandas/numpy "production" setup. Coincidentally, this was the first time I ever used "git blame". Answer: "Initial commit" 3 years ago 🤣, so either a^b didn't work in an even older version of Pandas or I didn't know about it.
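The setup described in this answer can be sketched roughly as follows. The column names, dates, and values are hypothetical; the only assumptions carried over from the description are a boolean DataFrame with duplicate Timestamps on its index and a boolean Series whose index matches the DataFrame's columns.

```python
import pandas as pd

# Hypothetical DataFrame of bool with a duplicated Timestamp on the index
idx = pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"])
a = pd.DataFrame(
    {"x": [True, False, True], "y": [False, False, True]},
    index=idx,
)

# Hypothetical Series of bool whose index matches a.columns
b = pd.Series({"x": True, "y": False})

# DataFrame ^ Series aligns the Series index with the DataFrame columns,
# so every row of a is XORed with b and the duplicate index rows are kept
result = a ^ b
print(result)
```

Because the operator aligns b on the columns of a, the row index of a is never combined with another index, which is consistent with the duplicate Timestamps passing through unchanged.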