Why does pandas "None | True" return False when Python "None or True" returns True?

In pure Python, None or True returns True.
However, with pandas, when I apply | between two Series containing None values, the results are not what I expected:

>>> df.to_dict()
{'buybox': {0: None}, 'buybox_y': {0: True}}
>>> df
    buybox  buybox_y
0   None    True

>>> df['buybox'] = (df['buybox'] | df['buybox_y'])
>>> df
    buybox  buybox_y
0   False   True

Expected result:

>>> df
    buybox  buybox_y
0   True    True

I get the result I want by applying the OR operation twice, but I don't understand why I should have to.

I'm not looking for a workaround (I already have one: applying df['buybox'] = (df['buybox'] | df['buybox_y']) twice in a row) but for an explanation, hence the 'why' in the title.

Demars answered 6/4, 2021 at 14:33 Comment(18)
| and or are two entirely different operators. Note that None | True raises a TypeError. - Septimal
@chepner: Yeah, but pandas uses | for logical or, and we're not getting a TypeError. We're getting False somehow. - Bratton
What pandas version are you on? - Bratton
The pandas docs (pandas.pydata.org/pandas-docs/stable/user_guide/…) specify that | is used for logical or, not bitwise or. My pandas version is 1.2.0. - Demars
df.any(axis=1) somehow works :-) - Husted
Can you replicate this on columns with a dtype other than object? - Outfit
"Somehow" would appear to mean that __or__ is implemented to convert None to a bool first. or isn't really a boolean operator, but it uses boolean equivalents to determine which argument to return. - Septimal
Additional weirdness: if you switch the argument order, you get True instead! - Bratton
Also, this is likely a bug: None is interpreted as truthy when evaluating the or / | and as falsy when converted to boolean. The second part is easy to verify: df['buybox'].astype(bool) evaluates to False. - Outfit
Huh... experiment actually contradicts the pandas documentation. The docs say pandas logical operations on NaN always return False, but pandas.Series([True]) | pandas.Series([nan]) has True instead of False in the result. (Putting the NaN first gives False.) - Bratton
@norok2: If None were treated as truthy in the |, then we'd get True, not False. - Bratton
@user2357112supportsMonica no, you would get the object, not True. Compare with 1 or True -> 1. Likely, | is short-circuiting and not even caring what is on the other side, as your finding of swapping the order of operands suggests. - Outfit
The 'boolean' dtype seems to treat NaN properly. - Choate
There's a related issue on the tracker for NaN. It looks like this is just treated as known weirdness. - Bratton
Related: https://mcmap.net/q/539770/-broken-symmetry-of-operations-between-boolean-pandas-series-with-unequal-index/9067615 - Analgesia
Note that we don't particularly deal in "why"s here. We deal in concrete, practical questions with concrete answers; a "why" doesn't always have a rationale beyond "that scenario wasn't included during design and failed to be considered". See e.g. What is the rationale for closing "why" questions on language design? - Pastoral
@CharlesDuffy I don't see the question as that type of why. This why is more of a "This code does something other than what I would expect. What am I overlooking? Where is my mistake?", which to me seems like a very common and meaningful type of question on Stack Overflow. Pointing to how the or operators are defined in pandas, or to the bug this behaviour is a consequence of (I don't know which is the case), would answer the question. The OP doesn't ask why the operators are defined like that or why there is a bug; only in those cases would it be a why of the type you mention. - Cosmetician
@Jesper, I generally agree; it's that the comments asserting that there is a bug were ignored / treated as nonresponsive by the OP (and the question had a bounty added with a message refocusing on the interest being an explanation rather than a workaround) that led to the above comment. - Pastoral

The pandas | operator does not rely on Python's or expression, and it behaves differently.

If both operands are boolean, the result is well defined and is the same in Python and pandas.
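A quick plain-Python check of that claim (a minimal sketch; compare_or_and_bitor is a hypothetical helper, not part of pandas): for two bools, or and | agree, while with None they diverge, because or simply returns an operand chosen by truthiness and | is not defined for None at all.

```python
def compare_or_and_bitor(a, b):
    """Return (a or b, a | b); the second item is the TypeError instance
    when | is not defined for these operands."""
    try:
        bitor = a | b
    except TypeError as exc:
        bitor = exc
    return (a or b), bitor


# For two bools, both operators give the same answer.
for a in (False, True):
    for b in (False, True):
        or_result, bitor_result = compare_or_and_bitor(a, b)
        assert or_result == bitor_result

# None is falsy, so "or" returns the right operand; "|" raises TypeError.
print(compare_or_and_bitor(None, True))
```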

But in your case, the series "buybox" has dtype object, while "buybox_y" is bool. In that case the pandas | operator is not commutative:

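  • missing values (None/NaN) in the right operand are replaced with False, and the right operand is cast to bool only if the left operand is itself boolean
  • then the element-wise | is attempted
    • None | True raises a TypeError; pandas catches it and returns the missing operand, None
  • finally, missing values in the result are replaced with False and the result is cast to boolean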

Thus,

>>> df['buybox'] | df['buybox_y']
0  False

>>> df['buybox_y'] | df['buybox']
0  True
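The steps listed above can be modelled in plain Python. This is a rough sketch under stated assumptions: it mimics the object-dtype fallback of pandas 1.x as described in this answer, emulated_or and its left_is_bool flag are hypothetical names, "missing" only covers None and float NaN, and the real code operates on NumPy arrays rather than lists.

```python
import math
import operator


def is_missing(x):
    # Hypothetical simplification: only None and float NaN count as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))


def fill_missing(values):
    # Replace missing entries with False and leave everything else untouched.
    return [False if is_missing(v) else v for v in values]


def emulated_or(left, right, left_is_bool):
    # Step 1: fill missing values in the right operand with False, and cast
    # it to bool only when the left operand is a boolean column.
    right = fill_missing(right)
    if left_is_bool:
        right = [bool(v) for v in right]

    # Step 2: element-wise |; a TypeError (e.g. None | True) falls back to
    # returning whichever operand is missing.
    result = []
    for x, y in zip(left, right):
        try:
            result.append(operator.or_(x, y))
        except TypeError:
            if is_missing(x):
                result.append(x)
            elif is_missing(y):
                result.append(y)
            else:
                raise

    # Step 3: missing values in the result become False, then cast to bool.
    return [bool(v) for v in fill_missing(result)]


buybox = [None]     # object column
buybox_y = [True]   # bool column

print(emulated_or(buybox, buybox_y, left_is_bool=False))  # [False]
print(emulated_or(buybox_y, buybox, left_is_bool=True))   # [True]

# Second pass: "buybox" is now a bool column holding False.
buybox = emulated_or(buybox, buybox_y, left_is_bool=False)
print(emulated_or(buybox, buybox_y, left_is_bool=True))   # [True]
```

The last two lines model the OP's "apply it twice" observation: once the first pass has turned None into False, the second pass is an ordinary bool | bool.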

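This also explains why applying the operation twice works: after the first pass, "buybox" is a bool column holding False, and False | True is True. For predictable results, clean up the data and cast it to a boolean type with pandas astype before attempting boolean operations.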
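A minimal sketch of that clean-up, assuming pandas 1.0 or later (needed for the nullable "boolean" dtype that the comments also mention). One option casts the object column with astype(bool), which maps None to False; the other uses the "boolean" dtype, which keeps missing values as pd.NA and applies three-valued (Kleene) logic, so NA | True is True.

```python
import pandas as pd

# Same shape as the question's data, plus a second row to show where the
# two options differ.
df = pd.DataFrame({"buybox": [None, None], "buybox_y": [True, False]})

# Option 1: cast the object column to bool first; bool(None) is False.
cleaned = df["buybox"].astype(bool) | df["buybox_y"]

# Option 2: nullable "boolean" dtype with Kleene logic.
nullable = df["buybox"].astype("boolean") | df["buybox_y"].astype("boolean")

print(cleaned.tolist())
print(nullable)
```

Both options give True in the first row regardless of operand order. They differ only where a value is missing and the other operand is False: astype(bool) gives False, while the "boolean" dtype keeps the result as <NA>.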

Kratz answered 9/4, 2021 at 21:28 Comment(0)

When evaluating or, CPython takes a fast path for bool objects (i.e. Py_True and Py_False); for any other object, it calls PyObject_IsTrue() to compute an int truth value.

PyObject_IsTrue() consults the type's nb_bool, mp_length and sq_length slots in turn; these correspond to the magic methods __bool__() (nb_bool) and __len__() (mp_length and sq_length).
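A minimal pure-Python model of that lookup order (a hypothetical sketch, not CPython's C code): is_true mirrors the bool fast path, the None check that PyObject_IsTrue also performs, and then the __bool__ / __len__ fallbacks; python_or shows why None or True returns True.

```python
def is_true(obj):
    # Fast paths: the bool singletons, plus None (always falsy).
    if obj is True:
        return True
    if obj is False or obj is None:
        return False
    cls = type(obj)
    # nb_bool slot: the type's __bool__, if it defines one.
    dunder_bool = getattr(cls, "__bool__", None)
    if dunder_bool is not None:
        return dunder_bool(obj)
    # mp_length / sq_length slots: the type's __len__, if it defines one.
    dunder_len = getattr(cls, "__len__", None)
    if dunder_len is not None:
        return dunder_len(obj) > 0
    # No __bool__ and no __len__: the object counts as true.
    return True


def python_or(a, b):
    # Returns an operand, not necessarily a bool. Unlike the real "or",
    # both arguments are already evaluated, so there is no short-circuiting.
    return a if is_true(a) else b


print(python_or(None, True))  # True
print(python_or(1, True))     # 1
```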

Subaudition answered 15/4, 2021 at 5:17 Comment(1)
This may well be true and interesting information about how or works in CPython 🙂, but the issue in this question is entirely different: it's about how the | operator between two pandas Series works, which is a completely different implementation and matches neither Python's "or" nor its "|". - Willtrude

© 2022 - 2024 — McMap. All rights reserved.