Why does pandas "None | True" return False when Python "None or True" returns True?

Asked 6/4, 2021 at 14:33 Answered 15/4, 2021 at 5:17

Solved python python-3.x pandas logical-or

In pure Python, None or True returns True.
However, in pandas, when I apply | between two Series containing None values, the results are not what I expected:

>>> df.to_dict()
{'buybox': {0: None}, 'buybox_y': {0: True}}
>>> df
    buybox  buybox_y
0   None    True

>>> df['buybox'] = (df['buybox'] | df['buybox_y'])
>>> df
    buybox  buybox_y
0   False   True

Expected result:

>>> df
    buybox  buybox_y
0   True    True

I get the result I want by applying the OR operation twice, but I don't get why I should do this.

I'm not looking for a workaround (I have it by applying df['buybox'] = (df['buybox'] | df['buybox_y']) twice in a row) but an explanation, thus the 'why' in the title.

Demars answered 6/4, 2021 at 14:33 Comment(18)

| and or are two entirely different operators. Note that None | True produces a type error. – Septimal 6/4, 2021 at 14:35

@chepner: Yeah, but Pandas uses | for logical or, and we're not getting a TypeError. We're getting False somehow. – Bratton 6/4, 2021 at 14:37

What Pandas version are you on? – Bratton 6/4, 2021 at 14:38

Pandas doc (pandas.pydata.org/pandas-docs/stable/user_guide/…) specifies that | is used for logical or and not bitwise or. My pandas version is 1.2.0 – Demars 6/4, 2021 at 14:39

df.any(axis=1) works somehow :-). – Husted 6/4, 2021 at 14:39

can you replicate on columns with dtype other than Object? – Outfit 6/4, 2021 at 14:50

"Somehow" would appear to mean that __or__ is implemented to convert None to a bool first. or isn't really a boolean operator, but it uses boolean equivalents to determine which argument to return. – Septimal 6/4, 2021 at 14:53

Additional weirdness: if you switch the argument order, you get True instead! – Bratton 6/4, 2021 at 14:55

Also, this is likely a bug: None is interpreted as truthy when evaluating the or | and as falsey when converted to boolean. The second part is easy to verify as df['buybox'].astype(bool) gets to False. – Outfit 6/4, 2021 at 14:55

Huh... experiment actually contradicts the Pandas documentation. The docs say Pandas logical operations on NaN always return False, but pandas.Series([True]) | pandas.Series([nan]) has a True instead of False in the result. (Putting the NaN first gives False.) – Bratton 6/4, 2021 at 14:59

@norok2: If None were treated as truthy in the |, then we'd get True, not False. – Bratton 6/4, 2021 at 15:0

@user2357112supportsMonica no, you would get the object, not True. Compare with 1 or True -> 1. Likely, | is short-circuiting and not even caring what is on the other side, as your finding of swapping the order of operands suggests. – Outfit 6/4, 2021 at 15:1

the 'boolean' dtype seems to have NaN treated properly. – Choate 6/4, 2021 at 15:2

There's a related issue on the tracker for NaN. It looks like this is just treated as known weirdness. – Bratton 6/4, 2021 at 15:11

Note that we don't particularly deal in "why"s here. We deal in concrete, practical questions with concrete answers; a "why" doesn't always have a rationale, beyond "that scenario wasn't included during design and failed to be considered". See f/e What is the rationale for closing "why" questions on language design? – Pastoral 9/4, 2021 at 14:6

@CharlesDuffy I don't see the question as that type of why. This why is more of a "This code does something else from what I would expect. What am I overlooking? Where is my mistake?" which to me seems like a very common and meaningful type of question on Stack Overflow. And pointing to how the or operators are defined in pandas, or what bug this behaviour is a consequence of (I don't know which is the case), would answer the question. The OP doesn't ask why the operators are defined like that or why there is a bug; only in those cases would it be a why of the type you mention. – Cosmetician 9/4, 2021 at 14:43

@Jesper, I generally agree; it's that the comments asserting that there is a bug were ignored / treated as nonresponsive by the OP (and the question had a bounty added with a message refocusing on the interest being an explanation rather than a workaround) that led to the above comment. – Pastoral 9/4, 2021 at 17:41

The pandas | operator does not rely on Python's or expression, and it behaves differently.

If both operands are boolean, the result is mathematically defined, and the same for Python and Pandas.
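To make the difference concrete, here is a small plain-Python sketch (no pandas involved). It only illustrates how `or` and `|` differ at the language level; the variable names are mine and purely illustrative.

```python
# `or` cannot be overloaded: it returns the first truthy operand, else the last one.
or_result = None or True      # True
or_object = 1 or True         # 1, the operand itself, not a bool

# `|` dispatches to __or__ / __ror__, which NoneType does not define.
pipe_error = None
try:
    None | True
except TypeError as exc:
    pipe_error = exc

# For two real booleans, `|` and `or` agree on every combination.
agree = all((a | b) == (a or b) for a in (False, True) for b in (False, True))

print(or_result, or_object, type(pipe_error).__name__, agree)
```

Because `|` is an ordinary overloadable operator, a library such as pandas is free to give it its own semantics, which is why the result need not match `or`.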

But in your case, the Series "buybox" has dtype object, while "buybox_y" is bool. In this situation the pandas | operator is not commutative:

- the right operand is coerced to boolean,
- then a bitwise or is attempted,
  - None | True is an invalid operation, resulting in None,
- and the result is coerced to boolean.
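The steps above can be sketched in plain Python. This is a toy model of the behaviour described in this answer, not the pandas source code; `elementwise_or` is a hypothetical helper, and it only considers None as a missing value.

```python
def elementwise_or(left, right):
    """Toy model of the three steps listed above (NOT the pandas implementation)."""
    result = []
    for x, y in zip(left, right):
        # Step 1: the right operand is coerced to boolean (None becomes False).
        y = bool(y)
        try:
            # Step 2: the bitwise or is attempted.
            value = x | y
        except TypeError:
            # None | True is an invalid operation, so the result is None.
            value = None
        # Step 3: the result is coerced to boolean (None becomes False).
        result.append(bool(value))
    return result


print(elementwise_or([None], [True]))  # [False]
print(elementwise_or([True], [None]))  # [True]
```

Under these assumptions the model reproduces the asymmetry: with None on the left the failed operation ends up as False, while with None on the right it is turned into False before the or is evaluated.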

Thus,

>>> df['buybox'] | df['buybox_y']
0  False

>>> df['buybox_y'] | df['buybox']
0  True

For predictable results, clean up the data and cast it to a boolean dtype with astype before attempting boolean operations.
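A minimal sketch of that cleanup, using the data from the question. Treating None as False is my assumption here; pick whatever replacement fits your data. The name `buybox_clean` is illustrative.

```python
import pandas as pd

# Same data as in the question: an object column holding None and a bool column.
df = pd.DataFrame({"buybox": [None], "buybox_y": [True]})

# Replace missing values explicitly (assumption: None means False),
# then cast to a real bool dtype.
buybox_clean = df["buybox"].fillna(False).astype(bool)

# With two bool Series, | no longer depends on the operand order.
left_first = buybox_clean | df["buybox_y"]
right_first = df["buybox_y"] | buybox_clean

print(left_first.tolist(), right_first.tolist())  # [True] [True]
```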

Kratz answered 9/4, 2021 at 21:28 Comment(0)

-1

For Boolean objects (i.e. Py_True and Py_False), CPython takes a fast path when evaluating or; for other objects, PyObject_IsTrue() is used to compute a truth value of type int.

During this computation, PyObject_IsTrue() looks up the nb_bool, mp_length, and sq_length slots in turn, which correspond to the return values of the two magic methods __bool__() and __len__().
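As a rough illustration of that lookup order at the Python level (this concerns the or keyword only, not the pandas | operator), the classes below are hypothetical examples:

```python
class WithBool:
    def __bool__(self):
        # Consulted first when the object's truth value is needed.
        return False


class WithLen:
    def __len__(self):
        # Used as a fallback when __bool__ is not defined: empty means falsy.
        return 0


class Plain:
    # Defines neither method, so instances are always truthy.
    pass


truth_values = [bool(WithBool()), bool(WithLen()), bool(Plain())]
print(truth_values)     # [False, False, True]
print(None or True)     # True, because None is falsy
```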

Subaudition answered 15/4, 2021 at 5:17 Comment(1)

This may well be true and interesting information about how or works in CPython 🙂, but the issue in this question is entirely different, because it's how the | operator between two pandas Series works, which is a completely different implementation and doesn't match either pure Python or or |. – Willtrude 20/4, 2021 at 10:17
