Functionality of Python `in` vs. `__contains__`

3

24

I implemented the __contains__ method on a class for the first time the other day, and the behavior wasn't what I expected. I suspect there's some subtlety to the in operator that I don't understand and I was hoping someone could enlighten me.

It appears to me that the in operator doesn't simply wrap an object's __contains__ method, but it also attempts to coerce the output of __contains__ to boolean. For example, consider the class

class Dummy(object):
    def __contains__(self, val):
        # Don't perform comparison, just return a list as
        # an example.
        return [False, False]

The in operator and a direct call to the __contains__ method return very different output:

>>> dum = Dummy()
>>> 7 in dum
True
>>> dum.__contains__(7)
[False, False]

Again, it looks like in is calling __contains__ but then coercing the result to bool. I can't find this behavior documented anywhere except for the fact that the __contains__ documentation says __contains__ should only ever return True or False.

I'm happy following the convention, but can someone tell me the precise relationship between in and __contains__?

Epilogue

I decided to accept @eli-korvigo's answer, but everyone should look at @ashwini-chaudhary's comment about the bug, below.

Theodoretheodoric answered 23/7, 2016 at 13:54 Comment(3)
Because the result of your __contains__ method is converted with the equivalent of bool([False, False]). - Erythrocytometer
Related bug: "in should be consistent with return value of __contains__". - Whosoever
@AshwiniChaudhary: Can you write this comment up as an answer? Just a one-liner or so is fine. I had never seen this bug report and it precisely answers my question. I don't care so much about the specific implementation of in as I do about the design reasoning and the apparent lack of documentation. If you post this answer, I will select it as the accepted one. - Theodoretheodoric
19

Use the source, Luke!

Let's trace down the in operator implementation

>>> import dis
>>> class test(object):
...     def __contains__(self, other):
...         return True

>>> def in_():
...     return 1 in test()

>>> dis.dis(in_)
    2           0 LOAD_CONST               1 (1)
                3 LOAD_GLOBAL              0 (test)
                6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
                9 COMPARE_OP               6 (in)
               12 RETURN_VALUE

As you can see, the in operator becomes the COMPARE_OP virtual machine instruction. You can find that in ceval.c

TARGET(COMPARE_OP)
    w = POP();
    v = TOP();
    x = cmp_outcome(oparg, v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x == NULL) break;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH(); 

Take a look at one of the switches in cmp_outcome()

case PyCmp_IN:
    res = PySequence_Contains(w, v);
    if (res < 0)
         return NULL;
    break;

Here we have the PySequence_Contains call

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

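That always returns a C int (1 for found, 0 for not found, -1 on error), which cmp_outcome() then turns into True or False. For a class that defines __contains__ in Python, the sq_contains slot points to a wrapper (slot_sq_contains in typeobject.c) that calls the method and reduces its result to such an int with PyObject_IsTrue(), which is why a returned list comes back as a plain boolean.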
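The behaviour of the C code above can be modelled in pure Python. This is only a rough sketch, not CPython's actual implementation: the function name model_in and the two demo classes are made up for illustration, and details such as error handling or setting __contains__ to None are ignored.

```python
def model_in(item, container):
    """Rough model of `item in container` (illustrative, not CPython's code)."""
    method = getattr(type(container), "__contains__", None)
    if method is not None:
        # Whatever __contains__ returns is reduced to a plain bool.
        return bool(method(container, item))
    # No __contains__: fall back to iterating, as _PySequence_IterSearch does.
    for element in container:
        if element is item or element == item:
            return True
    return False


class ReturnsList:
    """Hypothetical class mirroring Dummy from the question."""
    def __contains__(self, val):
        return [False, False]


class OnlyIterable:
    """Hypothetical class with __iter__ but no __contains__."""
    def __iter__(self):
        return iter([1, 2, 3])


print(model_in(7, ReturnsList()), 7 in ReturnsList())      # True True
print(model_in(2, OnlyIterable()), 2 in OnlyIterable())    # True True
print(model_in(5, OnlyIterable()), 5 in OnlyIterable())    # False False
```

The model and the real operator agree on these cases: the list returned by __contains__ is truthy, so both report True.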

P.S.

Thanks to Martijn Pieters for providing the way to find the implementation of the in operator.
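The disassembly in this answer comes from the CPython version that was current when it was written. As far as I know, CPython 3.9 and later compile in to a dedicated CONTAINS_OP instruction instead of COMPARE_OP, so the listing may look different on your machine. A version-tolerant way to check what your interpreter emits might look like this (Box and check_membership are hypothetical names used only for the demonstration):

```python
import dis


class Box:
    """Hypothetical container used only to have something to test against."""
    def __contains__(self, other):
        return True


def check_membership():
    return 1 in Box()


# Collect opcode names rather than relying on byte offsets, which vary by version.
opnames = [ins.opname for ins in dis.get_instructions(check_membership)]
membership_ops = [name for name in opnames if name in ("CONTAINS_OP", "COMPARE_OP")]
print(membership_ops)
```

Whichever opcode is reported, the result of the membership test is still a plain bool.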

Honewort answered 23/7, 2016 at 14:19 Comment(2)
Thanks for the thorough answer, but I was looking more for the reasoning behind the design and the apparent lack of documentation than for the implementation of in. I'm upvoting your answer anyway because it is useful info. - Theodoretheodoric
@Theodoretheodoric I guess that in this case the implementation is directly related to the reasoning. Basically, this is how the Python/C API was conceived. As for the lack of documentation, the docs don't really reference True or False; they only say that __contains__ should return something that is either true or false (i.e. that can be evaluated as True or False). You can see throughout the docs that they explicitly use True and False where it matters. Anyway, they could have written it less ambiguously, so you can file a documentation bug report. - Honewort
8

The Python reference for __contains__ says that the method should return true if the item is in the container and false otherwise.

If the return value is not a boolean, it is converted to one. Here is proof:

class MyValue:
    def __bool__(self):
        print("__bool__ function ran")
        return True

class Dummy:
    def __contains__(self, val):
        return MyValue()

Now run this in the shell:

>>> dum = Dummy()
>>> 7 in dum
__bool__ function ran
True

And bool() of a non-empty list returns True, which is why the example in the question evaluates to True.
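The same conversion works in the other direction: a falsy return value makes in report False. A small sketch, where Echo is a hypothetical class whose __contains__ simply hands its argument back:

```python
import operator


class Echo:
    """Hypothetical container whose __contains__ returns its argument unchanged."""
    def __contains__(self, val):
        return val


echo = Echo()
print([] in echo)                     # False: an empty list is falsy
print([0] in echo)                    # True: a non-empty list is truthy
print(operator.contains(echo, []))    # False: same conversion as the operator
print(echo.__contains__([]))          # []: the direct call is not converted
```

Only the direct method call exposes the raw return value; both the operator and operator.contains() give a bool.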

Edit:

That is only the documentation for __contains__. If you want to see the precise relationship, you should look into the source code; I'm not sure exactly where, but another answer already covers that. The documentation for the comparison methods says:

However, these methods can return any value, so if the comparison operator is used in a Boolean context (e.g., in the condition of an if statement), Python will call bool() on the value to determine if the result is true or false.

So you can guess that it's similar with __contains__.
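One difference is worth spelling out: a bare == hands back whatever __eq__ returns, and bool() is applied only if the surrounding context needs a truth value, whereas in always yields a bool. A minimal sketch (the class Loose is hypothetical, written just to compare the two):

```python
class Loose:
    """Hypothetical class returning a list from both special methods."""
    def __eq__(self, other):
        return [False, False]

    def __contains__(self, item):
        return [False, False]


obj = Loose()
eq_result = (obj == 1)
in_result = (1 in obj)

print(eq_result)   # [False, False]: == passes the value through untouched
print(in_result)   # True: in converts the list to a bool
```

So the analogy holds for if statements and other Boolean contexts, but in performs the conversion unconditionally.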

Conformable answered 23/7, 2016 at 14:5 Comment(1)
I think "__bool__ function runned" should be "__bool__ function ran" - Hemorrhage
-3

This is for anyone reading this to decide which one to use: I would say use __contains__() instead of in, since it appeared faster in my test.

For checking this, I did a simple experiment.

import time
startTime = time.time()
q = 'abababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababc'

print(q.__contains__('c'))
#print('c' in q)
endTime = time.time()
deltaTime = endTime - startTime
print(deltaTime)

For one run I commented out the in line, and for the other I commented out the __contains__ line. Here are the results:

(Using in)
PS C:\Users\username> & python c:/Users/username/containsvsin.py
True
0.0009970664978027344
(Using __contains__)
PS C:\Users\username> & python c:/Users/username/Downloads/containsvsin.py
True
0.0
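A single time.time() reading around one call is likely to sit at or below the timer's resolution (the 0.0 above suggests as much), and it also includes the cost of print(). A more robust comparison could use timeit to repeat each variant many times. This is only a sketch: the shortened test string, the helper names, and the repeat counts are my own choices, and the numbers will differ between machines and Python versions.

```python
import timeit

# Shorter stand-in for the long 'abab...c' string used above.
q = "ab" * 200 + "c"


def with_operator():
    return "c" in q


def with_method():
    return q.__contains__("c")


# Run each variant many times and keep the best of several repeats
# to reduce noise from other processes.
op_time = min(timeit.repeat(with_operator, number=20_000, repeat=5))
method_time = min(timeit.repeat(with_method, number=20_000, repeat=5))

print(f"in operator:    {op_time:.6f} s for 20,000 calls")
print(f"__contains__(): {method_time:.6f} s for 20,000 calls")
```

Both variants return the same result; only the measured times should inform any claim about speed.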
Evannia answered 1/1, 2021 at 3:10 Comment(1)
This doesn't answer the OP's question. - Lanam
