Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?

Asked 6/5, 2015 at 15:32 Answered 5/9, 2022 at 12:59

Solved python performance python-3.x range python-internals

3013

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1_000_000_000_000_000 in range(1_000_000_000_000_001)

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)

If I try to implement my own range function, the result is not so nice!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

What is the range() object doing under the hood that makes it so fast?

Martijn Pieters's answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.

Infeasible answered 6/5, 2015 at 15:32 Comment(13)

Note that this is the case only if the item we are checking is a bool or long type, with other object types it will go crazy. Try with: 100000000000000.0 in range(1000000000000001) – Densmore 6/5, 2015 at 15:44

One last thing: Does Python 3 actually guarantee this behavior? I know every version of CPython at least 3.1+ and PyPy3 from the first beta on provided it, but I think it would be perfectly valid if, say, IronPython 3.4 came out tomorrow and had an O(N) __contains__ method. – Curium 6/5, 2015 at 16:19

@RickTeachey: Actually, I'm pretty sure I already know the answer; when I first started telling people they can just use 2 in r in Python 3 (but not Python 2), someone challenged me to find that in the docs, and it's not there, and when I asked (on python-ideas or the bug tracker? I forget…) whether it should be guaranteed, nobody seemed to have much interest in answering one way or the other until there was another Python 3 implementor to talk to. – Curium 6/5, 2015 at 16:23

@AshwiniChaudhary isn't Python2 xrange the same as Python3 range? – Terr 6/5, 2015 at 20:16

@Terr This might be part of what abarnert is getting at above? That the implementation of range (and presumably xrange) has never been specced anywhere, so the specific details - including how __contains__, __item__, etc, are calculated, is up to the different language implementors? So it's possible xrange was implemented differently in the past. I'm just guessing here. – Infeasible 6/5, 2015 at 20:21

@RickTeachey: Pretty much. That still raises the practical question of why the 3.0 range.__contains__ optimization wasn't backported to 2.6 xrange.__contains__ (the 2.x C API had sq_contains in the tp_as_sequence struct, etc., so there's no reason it couldn't have been…), but for that, you'd probably have to track down the hg change where the optimization was added to the py3k branch, and see if there's a corresponding bug and/or thread… – Curium 6/5, 2015 at 21:19

@Terr xrange() objects have no __contains__ method, so the item check has to loop through all the items. Plus there are few other changes in range(), like it supports slicing(which again returns a range object) and now also has count and index methods to make it compatible with collections.Sequence ABC. – Densmore 6/5, 2015 at 21:42

@Curium It wasn't backported because python 2 is in a life-support mode, no updates other than bugfixes. They considered giving it all the performance goodies, but they said they wouldn't in a PEP (don't remember which). That prompted me to migrate to python 3. – Icebox 18/7, 2015 at 1:25

Here's Python 3 range.__contains__ method implemented in pure Python – Punctuate 23/2, 2018 at 8:49

bottom line , range in python supports membership checking out of the box with generating elements and infact it is faster than the membership checking in list. membership checking in range is done in constant time. – Appeal 8/3, 2018 at 8:3

so in the end the point is that the in keyword has (at least) two different meanings in Python – Hast 13/5, 2020 at 13:56

@WalterTross that's certainly one way to look at it! in can be a signal for starting iteration, or it can be a signal for testing membership. – Infeasible 13/5, 2020 at 20:51

1_000_000_000_000_000 in list(range(1_000_000_000_000_001)) causes memory error, obviously. – Chemise 27/2, 2022 at 15:3

3083

The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation ^*. There is never a need to scan through all possible integers in the range.

From the range() object documentation:

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

So at a minimum, your range() object would do:

class my_range:
    def __init__(self, start, stop=None, step=1, /):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('my_range object index out of range')

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.

Blende answered 6/5, 2015 at 15:33 Comment(15)

Fun fact: because you have a working implementation of __getitem__ and __len__, the __iter__ implementation is actually unnecessary. – Houseman 6/5, 2015 at 20:55

@Lucretiel: In Python 2.3, a special xrangeiterator was added specifically because that wasn't fast enough. And then somewhere in 3.x (I'm not sure if it was 3.0 or 3.2) it was tossed and they use the same listiterator type that list uses. – Curium 6/5, 2015 at 22:1

I would define the constructor as def __init__(self, *start_stop_step) and parse it out from there; the way the arguments are labelled now are now are kind of confusing. Nevertheless, +1; you still definitely explained the behavior. – Accommodation 8/5, 2015 at 15:15

@CodyPiersall: Unfortunately, that's the signature of the real class's initializer. range is older than *args (much less the argclinic API that lets C-API functions have complete Python signatures). A few other old functions (and a few newer functions, like xrange, slice, and itertools.islice, for consistency) work the same way, but for the most part, Guido and the rest of the core devs seem to agree with you. The 2.0+ docs even describe range and friends as if they were C++-style overloads rather than show the actual confusing signature. – Curium 16/5, 2015 at 9:22

@CodyPiersall: Actually, here's a quote from Guido the argclinic discussion, when Nick Coghlan came up with a way to allow defining range unambiguously: "Please don't make it easier for people to copy my worst design decision." So, I'm pretty sure he agrees that range is confusing as written. – Curium 16/5, 2015 at 9:25

What on earth sort of numeric type actually demands this slow-scan treatment? I would expect the following logic to work: try to determine int(num); if an exception is raised return False; if the resulting integer doesn't compare equal to num return False; otherwise proceed to check whether that integer is in the range. Do you know of a non-contrived counterexample? – Sorbitol 13/7, 2019 at 13:40

@KarlKnechtel you can’t predict how other types behave, full stop. There is no guarantee that range was passed an actual numeric type. It is not enough to just convert the argument to int because why bother with a custom type then? It is up to the developer to make the call on whether or not to use int(custom_type) in range(....). – Blende 13/7, 2019 at 14:12

Even if __getitem__ + __len__ allows for iteration, a good __iter__ implementation can be faster or otherwise straight up better at it. I consider getitem iteration to be a fallback at best. – Callida 17/10, 2019 at 8:10

@Gloweye: __iter__ returns an iterator object, which has to provide individual elements when calling the __next__ method. For sequences, the best way to do that is to use __getitem__ with an incremented index until you reach the length. The only reason the listiterator is better here is that it can access the list object array directly and so has a little less indirection. For pure-python code, yes, a generator function for __iter__ can be more efficient as it avoids creating a new stack frame for each __next__ call, a relatively expensive op. – Blende 17/10, 2019 at 10:3

so therefore, using "list(range(x,y))" would result in the performance hit OP was expecting, correct? – Quixotism 16/9, 2022 at 14:8

@PixelMaster: list(range(...)) would create a list with all the integers in the range and if the range is large, you'd indeed see performance issues as that would require a lot of memory. The performance hit the OP was expecting would only apply if you then used a containment test on the resulting list. – Blende 17/9, 2022 at 17:12

What does that / symbol represent in the __init__ method? – Floatation 14/12, 2022 at 10:17

@FaCoffee: see What does the slash mean in help() output? – Blende 14/12, 2022 at 10:58

I think it also helps to think of a range as a set. I took a Scala course that had us create sets by creating a data class with methods. Eg, the "set" of real numbers is infinite, but all you need to know is if something is in it. In this case, is num % 2 == 0. The range does something similar with the __contains__ method: it just tests to see if the number is in there. – Machiavellian 3/8, 2023 at 9:32

@MikeWilliamson: I'm not sure how helpful it is to do that as sets are unordered, while a range is definitely ordered; range(1, 10) and

range(9, 0, -1) contain the same values but their order is very different. Containment testing is not limited to sets here, you can apply that to [any *container*](https://docs.python.org/3/library/collections.abc.html#collections.abc.Container). Because a range is a sequence, you can also use _indexing_, where

range(9, 0, -1)[3]` gives you 6. – Blende 9/8, 2023 at 14:24

1208

The fundamental misunderstanding here is in thinking that range is a generator. It's not. In fact, it's not any kind of iterator.

You can tell this pretty easily:

>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]

If it were a generator, iterating it once would exhaust it:

>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]

What range actually is, is a sequence, just like a list. You can even test this:

>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True

This means it has to follow all the rules of being a sequence:

>>> a[3]         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1

The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn't remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.

(As a side note, if you print(iter(a)), you'll notice that range uses the same listiterator type as list. How does that work? A listiterator doesn't use anything special about list except for the fact that it provides a C implementation of __getitem__, so it works fine for range too.)

Now, there's nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn't. But there's nothing that says it can't be. And it's easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn't it do it the better way?

But there doesn't seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it's an obvious good idea and easy to do, there's no reason that IronPython or NewKickAssPython 3.x couldn't leave it out. (And in fact, CPython 3.0-3.1 didn't include it.)

If range actually were a generator, like my_crappy_range, then it wouldn't make sense to test __contains__ this way, or at least the way it makes sense wouldn't be obvious. If you'd already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?

Curium answered 6/5, 2015 at 16:1 Comment(8)

This is a pretty important thing to get straight. I suppose the differences between Python 2 and 3 may have lead to my confusion on this point. In any case, I should have realized since range is listed (along with list and tuple) as a sequence type. – Infeasible 6/5, 2015 at 16:5

@RickTeachey: Actually, in 2.6+ (I think; maybe 2.5+), xrange is a sequence too. See 2.7 docs. In fact, it was always an almost-sequence. – Curium 6/5, 2015 at 16:17

@RickTeachey: Actually, I was wrong; in 2.6-2.7 (and 3.0-3.1), it claims to be a sequence, but it's still just an almost-sequence. See my other answer. – Curium 6/5, 2015 at 22:3

It's not an iterator, it's a sequence (Iterable in terms of Java, IEnumerable of C#) - something with an .__iter__() method that will return an iterator. It in its turn can be used only once. – Affixation 18/6, 2016 at 19:40

seems strange that 's' in range(10**10) isn't optimized to immediately return False. – Pavkovic 29/8, 2016 at 10:57

@ThomasAhle: Because range isn't checking types when it's not an integer, since it's always possible a type has a __eq__ that is compatible with int. Sure, str obviously won't work, but they didn't want to slow things down by explicitly checking all the types that can't be in there (and after all, a str subclass could override __eq__ and be contained in the range). – Tuscarora 17/10, 2016 at 12:47

For simplicity (as well as python 2 compatibility since some of these points generalize to python 2, even if the question itself doesn't) maybe cite collections.Sequence instead of collections.abc.Sequence ? – Lemnos 19/11, 2018 at 20:54

"And it's easier to implement range.__contains__ to just check it mathematically […] than to actually generate and test all the values" – I dispute this, since you could simply omit the implementation of __contains__ to get the latter behaviour, and that's certainly easier than any implementation you need to explicitly write. :) – Clastic 14/12, 2018 at 12:13

513

Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we're using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

Check that the number is between start and stop, and
Check that the stride value doesn't "step over" our number.

For example, 994 is in range(4, 1000, 2) because:

4 <= 994 < 1000, and
(994 - 4) % 2 == 0.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

The "meat" of the idea is mentioned in the comment lines:

/* positive steps: start <= ob < stop */
/* negative steps: stop < ob <= start */
/* result = ((int(ob) - start) % step) == 0 */

As a final note - look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don't use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I'm using v3.5.0 here):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
... 
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :) 
True
>>> x_ in r  # iterates for ages.. :( 
^\Quit (core dumped)

Sabinesabino answered 6/5, 2015 at 15:41 Comment(1)

This one should be the answer. – Lashondalashonde 20/7, 2023 at 21:50

199

To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).

If it’s not an int object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

Grin answered 6/5, 2015 at 15:41 Comment(0)

135

If you're wondering why this optimization was added to range.__contains__, and why it wasn't added to xrange.__contains__ in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because "xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late." (2.7 was nearly out at that point.)

Meanwhile:

Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:

Range objects have very little behavior: they only support indexing, iteration, and the len function.

This wasn't quite true; an xrange object actually supported a few other things that come automatically with indexing and len,^* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same "very little behavior". Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2's range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).^** This change went in for 3.2 as well, and was not backported to 2.x, because "it's a bugfix that adds new methods". (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.

_{* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.}

_{** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach's updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn't apply.}

Curium answered 6/5, 2015 at 21:42 Comment(8)

From the comments here: improve xrange.__contains__, it looks like they didn't backport it to Python 2 just to leave an element of surprise for users and it was too late o_O. The count and index patch was added later on. File at that time: hg.python.org/cpython/file/d599a3f2e72d/Objects/rangeobject.c – Densmore 6/5, 2015 at 22:11

I have a sinister suspicion that some core python devs are partial to "tough love" for python 2.x because they want to encourage people to switch to the far-superior python3 :) – Sabinesabino 7/5, 2015 at 2:8

@wim: A couple of them, definitely, but I don't think that's the case with Benjamin Peterson. He seems more in the camp of "people still on 2.7 are doing so because they want to make sure we don't fix what ain't broke, so change as little as possible". But the most common (or at least vocal) attitude at the moment seems to be "making it easy to port from 2.x to 3.x is priority #1, #2, and #3", possibly because multiple people paid to work on Python are also currently paid to work on Fedora and Ubuntu porting to 3.5. – Curium 7/5, 2015 at 3:7

Also I bet it's a huge burden to have to add new features to old versions. Imagine if you went to Oracle and said, "Look, I'm on Java 1.4 and I deserve lambda expressions! Backport them for nothing." – Adellaadelle 11/5, 2015 at 11:34

@RobertGrant Well the problem with that explanation is, 2.7 isn't old. It was released around the same time as 3.3, I believe. – Infeasible 12/5, 2015 at 13:12

@RickTeachey yeah it's just an example. If I said 1.7 it would still apply. It's a quantitative difference not qualitative. Basically the (unpaid) devs can't forever make cool new stuff in 3.x and backport it to 2.x for those who don't want to upgrade. It's a huge and ridiculous burden. Do you think there's still something wrong with my reasoning? – Adellaadelle 12/5, 2015 at 13:41

@RobertGrant no not at all, your reasoning is sound, and i'm not strenuously differing. i'm just saying it doesn't fully explain why there was no xrange optimization done for 2.7 (i understand that's not what you were saying). 2.7 was new, not an old version - people were excited about all the new stuff 2.7 would bring as with any other new release. and there was definitely the opportunity to do it during development. it just didn't make the cut. – Infeasible 12/5, 2015 at 15:26

@RickTeachey: 2.7 was between 3.1 and 3.2, not around 3.3. And that means 2.7 was in rc when the last changes to 3.2 went in, which makes the bug comments easier to understand. Anyway, I think they made a few mistakes in retrospect (especially assuming people would migrate via 2to3 instead of via dual-version code with the help of libraries like six, which is why we got things like dict.viewkeys that nobody's ever going to use), and there were a few changes that just came too late in 3.2, but for the most part 2.7 was a pretty impressive "last 2.x ever" release. – Curium 16/5, 2015 at 9:3

The other answers explained it well already, but I'd like to offer another experiment illustrating the nature of range objects:

>>> r = range(5)
>>> for i in r:
        print(i, 2 in r, list(r))
        
0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]

As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.

Thump answered 6/5, 2015 at 16:4 Comment(0)

It's all about a lazy approach to the evaluation and some extra optimization of range. Values in ranges don't need to be computed until real use, or even further due to extra optimization.

By the way, your integer is not such big, consider sys.maxsize

sys.maxsize in range(sys.maxsize) is pretty fast

due to optimization - it's easy to compare given integer just with min and max of range.

but:

Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.

(in this case, there is no optimization in range, so if python receives unexpected Decimal, python will compare all numbers)

You should be aware of an implementation detail but should not be relied upon, because this may change in the future.

Antimicrobial answered 16/3, 2018 at 10:47 Comment(1)

Be careful floating large integers. On most machines, float(sys.maxsize) != sys.maxsize) even though sys.maxsize-float(sys.maxsize) == 0. – Amadeo 13/9, 2019 at 23:7

TL;DR

The object returned by range() is actually a range object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.

But it also implements the __contains__ interface which is actually what gets called when an object appears on the right-hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).

Outflow answered 15/1, 2019 at 16:56 Comment(0)

Due to optimization, it is very easy to compare given integers just with min and max range.
The reason that the range() function is so fast in Python3 is that here we use mathematical reasoning for the bounds, rather than a direct iteration of the range object.
So for explaining the logic here:

Check whether the number is between the start and stop.
Check whether the step precision value doesn't go over our number.

Take an example, 997 is in range(4, 1000, 3) because:

4 <= 997 < 1000, and (997 - 4) % 3 == 0.

Alek answered 25/11, 2019 at 17:50 Comment(2)

Can you share source for that? Even if that sounds legit, it would be good to back these claims by actual code – Esquivel 5/12, 2019 at 21:55

I think this is an example of it could be implemented. Not the exact way it is implemented. Although no reference provided it is good hint good enough to understand why inclusion checking for range can be much faster than list or tuple – Speedometer 22/1, 2020 at 1:32

Try x-1 in (i for i in range(x)) for large x values, which uses a generator comprehension to avoid invoking the range.__contains__ optimisation.

Blate answered 11/3, 2020 at 2:45 Comment(0)

TLDR; the range is an arithmetic series so it can very easily calculate whether the object is there. It could even get the index of it if it were list like really quickly.

Quantifier answered 9/10, 2020 at 16:29 Comment(0)

-5

__contains__ method compares directly with the start and end of the range

Luckless answered 5/9, 2022 at 12:59 Comment(1)

That's not really the whole story, because the step is important too. wim's answer covers the actual math involved and RBF06's answer includes a shorter version. I don't see any reason to duplicate that information in this answer. – Schwing 17/9, 2022 at 20:36

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

TL;DR

Recommended topics

Hot tags