python bit array (performant)

I'm designing a bloom filter and I'm wondering what the most performant bit array implementation is in Python.

The nice thing about Python is that it can handle arbitrary length integers out of the box and that's what I use now, but I don't know enough about Python internals to know if that's the most performant way to do it in Python.

I found bitarray but it handles a lot of other things like slicing, which I don't need. I only need the & and | and << operations.
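Roughly, what I'm doing now looks like this (a simplified sketch rather than my actual code; the filter size and the hashing scheme are just placeholders):

import hashlib

M = 1 << 20   # number of bits in the filter (placeholder)
K = 4         # number of hash functions (placeholder)
bits = 0      # the whole bit array is a single arbitrary-length int

def positions(item):
    # derive K bit positions from the item; any hashing scheme would do
    digest = hashlib.sha256(repr(item).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], 'big') % M for i in range(K)]

def add(item):
    global bits
    for p in positions(item):
        bits |= 1 << p   # set a bit with << and |

def might_contain(item):
    return all((bits >> p) & 1 for p in positions(item))   # test bits with >> and &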

Dardan answered 30/12, 2013 at 18:58 Comment(6)
CPython's set type is already implemented using a bloom filter. Might it be sufficient for your needs?Spectroradiometer
If you have code that you want to optimize, and two implementations to test, why not run the tests yourself? Testing on your actual code and data (or a reasonable subset of it) will tell you a lot more than asking "which implementation is usually faster".Tobytobye
@Max: It is? The source looks an awful lot like the exact same open-addressing hash-table implementation used for dict.Tobytobye
Also, can you afford to waste space? An array that holds each bit as a byte or larger type, like an np.ndarray(dtype=np.bool_), should be a lot faster than bitarray, but page or cache misses could easily cancel out the benefits. That one probably isn't even worth testing without knowing what sizes you're actually interested in.Tobytobye
@abarnert: looks like I'm wrong. I wonder where I read that.Spectroradiometer
What is the question?Gerlach

The built-in int is pretty nicely optimized, and it already supports &, |, and <<.

There's at least one alternative implementation of arbitrary-length integers, based on GMP, known as gmpy2. (There's also the original gmpy, PyGMP, Sophie, and a few other wrappers around the same library, but I doubt they'd have any real performance differences.)
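For what it's worth, mpz is essentially a drop-in replacement for int for these operations; a minimal sketch (just to show the operators, not a benchmark):

import gmpy2

bits = gmpy2.mpz(0)
bits |= gmpy2.mpz(1) << 100     # set bit 100; the result is still an mpz
print(bool((bits >> 100) & 1))  # True -- &, |, and << behave just like int's
print(type(bits | 1))           # mpz -- mixing mpz and int operands also works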

And there are two major implementations of the "bit array" idea, bitarray (the one you linked) and bitstring, as well as a few libraries like intbitset that give you a set-like interface instead (which should also work for your uses).

So, let's toss them all together and compare:

import random
import struct
import timeit
import bitarray
import bitstring
import gmpy2

n = random.randrange((1<<31)+1, 1<<32)  # a random 32-bit value to pack into each type

bs = bitstring.pack('<q', n)
ba = bitarray.bitarray()
ba.frombytes(struct.pack('<q', n))      # frombytes appends, so start from an empty bitarray
gm = gmpy2.mpz(n)
py = n

for x in 'bs', 'ba', 'gm', 'py':
    def f(x=locals()[x]): x | x; x & x  # time one | and one & on each representation
    t = timeit.timeit(f, number=10000)
    print(x, t)

On my Mac, running Python.org 64-bit CPython 3.3.2, here's what I get:

bs 0.7623525890521705
ba 0.006623028079047799
gm 0.0016346259508281946
py 0.002280334010720253

That seems to be enough to summarily dismiss bitstring.

Also, I didn't show intbitset here because neither it nor any of the similar libraries I found works with Python 3, but from a variety of comparisons (using intbitset.intbitset([i for i, bit in enumerate(bin(n)[2:]) if bit != '0'])) it's anywhere from 14 to 70 times slower than int in every test I throw at it, so I've summarily dismissed it as well.


So let's try the other three with more reps:

ba 6.385123810963705
gm 1.5937359740491956
py 2.129726824001409

And the numbers hold up. bitarray is nowhere near as fast as the built-in int. It's also clumsier to use: what should be an "alternate constructor" classmethod is an instance method, there's no fast and easy way to convert to or from an int, and the reason I only tested x | x and x & x is that there are limitations on the operators. If you need an array of bits, it's great; if you need to do C-style bit-munging on an integer, that's not what it's best at.
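For example, getting an int into and out of a bitarray goes through bytes both ways (a sketch assuming a fixed 64-bit width):

import struct
import bitarray

n = 0x0123456789abcdef

ba = bitarray.bitarray()
ba.frombytes(struct.pack('<Q', n))        # int -> bitarray, via an 8-byte buffer

m = struct.unpack('<Q', ba.tobytes())[0]  # bitarray -> int, back through bytes again
assert m == n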


As for gmpy2, it seems to be about a third faster than the native int. What if we make the numbers a whole lot bigger, like 1.5kbits?
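Roughly like this (the exact bit length and rep count below are representative stand-ins, not necessarily what I ran):

import random
import timeit
import gmpy2

n = random.randrange(1 << 1499, 1 << 1500)   # a ~1500-bit number
gm = gmpy2.mpz(n)
py = n

for x in 'gm', 'py':
    def f(x=locals()[x]): x | x; x & x
    t = timeit.timeit(f, number=100000)
    print(x, t)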

gm 0.19562570203561336
py 0.29293217696249485

And here are the numbers with Apple Python 2.7.5:

('gm', 0.2890629768371582)
('py', 0.36592698097229004)

So, is it worth it? It's less friendly to use, it's slower rather than faster on some other operations that you didn't ask about, it requires a third-party C library which is LGPL-licensed, it has much worse error-handling behavior in memory overflow cases (as in, your app may segfault instead of raising), and there's at least one person on StackOverflow who is going to show up and tell you that GMP sucks whenever it gets mentioned (I believe because of a bug in an older version).

But if you need that extra speed, maybe it's worth it.


On the other hand, here's PyPy3 2.1.0b1 (Python 3.2.3) and PyPy 2.1.0 (Python 2.7.3), using the native int type—compare with the 0.19562570203561336 and 0.2890629768371582 results from gmpy2 above:

py 0.2135779857635498
('py', 0.20878291130065918)

So, switching from CPython to PyPy gives you almost as much benefit as switching from int to gmpy2.mpz in Python 3, and significantly more benefit in Python 2. And it will almost certainly accelerate the rest of your code as well.

Tobytobye answered 30/12, 2013 at 19:23 Comment(3)
For the fastest bit operations in gmpy2, you should also test gmpy2.xmpz in combination with in-place operations (see the sketch after these comments). The xmpz type is a mutable integer type, and the in-place operations eliminate the need to create a new result object. The xmpz type also supports directly setting/clearing bits. Note: I maintain gmpy and gmpy2.Josie
wow, what a terrifically complete explanation! I think for me, the simplicity and lack of dependencies make up for the small speed penalty of the built-in int. but one cannot say such a thing without the numbers to back it up, which you have provided. thanks!Dardan
@Tobytobye You seem to have quite some skill in this area. Can you give your ideas about converting two's-complement values to int?Grangerize
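A minimal sketch of the xmpz approach described in the first comment above (the bit-indexing syntax follows the gmpy2 docs; treat the details as an approximation and check the docs for your version):

import gmpy2

x = gmpy2.xmpz(0)   # mutable integer type
y = gmpy2.xmpz(0)

x[100] = 1          # xmpz supports setting/clearing individual bits by index
y[100] = 1
y[7] = 1

x |= y              # in-place operators mutate x instead of allocating a new object
x &= y
x <<= 1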

Disclaimer: I am the main developer of intbitset :-) which was mentioned above in one of the comments. This is just to let you know that, as of a few weeks ago, intbitset is compatible with Python 3.3 and 3.4. Additionally, it looks like it is almost twice as fast as the native int for this kind of work:

import random
from intbitset import intbitset
x = random.sample(range(1000000), 10000)
y = random.sample(range(1000000), 10000)
m = 0
for i in x:
    m |= 1 << i          # build a plain-int bit array from x
n = 0
for i in y:              # note: y here, not x
    n |= 1 << i
mi = intbitset(x)
ni = intbitset(y)

%timeit m & n ## native int
10000 loops, best of 3: 27.3 µs per loop

%timeit mi & ni ## intbitset
100000 loops, best of 3: 13.9 µs per loop

%timeit m | n ## native int
10000 loops, best of 3: 26.8 µs per loop

%timeit mi | ni ## intbitset
100000 loops, best of 3: 15.8 µs per loop

## note the above were just tested on Python 2.7, Ubuntu 14.04.

Additionally, intbitset supports some unique features, such as infinite sets, which are useful e.g. for building a search engine where you have the concept of a universe (e.g. taking the union of an infinite set with a regular set returns an infinite set, etc.).
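For example (the trailing_bits argument follows the intbitset docs; double-check it against the current API):

from intbitset import intbitset

universe = intbitset(trailing_bits=True)   # an "infinite" set: every bit from 0 upward is set
docs = intbitset([3, 17, 42])

print(3 in universe, 10**6 in universe)    # True True: membership works for any index
print(docs & universe == docs)             # True: intersecting with the universe is a no-op
print(42 in (docs | universe))             # True: the union with the universe is still infinite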

For more information about intbitset performance compared to Python sets, see: http://intbitset.readthedocs.org/en/latest/#performance

Maldives answered 23/6, 2014 at 8:46 Comment(0)

Perhaps check this out. It's pure Python, and uses an array of ints: http://stromberg.dnsalias.org/svn/bits/trunk/

Also, there are already several Python bloom filters. Check Pypi: https://pypi.python.org/pypi?%3Aaction=search&term=bloom+filter&submit=search

HTH

Rotarian answered 30/12, 2013 at 19:35 Comment(5)
From a quick test, this is both more painful to use than bitarray (no easy way to construct it, doesn't offer the |, &, << ops he asked for, …), and more than two orders of magnitude slower. So, it doesn't seem like a good alternative for someone looking for better performance. And I don't see why you'd expect that trying to do naive arbitrary-length integers on top of Python's built-in optimized arbitrary-length integers would improve anything.Tobytobye
If you try it on PyPy, you may get a different picture of the performance: Python != CPython. Also, I derived that code from a bloom filter I wrote myself. It should be adequate for a bloom filter. BTW, what's naive about using hardware integers? Usually, it's a pretty good idea for performance.Rotarian
OK, in PyPy 2.7.3/2.1.0, it takes 2.9113s for the same test that took 0.20878s with plain int. So that's just over 1 order of magnitude slower instead of 2. It's still an order of magnitude slower, just to make something less usable. Do you have a real-life test case that shows otherwise? I may be testing numbers way too large, or way too small? (And it's not using hardware integers that's naive, it's using them to implement arbitrary-length integer operations in the naive way.)Tobytobye
Also, what versions are you using? In early versions of PyPy, integer arithmetic was significantly slower than CPython instead of a little faster, in which case this could easily have been a different story.Tobytobye
Finally, "it should be adequate for a bloom filter" doesn't mean much, since plain-old int also should be adequate for a bloom filter. Unless it's actually faster, smaller, easier-to-user, or otherwise better, why bother?Tobytobye
