algorithm to find subset of large int array which matches boolean query

Asked 25/8, 2011 at 10:5 Answered 26/8, 2011 at 21:3

arrays algorithm optimization boolean-operations

Say I have a large array of M 32 bit ints in which each value has no more than N bits set. Now I want to return the subset which matches the query Target AND Value == Target, i.e. values in which the targets bits appear set in the array's values.

Brute force is easy, simply iterator the array and extract where target&value == target. This becomes way too slow if M gets very large. Anyone have an idea of how to convert the array into a data structure that is more optimal to search?

One way is to store arrays or value for each bit (thus for 32 bit array you need 32 of these) and then only search the values that match each bit in the target value. This helps a little unless N gets close to 32 or the target has close to N bits set. Since what I am looking for is essentially a partial match, hashing or sorting doesn't appear to help.

Exact correct results are a requirement. This will have to work without access to parallel hardware (like a GPU or using SIMD).

I will be using C++ but just some pointers to algorithms or ideas is fine. The most likely case would be M=100000 and N=8 and be called frequently.

Just to reiterate: I need a partial match (like item = 011000 matching target = 001000) not an exact match. Although the M items is known ahead of time, the possible values of targets can be anything.

I finally decided to stick with brute force. For 80,000 items it's not worth doing anything else. I imagine if the size of the dataset were more like 800,000,000 it might be worth it.

Decare answered 25/8, 2011 at 10:5 Comment(4)

So is this like if target&value=target, value contains target's bits but not viceversa? Can you give an example of target and value – Pierce 25/8, 2011 at 10:17

item is in list is 00011100 and target is 00010100 thus target matches item but not all bits – Decare 25/8, 2011 at 13:24

How fast do you need it to be? I tested out the brute force approach and was getting consistently an average of 20ms for M=100,000 and N=8. And that was in a higher order language. I imagine that C++ should be significantly faster. My trie answer below was ~100 slower. – Triste 26/8, 2011 at 18:17

Might be running on anything from a desktop to a mobile device. So far brute force seems the fastest which I didn't expect. – Decare 27/8, 2011 at 15:41

You could build a bitwise trie.

When traversing the trie, for each 0 in the target, you would need to traverse both branches.

Edit After testing a quick implementation I would NOT recommend this approach. The brute force approach was ~100 faster than this one.

Triste answered 25/8, 2011 at 22:0 Comment(3)

Interesting, I remember reading about the nedtrie. I don't see how it might work with partial matches however. It would seem to work better with finding similar prefixes. – Decare 26/8, 2011 at 1:2

When you encounter a zero in the target, you need to traverse both nodes. It might end up being quite a long traversal if there are lots of zeros in the target. – Triste 26/8, 2011 at 1:41

I think idea is worth of further considering. You can split your tree not by bit-level but by 2-bits or even 4-bits per level. That might improve speed a bit. So say you have Target = 10 00 than you'll skip sub-nodes for 00 and 01, but will descend into sub-nodes 10 and 11. – Cadi 26/8, 2011 at 20:29

How about looking at this problem from another view point?.. Consider your set of integers as a collection of one-dimension pictures. One of the way to organize them is to split each picture into two parts A and B and sort all pictures by categories:

A contains only zeroes and B contains some bits is set (at least one)
A contains some bits set and B contains only zeroes
A and B contains some bits set (superset of 1 and 2)
A and B contains only zeroes

Now you do the same split of your target/mask into the same parts and categorize in the same way. After that you can deduce next (by target/mask category):

You can skip pictures from categories 2 and 4
You can skip pictures from categories 1 and 4
You can skip pictures from category 4
All pictures match this target/mask

On the next level parts A and B is splitted again (so you have 4 parts) and so on.

Of course I don't expect it to give some speed-up. But for some sets of data where there is not so much bits is set (as opposite to variants with bit-based-tree) it might work better.

Update: I've got speedup for 34% in Haskell variant:

    benchmarking burte-force list search
    mean: 14.67350 ms, lb 14.65103 ms, ub 14.71614 ms, ci 0.950
    std dev: 153.6920 us, lb 95.70642 us, ub 246.6497 us, ci 0.950

    benchmarking tree-lookup search
    mean: 9.592271 ms, lb 9.564509 ms, ub 9.667668 ms, ci 0.950
    std dev: 216.6084 us, lb 100.3315 us, ub 455.2730 us, ci 0.950

Source code:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}

import Control.Arrow (first)
import Control.DeepSeq
import Data.Word
import Data.Bits
import Data.List

import Criterion.Main
import Criterion.Config
import System.Random

class BitmapsCollection a where
    type BitmapOf a
    bitmapsCollection :: [BitmapOf a] -> a
    findMaskedPattern :: a -> BitmapOf a -> [BitmapOf a]

-- Plain lookup through array
newtype BitmapsList p = BitmapsList [p]
    deriving (Show, NFData)

instance Bits p => BitmapsCollection (BitmapsList p) where
    type BitmapOf (BitmapsList p) = p
    bitmapsCollection = BitmapsList
    findMaskedPattern (BitmapsList xs) m = filter (\e -> e .&. m == m) xs

-- Tree of bitmap partitions
data Bits p => BitmapsCoverTree p = EmptyBitmapsCoverNode
                                  | BitmapsCoverNode (p,p) (BitmapsCoverTree p) (BitmapsCoverTree p) [p] [p]
                                  | LeafBitmapsCoverNode [p]
    deriving Show

instance (Bits p, NFData p) => NFData (BitmapsCoverTree p) where
    rnf EmptyBitmapsCoverNode = ()
    rnf (LeafBitmapsCoverNode xs) = rnf xs
    rnf (BitmapsCoverNode mask node1 node2 category3 category4) = mask `deepseq` node1 `deepseq` node2 `deepseq` category3 `deepseq` rnf category4

data BitmapCoverCategory = CoverA | CoverB | CoverAB | CoverZero

coverCategory :: Bits a => (a, a) -> a -> BitmapCoverCategory
coverCategory (maskA, maskB) x = case (x .&. maskA, x .&. maskB) of
                                     (0, 0) -> CoverZero
                                     (0, _) -> CoverB
                                     (_, 0) -> CoverA
                                     _ -> CoverAB

coverCategorize :: Bits a => (a, a) -> [a] -> ([a], [a], [a], [a])
coverCategorize mask = walk (id, id, id, id) where
    category = coverCategory mask
    walk (a, b, ab, z) [] = (a [], b [], ab [], z [])
    walk (a, b, ab, z) (x:xs) = case (category x) of
                                    CoverA -> walk (a . (x:), b, ab, z) xs
                                    CoverB -> walk (a, b . (x:), ab, z) xs
                                    CoverAB -> walk (a, b, ab . (x:), z) xs
                                    CoverZero -> walk (a, b, ab, z . (x:)) xs

suffixMask, prefixMask :: Bits a => Int -> a
suffixMask n = complement 0 `shiftL` n
prefixMask n = complement (suffixMask n)

rangeMask :: Bits a => (Int, Int) -> a
rangeMask (n, m) = suffixMask n .&. prefixMask m

instance Bits p => BitmapsCollection (BitmapsCoverTree p) where
    type BitmapOf (BitmapsCoverTree p) = p
    bitmapsCollection bms = buildCoverNode (0, bitSize (head bms)) bms where
        splitBoundary = 4
        buildCoverNode :: Bits a => (Int, Int) -> [a] -> BitmapsCoverTree a
        buildCoverNode _ [] = EmptyBitmapsCoverNode
        buildCoverNode (n, m) xs | (m - n) < splitBoundary = LeafBitmapsCoverNode xs -- too small
        buildCoverNode (n, m) xs = BitmapsCoverNode mask node1 node2 category3 category4 where
            mm = (n+m) `div` 2

            mask = (rangeMask (n, mm), rangeMask (mm, m))

            (category1, category2, category3, category4) = coverCategorize mask xs

            node1 = buildCoverNode (n, mm) category1
            node2 = buildCoverNode (mm, m) category2

    findMaskedPattern EmptyBitmapsCoverNode _ = []
    findMaskedPattern (LeafBitmapsCoverNode ps) m = filter (\e -> e .&. m == m) ps

    findMaskedPattern (BitmapsCoverNode _ node1 node2 category3 category4) 0 = flatten where
        flatten = findMaskedPattern node1 0 ++ findMaskedPattern node2 0 ++ category3 ++ category4

    findMaskedPattern (BitmapsCoverNode mask node1 node2 category3 category4) m = result where
        targetCategory = coverCategory mask m
        filterTarget = filter (\p -> p .&. m == m)
        result = case targetCategory of
                     CoverA -> findMaskedPattern node1 m ++ filterTarget category3
                     CoverB -> findMaskedPattern node2 m ++ filterTarget category3
                     CoverAB -> filterTarget category3
                     CoverZero -> category1 ++ category2 ++ category3 ++ category4

        category1 = findMaskedPattern node1 0
        category2 = findMaskedPattern node2 0


main = do
    gen <- getStdGen
    let size = 1000000
        bitmaps :: [Word32]
        (bitmap, genm) = first fromIntegral (random gen :: (Int, StdGen))
        bitmaps = map fromIntegral (take size (randoms genm) :: [Int])
        bitmapsList = bitmapsCollection bitmaps :: BitmapsList Word32
        bitmapsTree = bitmapsCollection bitmaps :: BitmapsCoverTree Word32

    bitmapsList `deepseq` bitmapsTree `deepseq` return ()

    defaultMainWith defaultConfig (return ()) [
        bench "burte-force list search" $ nf (findMaskedPattern bitmapsList) bitmap,
        bench "tree-lookup search" $ nf (findMaskedPattern bitmapsTree) bitmap
        ]

Update: Kind of C++11 code. It gives 10.9444 ms for brute-force and 8.69286 ms for this algorithm. I've cheated by making output of distribution of turned on bits more sparse.

#include <iostream>
#include <vector>
#include <list>
#include <random>
#include <functional>
#include <cassert>
#include <memory>


#include <sys/time.h>
#include <sys/resource.h>

// benchmark boiler plate code
double cputime()
{
    struct rusage usage;
    int check = getrusage( RUSAGE_SELF, &usage );
    assert(check == 0);
    return (usage.ru_utime.tv_sec + usage.ru_utime.tv_usec*1.0e-6);
    //return (((double)clock())/((double)CLOCKS_PER_SEC));
}

double measure(std::function<void()> func, size_t iterations)
{
    double t1, t2;
    size_t i;
    t1 = cputime();
    for(i = 0; i < iterations; ++i) func();
    t2 = cputime();
    return (t2 - t1);
}

std::pair<std::string, double> human(double value)
{
    static const std::vector<std::pair<std::string, double>> prefixes = {
        { "pico",  1e-12 },
        { "nano",  1e-9  },
        { "micro", 1e-6  },
        { "milli", 1e-3  },
        { "",      1     },
        { "kilo",  1e3   },
        { "mega",  1e6   },
        { "giga",  1e9   },
        { "tera",  1e12  }
    };

    for(auto it = prefixes.begin(); it != prefixes.end(); ++it)
    {
        if (it->second > value) 
        {
            auto prev = *(--it);
            return std::pair<std::string, double>(prev.first, value/prev.second);
        }
    }
    auto last = *prefixes.rbegin();
    return std::pair<std::string, double>(last.first, value/last.second);
}

void bench(std::string name, std::function<void()> func, double bench_seconds = 10)
{
    const double accurate_seconds = 0.1;

    std::cout << "benchmarking " << name << std::endl
              << "estimating iterations" << std::endl;
    size_t base_iterations = 1;
    double base_seconds = measure(func, base_iterations);
    while(base_seconds < accurate_seconds)
    {
        base_iterations *= 2;
        base_seconds = measure(func, base_iterations);
    }

    const size_t iterations = bench_seconds * base_iterations / base_seconds;
    const double estimated_seconds = iterations * base_seconds / base_iterations;
    std::cout << "estimated time " << estimated_seconds << " seconds (" << iterations << " iterations)" << std::endl;

    const double seconds = measure(func, iterations);
    const auto ips = human(iterations / seconds);
    const auto spi = human(seconds / iterations);
    std::cout << "benchmark took " << seconds << " seconds" << std::endl
              << "average speed " << ips.second  << ' ' << ips.first << " iterations per second" << std::endl
              << "average time " << spi.second << ' ' << spi.first << " seconds per iteration" << std::endl;
}

// plain brute-force lookup
template<class iterator>
std::list<typename iterator::value_type> brute_lookup(const typename iterator::value_type pattern, iterator begin, const iterator &end)
{
    typedef typename iterator::value_type value_type;
    std::list<value_type> result;

    for(;begin != end; ++begin)
    {
        if ((*begin & pattern) == pattern) result.push_back(*begin);
    }

    return result;
}

// tree-traversing lookup

template<class _value_type>
struct cover_node
{
    typedef _value_type value_type;

    value_type mask_a, mask_b;
    std::auto_ptr<cover_node<value_type>> node_a, node_b;
    std::vector<value_type> category_ab, category_zero;
};


template<class _value_type>
std::ostream &pprint(std::ostream &s, const std::auto_ptr<cover_node<_value_type>> &node, const std::string indent = "")
{
    if (!node.get())
    {
        s << indent << "cover_node: (null)" << std::endl;
        return s;
    }

    s << indent
      << "cover_node: mask = " << std::hex << node->mask_a << "/" << node->mask_b
      << ", leafs = " << std::dec << node->category_ab.size() << "/" << node->category_zero.size() << std::endl;

    const std::string sub = indent + "  ";
    pprint(s, node->node_a, sub);
    return pprint(s, node->node_b, sub);
}

enum class cover_category { a, b, ab, zero };

template<class vt>
cover_category
identify_cover(const vt mask_a, const vt mask_b, const vt x)
{
    const auto a = (x & mask_a) != 0;
    const auto b = (x & mask_b) != 0;

    if (!a)
    {
        if (!b) return cover_category::zero;
        else return cover_category::b;
    }
    else
    {
        if (!b) return cover_category::a;
        else return cover_category::ab;
    }
}

template<class vt>
vt bitmask(const size_t n, const size_t m)
{
    return (~0 << n) & ~(~0 << m);
}

template<class iterator>
std::auto_ptr<cover_node<typename iterator::value_type>>
build_cover_node(size_t n, size_t m, iterator begin, const iterator &end)
{
    const size_t split_boundary = 4;

    typedef typename iterator::value_type value_type;
    std::auto_ptr<cover_node<value_type>> node(new cover_node<value_type>);

    if ((m - n) < split_boundary) // too small group
    {
        // overlapped mask for simplification of sub-tree into list
        node->mask_a = ~0;
        node->mask_b = ~0;
        node->category_ab.insert(node->category_ab.end(), begin, end);
        return node;
    }

    std::list<value_type> category_a, category_b;

    const size_t h = (n + m) / 2;

    node->mask_a = bitmask<value_type>(n, h);
    node->mask_b = bitmask<value_type>(h, m);
    auto &category_ab = node->category_ab;
    auto &category_zero = node->category_zero;

    // categorize
    for(;begin != end; ++begin)
    {
        switch(identify_cover(node->mask_a, node->mask_b, *begin))
        {
        case cover_category::a:
            category_a.push_back(*begin);
            break;
        case cover_category::b:
            category_b.push_back(*begin);
            break;
        case cover_category::ab:
            category_ab.push_back(*begin);
            break;
        case cover_category::zero:
            category_zero.push_back(*begin);
            break;
        }
    }

    // build sub-nodes
    if (!category_a.empty()) node->node_a = build_cover_node(n, h, category_a.begin(), category_a.end());
    if (!category_b.empty()) node->node_b = build_cover_node(h, m, category_b.begin(), category_b.end());

    return node;
}

template<class _value_type>
struct cover_walker
{
    typedef _value_type value_type;
    typedef cover_node<value_type> node_type;

    cover_walker(value_type target_pattern, const node_type &root_node) :
        target(target_pattern)
    { 
        walk(root_node);
    }

    const std::list<value_type> &get_result() const
    {
        return result;
    }

private:
    value_type target;

    std::list<value_type> result;

    template<class Container>
    void filtered_add(const Container &xs)
    {
        for(auto it = xs.begin(); it != xs.end(); ++it)
        {
            const auto &x = *it;
            if ((x & target) == target) result.push_back(x);
        }
    }

    template<class Container>
    void add(const Container &xs)
    {
        result.insert(result.end(), xs.begin(), xs.end());
    }

    void flatout(const node_type &node)
    {
        if (node.node_a.get()) flatout(*node.node_a);
        if (node.node_b.get()) flatout(*node.node_b);
        add(node.category_ab);
        add(node.category_zero);
    }

    void walk(const node_type &node)
    {
        const auto &mask_a = node.mask_a;
        const auto &mask_b = node.mask_b;

        if (mask_a == mask_b)
        {
            filtered_add(node.category_ab);
            return;
        }

        switch(identify_cover(mask_a, mask_b, target))
        {
        case cover_category::a:
            if (node.node_a.get()) walk(*node.node_a);
            filtered_add(node.category_ab);
            break;

        case cover_category::b:
            if (node.node_b.get()) walk(*node.node_b);
            filtered_add(node.category_ab);
            break;

        case cover_category::ab:
            filtered_add(node.category_ab);
            break;

        case cover_category::zero:
            flatout(node);
            break;
        }
    }

};


int main()
{
    std::mt19937 rng;
    std::uniform_int_distribution<uint32_t> uint_dist;

    const auto bitmap = uint_dist(rng);
    //const uint32_t bitmap = 0;

    std::vector<uint32_t> bitmaps;
    bitmaps.resize(10000000);

    //for(auto it = bitmaps.begin(); it < bitmaps.end(); ++it) *it = uint_dist(rng);
    for(auto it = bitmaps.begin(); it < bitmaps.end(); ++it) *it = uint_dist(rng) & uint_dist(rng) & uint_dist(rng); // sparse

    const auto brute = [&bitmaps, bitmap](){
        brute_lookup(bitmap, bitmaps.begin(), bitmaps.end());
    };

    std::auto_ptr<cover_node<uint32_t>> cover_tree = build_cover_node<std::vector<uint32_t>::const_iterator>(0, 32, bitmaps.begin(), bitmaps.end());
    pprint(std::cout, cover_tree);

    const auto traversal = [&cover_tree, bitmap]() {
        cover_walker<uint32_t>(bitmap, *cover_tree).get_result();
    };

    bench("brute-force array search", brute);
    bench("tree-traversal search", traversal);


    return 0;
}

Cadi answered 26/8, 2011 at 21:3 Comment(2)

Interesting but I don't know Haskell. – Decare 27/8, 2011 at 15:43

My feeling about this problem is that you can gain more perfomance by tweaking algorithms according to you data samples distribution. I.e. you can sort bits in the order of their occurance. To make tree more balanced and/or to make pattern/mask to exclude more of data at first iterations. etc. – Cadi 27/8, 2011 at 18:42

This solution will take memory proportional to the number of '1' bits in M, but should run reasonably quickly. I am assuming that the set M is static with many Target matching requests.

Preprocessing:

Given the set M, sort it into ascending order. Next construct an array containing one slot per bit. You are using 32 bit numbers so you need an array of 32 slots. Call this array: MBit[0..31]. Each slot contains a pointer to a linked list (call it: MPtr). The linked list contains numbers from M where the corresponding bit is set. For example all numbers from M having bit 3 set would be found in the linked list: MBit[3].MPtr.

The basic algorithm is to process each of the MBit lists where the corresponding Target number has a '1' bit set. Only numbers common to all of the processed lists are selected. Since each MPtr list contains sorted numbers we can scan forward until the number we are looking for is found (a match), a larger number is found (no match) or the list is exhausted (no more matches possible).

The major drawback to this approach is that the same number from M will appear in as many linked lists as it has '1' bits. This is a bit of a memory hit but you have to give something somewhere!

Outline:

Build the MBit array as outlined above.

Build another array data structure for the Target number. The array contains 1 slot per bit in the Target (call this: TBit[0..31]). Each slot contains a linked list pointer (call it: MPtr) and a boolean (call it: BitSet). BitSet indicates whether the corresponding bit of Target is set.

Given a new Target:

/* Initialize each slot of TBit to the head of the corresponding MBit Linked list */
if Target == 0 then goto Done      /* Target contains only zero bits - no matches */
for (i = 0; i < 32; i++) {         /* Bit 0 is LSB, Bit 31 is MSB */
   TBit[i].MPtr = MBit[i].MPtr     /* List of numbers with bit i set */
   TBit[i].BitSet = (Target && 1)  /* Target bit i set? */
   Target = Target >> 1            /* Shift 1 bit right */
}

/* Iterate until one of the linked lists in TBit is exhausted */
for(;;) {
   First1Bit = False          /* Found first '1' bit in Target for this iteration */
   AcceptCandidate = True     /* Assume Candidate number matches all '1' bits in Target */
   for (i = 0; i < 32 & AcceptCandidate; i++) { /* For each bit in TBit Array... */
      if !TBit[i].BitSet then iterate          /* Target bit is zero, nothing to add */
      if !First1Bit then {                     /* First Target '1' bit, initialize for iteration */
         if TBit[i].MPtr == Nil then goto Done /* List exhausted, no more matches possible */
         Candidate = value(TBit[i].MPtr)       /* Candidate Number from linked list */
         TBit[i].MPtr = next(TBit[i].MPtr)     /* setup for next cycle */
         First1Bit = True                      /* First 1 bit for this cycle completed */
      } else {
         /* Scan list until Candidate or larger number found... */
         while (TBit[i].MPtr != Nil & value(TBit[i].MPtr) < Candidate) {
            TBit[i].MPtr = next(TBit[i].MPtr)
         }
         if TBit[i].MPtr = Nil then goto Done  /* List exhausted, no more matches possible */
         AcceptCandidate = (value(TBit[i].MPtr) == Candidate)
      }
   }
   if AcceptCandidate then {
      /* Candidate contains a '1' bit in the same positions Target contains a '1' bit */
      /* Do what you need to do with Candidate */
   }
}
Done: /* No further matches on Target are possible */

I can see a number of optimizations to the above outline but figured this would be a good start.

Diesel answered 25/8, 2011 at 18:17 Comment(4)

Average behavior might be a little better than raw searching but you could still wind up searching virtually the entire list of items in a bad case. Imagine 1111111111111110111111111111111 as the target. – Decare 25/8, 2011 at 21:9

This solution is likely worse than brute-force. Consider that if the M numbers are uniformly distributed, each number has a probabilty of 0.5 to appear in one of the slots. If the target pattern contains just 8 bits set, the expected amount of numbers to check is 4*M instead of M for a brute-force approach. – Philter 25/8, 2011 at 23:41

@Simon Stelling. Good points. Upon reflection I agree, my "solution" above is actually worse than the "straw man" burte force method given in the question for all cases except where Target has only 1 bit set and then the answer has been pre-computed. – Diesel 26/8, 2011 at 13:33

@codist Given a Target of 1111111111111110111111111111111 there are only 2 possible 32 bit integers that could be selected - lots of filtering to get there. As Simon pointed out each of my MBit lists would contain about half the numbers from set M. So if M contains 100000 numbers we need run through an average of 50000 of them times the number of '1' bits in Target. For 31 '1' bits that works out to 1550000 numbers - Over 15 times the size of set M! Scratch this solution. – Diesel 26/8, 2011 at 14:57

This seems like something a SQL database would be good at. If you put a composite index on (MSB, BitsSet, Value) your results should be very fast.

IntegerList:
    Value INT
    BitsSet INT
    MSB INT

INSERT INTO IntegerList(Value, BitsSet, MSB) VALUES(@Value, GetBitsSet(@Value), GetMSB(@Value)

SELECT Value FROM IntegerList WHERE MSB = GetMSB(@Target) AND BitsSet >= GetBitsSet(@Target) AND (Value & @Target) = @Target

---GetMSB
DECLARE @b BIGINT
DECLARE @c INT
SELECT @b = 0x80000000
SELECT @c = 32
WHILE (@b <> 0)
BEGIN
    IF (@b & @value) = @b
    BEGIN
        RETURN @c
    END
    SELECT @b = @b / 2
    SELECT @c = @c - 1
END

---GetBitsSet
DECLARE @b BIGINT
DECLARE @c INT
SELECT @b = 0x80000000
SELECT @c = 0
WHILE (@b <> 0)
BEGIN
    IF (@b & @value) = @b
    BEGIN
        SELECT @c = @c + 1
    END
    SELECT @b = @b / 2
END
RETURN @c

If you must do it in straight C++ I suggest trying to emulate the SQL approach.

Create a struct or class with int Value, BitsSet, MSB

Create 2 arrays of the nodes one sorted for MSB and the other for BitsSet.

Use binary search on the MSB (matching the MSB of Target) array and the BitsSet (matching all BitsSet >= Target) array.

Get the union of those two results, then perform your Target & Value == Target check.

Adhern answered 25/8, 2011 at 13:29 Comment(0)

General approach.

Build tree by bits. Level one node is fisrt bit, than level 2 node is 2nd bit, ...

When you get mask you just negate it and you know which parts of tree you have to exclude.
Than can quickly traverse only trough nodes that are relevant.

N_bits space solution*
Just sort this integers in place and use binary search to traverse this tree.

Complexity O(N_results*N_bits))

it looks like that it run faster by factor 3 compared to bruteforce O(N). But this is my first code in c++, so I might missed something. Any comment about code would be also cool.

How code works?
Only data structure it uses is sorted array of inputs.
At each step it splits array on two parts based on bound using binary search whit std::lower_bound();
in case that mask[depth] is 1 it does not need to go on left part of this tree It has to go right in any case.

If you put mask like 0xFFFFFFFF it will always go only right and will perform in log(n) time if you put mask 0x00000000 it will return all solutions, so it will go at each step both left and right and will perform worse than naive loop. Once size of array is less than 10 (can be changed) it uses naive approach to return all solutions in output vector.

On random input vector of lenght 100k and mask 0x11111111 (8 bits) it runs twice faster than naive loop.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

void find_masks(const int mask,const int bound,const int depth,const vector<int>::iterator begin,const vector<int>::iterator end, vector<int> &output )
{
    vector<int>::iterator i,split;
    if( ( distance(begin,end)<10 ) | (depth==0) ) //if less than 10 we just bruteforce it is also stopping condition
    {
        for(i=begin; i!=end; i++)
        {
            if(mask == (int)(mask & (*i)))
            {
                output.push_back(*i);
            }
        }
        return;
    }

    int bitmask =  (1<<depth) ;
    split=lower_bound(begin,end,bound | bitmask );

    if( !(mask & bitmask) ) //go left if mask == 0 at this point
    {
        find_masks(mask,bound,depth-1,begin,split, output );
    }
    find_masks(mask,bound | bitmask ,depth-1,split, end, output );
}


int main ()
{
    vector<int> result,v,bruteforce;
    vector<int>::iterator i;

    //100k random vector
    for(int i=0; i<100000; i++)
    {
        int r=0;
        for(int j=0; j<4; j++)
        {
            r=r<<8;
            r=r^rand();
        }
        v.push_back(r);
    }

    sort(v.begin(),v.end());

    int mask=0xF0F;
    //use sorted vector and binary search for traversing tree
    find_masks(mask,0,31,v.begin(),v.end(), result );

    //use naive loop
    bruteforce.erase(bruteforce.begin(),bruteforce.end());
    for(i=v.begin(); i!=v.end(); i++)
    {
        if(mask == (int)(mask & (*i)))
        {
            bruteforce.push_back(*i);
        }
    }

    cout<<"n solutions binary search " << distance(result.begin(),result.end())<<endl;
    cout<<"n solutions loop "  << distance(bruteforce.begin(),bruteforce.end())<<endl;
    cout<<"are solutions same => " << equal(result.begin(),result.end(),bruteforce.begin());

    return 0;
}

Hiatus answered 25/8, 2011 at 14:25 Comment(5)

intresting solution, but won't it require 2^32 nodes in the tree? [every bit has a branch factor of 2, so 2^32 total nodes]. – Sears 25/8, 2011 at 17:33

This was my first idea too... until I did the math on tree size. – Diesel 25/8, 2011 at 18:46

point is that insted of bulding tree, you just use sorted array of integers and traverse this "tree" using binary search – Hiatus 25/8, 2011 at 19:34

It is not exact match just code runs this way, replace mask whit something like 1. It has great performance if mask has lot of bits set up and if number of solutions is low. But is crappy othervise – Hiatus 25/8, 2011 at 21:33

Yeah that will save you space by taking extra ticks for lower_bound. I'd also add that it will work better for most significant bits set to 1. I.e. one bit at the higher digit will exclude half of search tree, but one bit at the lower will result in many paths to that bit and potentially will split them all by half. – Cadi 27/8, 2011 at 18:26

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags