How to efficiently find a contiguous range of used/free slots from a Fenwick tree

Assume that I am tracking the usage of slots in a Fenwick tree. As an example, let's consider tracking 32 slots, leading to the Fenwick tree layout shown in the image below. The numbers in the grid indicate the index in the underlying array of counts manipulated by the Fenwick tree, where the value in each cell is the number of "used" slots in that segment (i.e. array cell 23 stores the number of used slots in the range [16-23]). The cells at the lowest level (i.e. cells 0, 2, 4, ...) can only have the value "1" (used slot) or "0" (free slot).

Example Fenwick tree layout
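
For concreteness, here is a minimal sketch (my own illustration, not the asker's code) of a 0-based Fenwick tree with exactly this layout: cell i covers the slot range [i - lowbit(i+1) + 1 .. i], so cell 23 covers slots [16-23] as described above. Names like SlotFenwick are made up for the example.

class SlotFenwick:
    def __init__(self, n):
        self.n = n
        self.a = [0] * n              # the underlying array of counts

    def mark(self, slot, delta):
        """delta = +1 when a free slot becomes used, -1 when it is freed again."""
        i = slot
        while i < self.n:
            self.a[i] += delta
            i |= i + 1                # next cell covering slot i (0-based rule)

    def used_up_to(self, slot):
        """Number of used slots in [0 .. slot]."""
        total, i = 0, slot
        while i >= 0:
            total += self.a[i]
            i = (i & (i + 1)) - 1     # previous non-overlapping cell
        return total

# e.g. used_up_to(23) only reads cells 23 (range [16-23]) and 15 (range [0-15])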

What I am looking for is an efficient algorithm to find the first range of a given number of contiguous free slots.

To illustrate, suppose I have the Fenwick tree shown in the image below in which a total of 9 slots are used (note that the light gray numbers are just added for clarity, not actually stored in the tree's array cells).

Example tree

Now I would like to find e.g. the first contiguous range of 10 free slots, which should find this range:

Example search result

I can't seem to find an efficient way of doing this, and it is giving me a bit of a headache. Note that, as the required amount of storage space is critical for my purposes, I do not wish to extend the design to a segment tree.

Any thoughts and suggestions on an O(log N) type of solution would be very welcome.

EDIT

Time for an update after bounty period has expired. Thanks for all comments, questions, suggestions and answers. They have made me think things over again, taught me a lot and pointed out to me (once again; one day I may learn this lesson) that I should focus more on the issue I want to solve when asking questions.

Since @Erik P was the only one who provided a reasonable answer to the question that included the requested code/pseudocode, he will receive the bounty.

He also pointed out correctly that O(log N) search using this structure is not going to be possible. Kudos to @DanBjorge for providing a proof that made me think about worst case performance.

The comment and answer of @EvgenyKluev made me realize I should have formulated my question differently. In fact I was already doing in large part what he suggested (see https://gist.github.com/anonymous/7594508 - which shows where I got stuck before posting this question), and asked this question hoping there would be an efficient way to search contiguous ranges, thereby preventing changing this design to a segment tree (which would require an additional 1024 bytes). It appears however that such a change might be the smart thing to do.

For anyone interested, a binary encoded Fenwick tree matching the example used in this question (32 slot fenwick tree encoded in 64 bits) can be found here: https://gist.github.com/anonymous/7594245.

Edme answered 12/11, 2013 at 18:12 Comment(9)
Not too sure I understand, but if cell 23 stores #used in 16-23, and you have to find > 8, then you only have to search the tips (7, 15, 23, etc). Discard if it's > 0, check the next 2 if it is 0? That way you search only the tips (log) of the ranges, plus a small constant increase in the event of partial success?Sherburn
Can you please clarify what you mean by "efficient"? Are you looking to optimize for worst-case time or average-case time? If average-case, do you have any information about expected input distributions?Dragon
@DanBjorge by efficient I mean that it would preferably be O(log N) amortized, i.e. optimize for average-case time. For my specific use case, I expect there to be large consecutive fully used ("1") regions (note that the tree I am actually using is quite a bit larger than the one in the example). I expect searches to generally be for ranges of 1-16 free slots, with a median (guessing here) probably around 4.Edme
Is it important that you find the first such range, as opposed to any arbitrary range of the appropriate size?Dragon
@DanBjorge not critically so, but yes strongly preferred as it allows a somewhat sequential allocation of free slots on repeated requests.Edme
From your description, I assume that you cannot afford additional O(N) memory for any auxiliary structure, correct?Sybilsybila
@Sybilsybila correct, storage space is currently N. I do not wish to have a structure that doubles this amount of space.Edme
IMHO it would be much easier to find an answer if question is reformulated like this "having N items with value 0 or 1 and N words, implement O(log N) data structure to obtain both the number of nonzero items in some range and the first range of a given number of contiguous zeros". Then you just place all values to a bit vector (size N/64), implement Fenwick tree of decreased depth (size N/64) to count items in range, and implement segment tree of decreased depth (size 2*N/64) to find runs of zeros. The only problem is deciding what to do with the remaining unused 15*N/16 words :)Tenuous
@EvgenyKluev true, that would be a feasible solution to this problem. O(log N) updates, point and range queries are important as well as O(N) storage space. Would you care to elaborate it into an answer?Edme

I think the easiest way to implement all the desired functionality with O(log N) time complexity, while minimizing memory requirements, is to use a bit vector to store all the 0/1 (free/used) values. The bit vector can replace the 6 lowest levels of both the Fenwick tree and the segment tree (if implemented with 64-bit integers). So the height of these trees is reduced by 6, and the space requirement for each of them is 64 (or 32) times smaller than usual.

The segment tree may be implemented as an implicit binary tree sitting in an array (just like the well-known max-heap implementation): the root is at index 1, the left child of the node at index i is at 2*i, and the right child at 2*i+1. This needs twice as much space as the Fenwick tree, but since the tree height is cut by 6 levels, that's not a big problem.

Each segment tree node should store a single value: the length of the longest contiguous sequence of "free" slots starting at a point covered by this node (or zero if there is no such starting point). This makes searching for the first range of a given number of contiguous zeros very simple: start from the root, then choose the left child if it contains a value greater than or equal to the required length, otherwise choose the right child. After reaching a leaf node, check the corresponding word of the bit vector (for a run of zeros in the middle of the word).
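
As a rough sketch of that search (my own illustration of the idea, not Evgeny's code): here the leaves are individual slots rather than 64-slot words to keep it short, and maintaining the node values on updates is not shown.

# seg[] is an implicit binary tree: root at 1, children of v at 2*v and 2*v+1,
# leaves at indices [L, 2*L). seg[v] = length of the longest run of free slots
# starting inside v's range (the run may continue past the node's right edge),
# so seg[v] == max(seg[2*v], seg[2*v + 1]) for every internal node.

def find_first_free_run(seg, L, k):
    """Index of the first slot that starts a run of >= k free slots, or -1."""
    if seg[1] < k:
        return -1
    v = 1
    while v < L:
        # prefer the left child so that the leftmost suitable start is found
        v = 2 * v if seg[2 * v] >= k else 2 * v + 1
    return v - L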

Update operations are more complicated. When changing a value to "used", check the appropriate word of the bit vector; if it is empty, ascend the segment tree to find a nonzero value for some left descendant, then descend the tree to the rightmost leaf with this value, then determine how the newly added slot splits the "free" interval into two halves, then update all parent nodes of both the added slot and the starting node of the interval being split, and also set the bit in the bit vector. Changing a value to "free" may be implemented similarly.

If the number of nonzero items in some range is also needed, implement a Fenwick tree over the same bit vector (but separate from the segment tree). There is nothing special in the Fenwick tree implementation, except that adding together the 6 lowest nodes is replaced by a "population count" operation on some word of the bit vector. For an example of using a Fenwick tree together with a bit vector, see the first solution for Magic Board on CodeChef.
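
A sketch of what that combination might look like (hypothetical names, my own code rather than the CodeChef solution): one 64-bit word per 64 slots, plus a standard Fenwick tree over the per-word population counts, so a prefix count costs one Fenwick walk plus one popcount of a partial word.

class PackedFenwick:
    def __init__(self, num_slots):
        self.nwords = (num_slots + 63) // 64
        self.bits = [0] * self.nwords        # bit = 1 means the slot is used
        self.fen = [0] * (self.nwords + 1)   # 1-based Fenwick over word popcounts

    def mark_used(self, slot):
        w, b = divmod(slot, 64)
        if not (self.bits[w] >> b) & 1:
            self.bits[w] |= 1 << b
            i = w + 1
            while i <= self.nwords:          # usual Fenwick point update
                self.fen[i] += 1
                i += i & -i

    def used_before(self, n):
        """Number of used slots among slots [0 .. n)."""
        w, b = divmod(n, 64)
        total, i = 0, w
        while i > 0:                         # Fenwick prefix sum over whole words
            total += self.fen[i]
            i -= i & -i
        if b:                                # partial word: popcount of its low b bits
            total += bin(self.bits[w] & ((1 << b) - 1)).count("1")
        return total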

All the necessary bit vector operations may be implemented quite efficiently using various bitwise tricks. For some of them (leading/trailing zero count and population count) you could use either compiler intrinsics or assembler instructions (depending on the target architecture).
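
For example, checking whether a single 64-bit word contains a run of k free slots (and where the first one starts) can be done with the classic shift-and-AND trick. A sketch, using Python integers in place of intrinsics; runs that cross word boundaries still have to be stitched together by the caller:

def first_zero_run_in_word(word, k):
    """Lowest bit index in `word` at which k consecutive 0-bits start, or -1."""
    run = ~word & ((1 << 64) - 1)   # bit i set  <=>  slot i of this word is free
    length = 1                      # each set bit of `run` marks a free run of `length`
    while length < k:
        s = min(length, k - length)
        run &= run >> s             # bit i survives <=> slots [i .. i+length+s-1] are free
        length += s
        if run == 0:
            return -1
    return (run & -run).bit_length() - 1   # index of the lowest surviving bit

In C or C++ (e.g. with GCC/Clang) the last line would be a single __builtin_ctzll, and the per-word counts used by the Fenwick tree a __builtin_popcountll, exactly the intrinsics mentioned above.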

If the bit vector is implemented with 64-bit words and the tree nodes with 32-bit words, both trees together occupy 150% additional space on top of the bit vector. This may be reduced significantly if each leaf node corresponds not to a single bit vector word but to a small range of words (4 or 8). For 8 words, the additional space needed for the trees would be only about 20% of the bit vector size. This makes the implementation slightly more complicated, but if properly optimized, performance should be approximately the same as in the one-word-per-leaf variant. For very large data sets performance is likely to be better, because bit vector computations are more cache-friendly than walking the trees.

Tenuous answered 21/11, 2013 at 17:24 Comment(4)
Thanks for your additional thoughts. I like the idea and had been toying with it for a bit already. I agree, somewhere a compromise must be made in the bit vector size per leaf to balance cache friendliness vs. limiting the number of search steps and the storage space. You may have seen from my edit above that I started out by even encoding 32 leaf slots in a 64-bit word as a 5-level tree, which was a nice exercise, but not the way to go, as I quickly realized.Edme
@Alex: Yes, I've understood some of your code (unfortunately my C# knowledge is too lacking to understand it completely; also, you implement everything in a single class, which complicates matters). Your code for counting occupied slots in a range seems OK. But the code for searching for a contiguous block uses the same Fenwick tree, where I expected to see a segment tree...Tenuous
If you find the idea of using a segment tree proposed here too difficult, you could go a simpler way: make one segment tree for the maximal contiguous block length as proposed, and make another segment tree storing the rightmost starting position of some contiguous block. This takes more memory, but you can use a simple top-down search in this additional tree instead of the more complicated search proposed in the answer; this makes update operations simpler.Tenuous
Evgeny, that is because the code I posted does not actually solve the problem of finding contiguous ranges. It just makes a few baby steps in that direction and shows where I got stuck, before I posted this question on SO.Edme

As mcdowella suggests in their answer, let K2 = K/2, rounding up, and let M be the smallest power of 2 that is >= K2. A promising approach would be to search for contiguous blocks of K2 zeroes fully contained in one size-M block, and once we've found those, check neighbouring size-M blocks to see if they contain sufficient adjacent zeroes. For the initial scan, if the number of 0s in a block is < K2, clearly we can skip it, and if the number of 0s is >= K2 and the size of the block is >= 2*M, we can look at both sub-blocks.

This suggests the following code. Below, A[0 .. N-1] is the Fenwick tree array; N is assumed to be a power of 2. I'm assuming that you're counting empty slots rather than nonempty ones; if you prefer to count used slots instead, it's easy enough to transform from the one to the other.

initialize q as a stack data structure of triples of integers
push (N-1, N, A[N-1]) onto q
# An entry (i, j, z) represents the block [i-j+1 .. i] of length j, which
# contains z zeroes; we start with one block representing the whole array.
# We maintain the invariant that i always has at least as many trailing ones
# in its binary representation as j has trailing zeroes. (**)
initialize r as an empty list of pairs of integers
while q is not empty:
    pop an entry (i,j,z) off q
    if z < K2:
        next

    if j >= 2*M:
        first_half := i - j/2
        # change this if you want to count nonempty slots:
        first_half_zeroes := A[first_half]
        # Because of invariant (**) above, first_half always has exactly
        # the right number of trailing 1 bits in its binary representation
        # that A[first_half] counts elements of the interval
        # [i-j+1 .. first_half].

        push (i, j/2, z - first_half_zeroes) onto q
        push (first_half, j/2, first_half_zeroes) onto q
    else:
        process_block(i, j, z)

This lets us process all size-M blocks with at least K/2 zeroes in order. You could even randomize the order in which you push the first and second half onto q in order to get the blocks in a random order, which might be nice to combat the situation where the first half of your array fills up much more quickly than the latter half.

Now we need to discuss how to process a single block. If z = j, then the block is entirely filled with 0s and we can look both left and right to add zeroes. Otherwise, we need to find out if it starts with >= K/2 contiguous zeroes, and if so with how many exactly, and then check if the previous block ends with a suitable number of zeroes. Similarly, we check if the block ends with >= K/2 contiguous zeroes, and if so with how many exactly, and then check if the next block starts with a suitable number of zeroes. So we will need a procedure to find the number of zeroes a block starts or ends with, possibly with a shortcut when that number is known to be at least some minimum or at most some maximum. To be precise: let ends_with_zeroes(i, j, min, max) be a procedure that returns the number of zeroes that the block [i-j+1 .. i] ends with, with a shortcut to return max if the result will be more than max and min if the result will be less than min. Similarly for starts_with_zeroes(i, j, min, max).

def process_block(i, j, z):
    if j == z:
        if i > j:
            a := ends_with_zeroes(i-j, j, 0, K-z)
        else:
            a := 0

        if i < N-1:
            b := starts_with_zeroes(i+j, j, K-z-a-1, K-z-a)
        else:
            b := 0

        if b >= K-z-a:
            print "Found: starting at ", i - j - a + 1
        return

    # If the block doesn't start or end with K2 zeroes but overlaps with a
    # correct solution anyway, we don't need to find it here -- we'll find it
    # starting from the adjacent block.
    a := starts_with_zeroes(i, j, K2-1, j)
    if i > j and a >= K2:
        b := ends_with_zeroes(i-j, j, K-a-1, K-a)
        if b >= K-a:
            print "Found: starting at ", i - j - a + 1
        # Since z < 2*K2, and j != z, we know this block doesn't end with K2
        # zeroes, so we can safely return.
        return

    a := ends_with_zeroes(i, j, K2-1, j)
    if i < N-1 and a >= K2:
        b := starts_with_zeroes(i+j, j, K-a-1, K-a)
        if b >= K-a:
            print "Found: starting at ", i - a + 1

Note that in the second case where we find a solution, it may be possible to move the starting point left a bit further. You could check for that separately if you need the very first position that it could start.

Now all that's left is to implement starts_with_zeroes and ends_with_zeroes. In order to check that the block starts with at least min zeroes, we can test that it starts with 2^h zeroes (where 2^h <= min) by checking the appropriate Fenwick entry; then similarly check if it starts with 2^H zeroes where 2^H >= max to shortcut the other way (except if max = j, where it is trickier to find the right count from the Fenwick tree); then find the precise number.

def starts_with_zeroes(i, j, min, max):
    start := i-j

    h2 := 1
    while h2 * 2 <= min:
        h2 := h2 * 2
        if A[start + h2] < h2:
            return min
    # Now h2 = 2^h in the text.
    # If you insist, you can do the above operation faster with bit twiddling
    # to get the 2log of min (in which case, for more info google it).

    while h2 < max and A[start + 2*h2] == 2*h2:
        h2 := 2*h2
    if h2 == j:
        # Walk up the Fenwick tree to determine the exact number of zeroes
        # in interval [start+1 .. i]. (Not implemented, but easy.) Let this
        # number be z.

        if z < j:
            h2 := h2 / 2

    if h2 >= max:
        return max

    # Now we know that [start+1 .. start+h2] is all zeroes, but somewhere in 
    # [start+h2+1 .. start+2*h2] there is a one.
    # Maintain invariant: the interval [start+1 .. start+h2] is all zeroes,
    # and there is a one in [start+h2+1 .. start+h2+step].
    step := h2;
    while step > 1:
        step := step / 2
        if A[start + h2 + step] == step:
             h2 := h2 + step
    return h2

As you see, starts_with_zeroes is pretty bottom-up. For ends_with_zeroes, I think you'd want to do a more top-down approach, since examining the second half of something in a Fenwick tree is a little trickier. You should be able to do a similar type of binary search-style iteration.
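
For what it's worth, here is one possible top-down sketch of ends_with_zeroes (my reading of the idea, not Erik's code). As in the rest of the answer, A counts zeroes (free slots). The sketch relies on the fact that for an aligned block [i-j+1 .. i] (with i+1 divisible by j), the cell A[i - j/2] stores exactly the zero count of its left half; `total` is the zero count of the whole block, which the caller either already knows (it is the z of the current block) or can obtain with an ordinary Fenwick range query. The min/max short-circuits from the prose are omitted to keep it short.

def ends_with_zeroes(A, i, j, total):
    """Number of zeroes that the aligned block [i-j+1 .. i] ends with."""
    if j == 1:
        return total                 # a single slot: 0 or 1
    half = j // 2
    left = A[i - half]               # zeroes in the left half [i-j+1 .. i-half]
    right = total - left             # zeroes in the right half [i-half+1 .. i]
    if right == half:                # right half all zero: the run crosses into the left half
        return half + ends_with_zeroes(A, i - half, half, left)
    return ends_with_zeroes(A, i, half, right)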

This algorithm is definitely not O(log(N)), and I have a hunch that this is unavoidable. The Fenwick tree simply doesn't give information that is that good for your question. However, I think this algorithm will perform fairly well in practice if suitable intervals are fairly common.

Phosphene answered 19/11, 2013 at 22:46 Comment(0)

One quick check, when searching for a range of K contiguous free slots, is to find the largest power of two less than or equal to K/2. Any K contiguous zero slots must contain at least one Fenwick-aligned range of slots of that size (and hence of size <= K/2) that is entirely filled with zeros. You could search the Fenwick tree from the top for such chunks of aligned zeros and then look for the first one that can be extended to produce a range of K contiguous zeros.

In your example the lowest level contains 0s or 1s and the upper levels contain sums over their descendants. Finding stretches of 0s would be easier with a different encoding: at the lowest level, store a 0 where you currently write a 1 (used slot), and where you currently write a 0 (free slot) store the number of contiguous free slots to the left, including the slot itself; in the upper levels, store the maximum value of any descendant. Updating would mean more work, especially if you had long strings of zeros being created and destroyed, but you could find the leftmost string of zeros of length at least K with a single top-down search, branching left wherever the maximum value is at least K.

Admittedly, a lot of the update work here goes into creating and destroying runs of 1, 2, 3, 4, ... on the lowest level. Perhaps if you left the lowest level as originally defined and did a case-by-case analysis of the effects of modifications, you could have the upper levels hold the longest stretch of zeros starting at any descendant of a given node - for quick search - and still get a reasonable update cost.
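
To make the search part concrete, here is a small sketch of how that top-down search could look (my interpretation of the proposed structure, not code from the answer; the update logic, which is the expensive part, is not shown):

# leaf values: 0 for a used slot, otherwise the number of contiguous free
# slots to the left including the slot itself; internal nodes hold the
# maximum value of any descendant. tree[] is implicit: root at 1, children
# of v at 2*v and 2*v+1, leaves at indices [L, 2*L).

def leftmost_free_run(tree, L, k):
    """Start index of the leftmost run of >= k free slots, or -1 if none."""
    if tree[1] < k:
        return -1
    v = 1
    while v < L:
        v = 2 * v if tree[2 * v] >= k else 2 * v + 1
    end = v - L            # leftmost slot whose run-length counter reaches k ...
    return end - k + 1     # ... so the run of k free slots starts k-1 slots earlier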

Crease answered 12/11, 2013 at 19:48 Comment(4)
yes, but there is information available from the higher levels that could be used to skip searching several of the K/2 levels. In the example, it is clear after inspecting cell #15, that no 10 contiguous slots can be found in [0-15] and that any lower levels in that segment only need exploring in case cell #16 is "0". Maybe a good solution could be found by doing a "lower bound" in the top-down manner and checking the "upper bound" using bottom up indexing.Edme
Yes, there are a lot of things you might take into account, but I couldn't see a way of taking account of all of them neatly, so I just picked on one I could do easily. I wonder if this is the right data structure to use, so I have edited my answer to suggest a Fenwick-like structure that makes searching easier but is more expensive to update.Crease
Finding contiguous ranges of used or unused slots is not the only use case for the current structure. If I understood your suggested alternative structure correctly, a cell at the lowest level with value "0" means "used" and a value > 0 means "this many slots are contiguously free to the left, including myself". Where the upper levels contain either the maximum of a stored descendant or an inferred descendant? I had some trouble visualizing this, so I transcribed my original example: cubeupload.com/im/b3NGNR.png. Is this what you suggest?Edme
I'm a bit hazy myself on the finer details of Fenwick trees, but I'm pretty sure from your diagram that you've got my idea, such as it is.Crease

@Erik covered a reasonable-sounding algorithm. However, note that this problem has a worst-case lower bound of Ω(N/K).

Proof:

Consider a reduced version of the problem where:

  • N and K are both powers of 2
  • N > 2K >= 4

Suppose your input array is made up of (N/2K) chunks of size 2K. One chunk is of the form K 0s followed by K 1s, every other chunk is the string "10" repeated K times. There are (N/2K) such arrays, each with exactly one solution to the problem (the beginning of the one "special" chunk).

Let n = log2(N), k = log2(K). Let us also define the root node of the tree as being at level 0 and the leaf nodes as being at level n of the tree.

Note that, due to our array being made up of aligned chunks of size 2K, level n-k-1 of the tree is simply going to be made up of the number of 1s in each chunk. However, each of our chunks has the same number of 1s in it. This means that every node at level n-k-1 will be identical, which in turn means that every node at level <= n-k-1 will also be identical.

What this means is that the tree contains no information that can disambiguate the "special" chunk until you start analyzing level n-k and lower. But since all but 2 of the (N/K) nodes at that level are identical, in the worst case you'll have to examine Ω(N/K) nodes in order to distinguish the solution from the rest of the nodes.
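
As a concrete illustration of the construction (my own sketch following the proof, with N and K powers of two and N > 2K):

def adversarial_slots(N, K, special_chunk):
    """0/1 slot array with exactly one run of K free slots, in chunk `special_chunk`.
    Every chunk contains exactly K ones, so every tree cell covering 2K or more
    slots has the same value in all N/(2K) variants of the array."""
    slots = []
    for c in range(N // (2 * K)):
        if c == special_chunk:
            slots += [0] * K + [1] * K      # the single solution
        else:
            slots += [1, 0] * K             # "10" repeated K times
    return slots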

Dragon answered 20/11, 2013 at 11:21 Comment(3)
Very interesting. Some pictures (like OP's) would be appreciated.Sybilsybila
Thanks, I would assume though that the worst case situation is that where the lowest level consists only of chunks of "1" + (n-k) zeros or (n-k) zeros + "1". In this worst case (if I figured it out correctly), a search would have to inspect a total of N/K + 2 * (N/K - 1) nodes. To illustrate my understanding (and for @Mikhail) I uploaded an image (yellow = inspected nodes) here: cubeupload.com/im/wPHXyv.pngEdme
And a correction: worst case would be N/K + k * (N/K - 1) if I got it correct this time :DEdme
