is there a non-iterative way of finding the index of the Nth set bit? [duplicate]
Asked Answered
C

1

3
#include <cassert>
#include <cstdint>

uint64_t pos_of_nth_bit(uint64_t X, uint64_t bit) {
  while (X) {
    if (!bit--)
      return __builtin_ctzll(X);
    X = X & (X - 1);
  }
  assert(false && "no such bit");
}

Is there a faster way to do the logic from pos_of_nth_bit ?

for example:

pos_of_nth_bit(0b101001000, 0) = 3
pos_of_nth_bit(0b101001000, 1) = 6
pos_of_nth_bit(0b101001000, 2) = 8
Cristinacristine answered 13/11, 2023 at 18:11 Comment(18)
@500-InternalServerError: How would log2 help?Tarmac
I don't understand the question. The index of the Nth bit ... is just bit, isn't it?Tetrad
@TedLyngmo I added examples to clarify what I mean.Cristinacristine
@TedLyngmo: They want the index (of the position) of the nth set bit. So in 10010001001, the set bits are those at 1, 1000, 10000000, and 10000000000. The third of those would have index 7 (counting from 0 on the right; __builtin_ctzll indexes them differently).Tarmac
@EricPostpischil Aha, "the nth set bit" ... Tyker: May I suggest a clearer function name? :-)Tetrad
You could use a lookup table. Though for uint64_t that would be very large (probably a waste of space). But if you do it for a byte. Then you can loop over bytes and not single bits to calcuate the value.Struma
This question does not deserve votes down. The code makes it clear what is being asked, as X & (X-1) turns off the lowest set bit, so the loop is counting set bits. While not everybody may recognize that X & (X - 1) turns off the lowest set bit, the question is tagged bit-manipulation, and one ought to exercise restraint on down voting bit manipulation questions if they are not familiar with this elementary bit manipulation idiom.Tarmac
I'd say downvoting or voting to close when the question does not provide clarity (readers shouldn't have to evaluate code in detail to determine the author's intent, especially as that may mislead) is fair game. However, those who do so should pay close attention to the question and reverse those actions if the issue is rectified.Pebbly
I'd use std::countl_zero(X) for C++20, and return std::numeric_limits<std::uint64_t>::max(); instead of assert. This'd make the code portable and constexprable.Pelion
From the pos_of_nth_bit(0b101001000, 0) = 3 example we can infer that pos_of_nth_bit(0b1, 0) = 0 - what is pos_of_nth_bit(0b0, 0)?Argentiferous
This is the select part of Rank-Select, so a relatively well-studied problem. For example: A Fast x86 Implementation of SelectHardigg
When you say "non-iterative" do you mean "without repeatedly clearing the least significant set bit"? Because I don't think you actually want to forgo "iteration" entirely. I would think that the 6-or-so iterations of the O(log N) solution would be acceptable to you, wouldn't it? Your title has XY wording of your true objective.Guardado
@harold after some more experiementing I tested something that should match what the paper proposed tzcnt + pdep + shift. but it is 42% slower than the naive baseline and 3.6x slower than the fastest implementation I found. so it isnt really intresting.Cristinacristine
@Cristinacristine are you testing it on AMD Zen 2? That processor family has very slow pext/pdep. It's fast on Intel and Zen 3 though.Hardigg
@harold it was on a AMD Zen2Cristinacristine
I don't see how this question is a duplicate of the flagged one. That one requests a value with a single bit set, this one requests the index of the bit. If there's an answer on the other question that illustrates that, please point it out.Craigie
@MarkRansom: OP here assumes __builtin_ctzll is available, so finding the index from the value is trivial (and so is the converse).Tarmac
@EricPostpischil you got me there, thanks.Craigie
J
0

At least if we restrict the answer to normal hardware and reasonably practical solutions, you're almost certainly going to be stuck with some iteration.

The obvious O(1) solution that's not practical is a lookup table. The reason it's not practical is that for a M-bit input, the lookup table for the position of the N'th set bit will require a lookup table with N * 2M entries (each if which would need to be log2(M) bits in size). For 64-bit inputs, I'm pretty sure that's more memory than has been produced in all of history.

It's pretty easy, however, to reduce the maximum number of iterations quite a lot, while keeping the lookup table size quite reasonable. For example, we can look up the number of bits set in a byte with only a 256-entry table. With this, we do a maximum of 8 table lookups to isolate one byte that contains the i'th set bit, and then 8 iterations within that byte to find the correct bit. This reduces the worst case from 64 iterations to 16 iterations (and an expected average from 32 to around 8). The lookup table is small enough to fit in L1 cache of almost any CPU that has a cache at all, so access to it will generally be quite fast.

Julissa answered 13/11, 2023 at 19:5 Comment(1)
If you're using a lookup table just to count bits, you may as well use std::popcountHardigg

© 2022 - 2024 — McMap. All rights reserved.