In Perl, how do I count bits in a bit vector which has bits set higher than 2_147_483_639?

Perl is pretty great at doing bit strings/vectors. Setting bits is as easy as

vec($bit_string, 123, 1) = 1;

Getting the count of set bits is lightning quick

$count = unpack("%32b*", $bit_string);

But if you set a bit above 2_147_483_639, your count will silently go to zero without any apparent warning or error.

Is there any way around this?

The following code demonstrates the problem

#!/usr/bin/env perl

# create a string to use as our bit vector
my $bit_string = undef;

# set bits at positions 10 and 2_000_000_000
# and the apparently last valid integer position 2_147_483_639
vec($bit_string, 10, 1) = 1;
vec($bit_string, 2_000_000_000, 1) = 1;
vec($bit_string, 2_147_483_639, 1) = 1;


# get a count of the bits which are set
my $bit_count = unpack("%32b*", $bit_string);
print("Bits set in bit string: $bit_count\n");
## Bits set in bit string: 3

# check the bits at positions 10, 11, 2_000_000_000, 2_147_483_639
for my $position (10,11,2_000_000_000, 2_147_483_639) {
    my $bit_value = vec($bit_string, $position, 1);
    print("Bit at $position is $bit_value\n");
}
## Bit at 10 is 1
## Bit at 11 is 0
## Bit at 2000000000 is 1
## Bit at 2147483639 is 1

# Adding the next bit, 2_147_483_640, causes the count to become 0
# with no complaint, error or warning
vec($bit_string, 2_147_483_640, 1) = 1;
$bit_count = unpack("%32b*", $bit_string);
print("Bits set in bit string after setting bit 2_147_483_640: $bit_count\n");
## Bits set in bit string after setting bit 2_147_483_640: 0

# But the bits are still actually set
for my $position (10, 2_000_000_000, 2_147_483_639, 2_147_483_640) {
    my $bit_value = vec($bit_string, $position, 1);
    print("Bit at $position is $bit_value\n");
}
## Bit at 10 is 1
## Bit at 2000000000 is 1
## Bit at 2147483639 is 1
## Bit at 2147483640 is 1

# Set even higher bits
vec($bit_string, 3_000_000_000, 1) = 1;
vec($bit_string, 4_000_000_000, 1) = 1;

# verify these are also set
for my $position (3_000_000_000, 4_000_000_000) {
    my $bit_value = vec($bit_string, $position, 1);
    print("Bit at $position is $bit_value\n");
}
## Bit at 3000000000 is 1
## Bit at 4000000000 is 1
Lateritious answered 26/7, 2018 at 22:26 Comment(2)
You have a bitmap with 2 billion bits?! - Woollen
Quite often. It's insane how fast unpack("%32b*") is at counting set bits. You can do some amazing things with bit operations on bit vectors. - Lateritious

You can try counting by smaller pieces. It's slower, but it seems to work:

$bit_count = 0;
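# /s is needed so that . also matches "\n" bytes (0x0A) in the bit string;
# without it those bytes are skipped and their bits are not counted
$bit_count += unpack '%32b*', $1
    while $bit_string =~ /(.{1,32766})/gs;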

Or slightly faster using substr instead of m//:

$bit_count = 0;
my ($pos, $step) = (0, 2 ** 17);
$bit_count += unpack '%32b*', substr $bit_string, $step * $pos++, $step
    while $pos * $step <= length $bit_string;

2 ** 17 seems to give the best performance on my machine, but YMMV.
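
For reuse, the substr loop above can be wrapped in a subroutine. This is only a minimal sketch: the name count_set_bits, the by-reference argument, and the default chunk size are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: count set bits chunk by chunk, so that no single
# unpack call ever sees more than $step bytes of the bit string.
sub count_set_bits {
    my ($bits_ref, $step) = @_;   # a reference avoids copying a huge string
    $step ||= 2 ** 17;            # chunk size in bytes; tune for your machine
    my $length = length $$bits_ref;
    my $count  = 0;
    for (my $offset = 0; $offset < $length; $offset += $step) {
        $count += unpack '%32b*', substr($$bits_ref, $offset, $step);
    }
    return $count;
}

# Small self-check: four distinct bits, spread over many 1024-byte chunks
my $small = '';
vec($small, $_, 1) = 1 for 1, 3, 10, 1_000_000;
print count_set_bits(\$small, 1024), "\n";   # prints 4
```

Passing a reference matters mostly for the very large strings in the question, where copying the argument would be expensive.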

Another possibility (slower, BTW) is to build a table of the number of set bits for every possible byte value and use that:

my %by_bits;
for my $byte (1 .. 255) {
    # number of 1 bits in this byte value
    my $bits_in_byte = sprintf('%b', $byte) =~ tr/1//;
    # group the bytes, written as \xNN escapes, by how many bits they have set
    $by_bits{$bits_in_byte} .= sprintf '\\x%02x', $byte;
}

$bit_count = 0;
for my $count (keys %by_bits) {
    $bit_count += $count * eval('$bit_string =~ tr/' . $by_bits{$count}. '//');
}
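
A plainer variant of the table idea, shown only as a sketch: instead of generating tr/// expressions with eval, it indexes a 256-entry array by byte value. This is a different technique from the code above and is likely slower still. The names @bits_in_byte and count_bits_by_table, and the chunk size, are assumptions for illustration.

```perl
use strict;
use warnings;

# $bits_in_byte[$n] is the number of set bits in the byte with value $n
my @bits_in_byte = map { unpack '%32b*', pack('C', $_) } 0 .. 255;

# Hypothetical helper: sum the per-byte bit counts, a chunk at a time
sub count_bits_by_table {
    my ($bits_ref, $chunk) = @_;
    $chunk ||= 65_536;            # bytes unpacked at a time, to bound memory use
    my $length = length $$bits_ref;
    my $count  = 0;
    for (my $offset = 0; $offset < $length; $offset += $chunk) {
        $count += $bits_in_byte[$_]
            for unpack 'C*', substr($$bits_ref, $offset, $chunk);
    }
    return $count;
}

my $demo = '';
vec($demo, $_, 1) = 1 for 1, 3, 10, 1_000_000;
print count_bits_by_table(\$demo), "\n";   # prints 4
```

Unpacking in chunks keeps the temporary list of byte values small; unpacking the whole string with 'C*' at once would need a list element per byte.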

Update:

It works correctly in recent Perl. See Another 32-bit residual in 64-bit perl 5.18.
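
The threshold in the question lines up with a 32-bit limit, which fits the title of the linked report. The sketch below only does the arithmetic; it does not establish the actual cause inside perl. It assumes a 64-bit perl, and the helper name bytes_needed is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: bytes a string must have to hold a given 1-bit index
sub bytes_needed {
    my ($bit_index) = @_;
    return int($bit_index / 8) + 1;
}

my $int32_max = 2 ** 31 - 1;   # 2_147_483_647
for my $bit_index (2_147_483_639, 2_147_483_640) {
    my $bytes = bytes_needed($bit_index);
    my $bits  = $bytes * 8;
    my $fits  = $bits <= $int32_max ? 'fits in' : 'exceeds';
    print "bit $bit_index: $bytes bytes = $bits bits, $fits a signed 32-bit integer\n";
}
```

Setting bit 2_147_483_639 needs 268_435_455 bytes (2_147_483_640 bits), which still fits. Setting bit 2_147_483_640 grows the string to 268_435_456 bytes, which is 2**31 bits, one more than a signed 32-bit integer can hold.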

Ringler answered 26/7, 2018 at 22:57 Comment(1)
Thanks for your assistance on this one. You're a lifesaver (peppermint or perhaps cherry). - Lateritious
