Find n-th set of a powerset
Asked Answered
P

3

5

I'm trying to find the n-th set in a powerset. By n-th I mean that the powerset is generated in the following order -- first by the size, and then, lexicographically --, and so, the indices of the sets in the powerset of [a, b, c] is:

0 - []
1 - [a]
2 - [b]
3 - [c]
4 - [a, b]
5 - [a, c]
6 - [b, c]
7 - [a, b, c]

While looking for a solution, all I could find was an algorithm to return the n-th permutation of a list of elements -- for example, here.

Context:

I'm trying to retrieve the entire powerset of a vector V of elements, but I need to do this with one set at a time.

Requirements:

  • I can only maintain two vectors at the same time, the first one with the original items in the list, and the second one with the n-th set from the powerset of V -- that's why I'm willing to have an n-th set function here;
  • I need this to be done not in linear time on the space of solutions -- which means it cannot list all the sets and them pick the n-th one;
  • my initial idea is to use bits to represent the positions, and get a valid mapping for what I need -- as the "incomplete" solution I posted.
Pepito answered 5/2, 2013 at 17:42 Comment(2)
You should probably include what you understand by the n-th set in a powerset, since sets have no order. I had never heard that before inferring from your answer what was meant.Kilocalorie
@Kilocalorie Thanks for the comment! I'll add that explanation.Pepito
W
5

I don't have a closed form for the function, but I do have a bit-hacking non-looping next_combination function, which you're welcome to, if it helps. It assumes that you can fit the bit mask into some integer type, which is probably not an unreasonable assumption given that there are 264 possibilities for the 64-element set.

As the comment says, I find this definition of "lexicographical ordering" a bit odd, since I'd say lexicographical ordering would be: [], [a], [ab], [abc], [ac], [b], [bc], [c]. But I've had to do the "first by size, then lexicographical" enumeration before.

// Generate bitmaps representing all subsets of a set of k elements,
// in order first by (ascending) subset size, and then lexicographically.
// The elements correspond to the bits in increasing magnitude (so the
// first element in lexicographic order corresponds to the 2^0 bit.)
//
// This function generates and returns the next bit-pattern, in circular order
// (so that if the iteration is finished, it returns 0).
//
template<typename UnsignedInteger>
UnsignedInteger next_combination(UnsignedInteger comb, UnsignedInteger mask) {
  UnsignedInteger last_one = comb & -comb;
  UnsignedInteger last_zero = (comb + last_one) &~ comb & mask;
  if (last_zero) return comb + last_one + (last_zero / (last_one * 2)) - 1;
  else if (last_one > 1) return mask / (last_one / 2);
  else return ~comb & 1;
}

Line 5 is doing the bit-hacking equivalent of the (extended) regular expression replacement, which finds the last 01 in the string, flips it to 10 and shifts all the following 1s all the way to the right.

s/01(1*)(0*)$/10\2\1/

Line 6 does this one (only if the previous one failed) to add one more 1 and shift the 1s all the way to the right:

s/(1*)0(0*)/\21\1/

I don't know if that explanation helps or hinders :)


Here's a quick and dirty driver (the command-line argument is the size of the set, default 5, maximum the number of bits in an unsigned long):

#include <iostream>

template<typename UnsignedInteger>
std::ostream& show(std::ostream& out, UnsignedInteger comb) {
  out << '[';
  char a = 'a';
  for (UnsignedInteger i = 1; comb; i *= 2, ++a) {
    if (i & comb) {
      out << a;
      comb -= i;
    }
  }
  return out << ']';
}

int main(int argc, char** argv) {
  unsigned int n = 5;
  if (argc > 1) n = atoi(argv[1]);
  unsigned long mask = (1UL << n) - 1;
  unsigned long comb = 0;
  do {
    show(std::cout, comb) << std::endl;
    comb = next_combination(comb, mask);
  } while (comb);
  return 0;
}

It's hard to believe that this function might be useful for a set of more than 64 elements, given the size of the enumeration, but it might be useful to enumerate some limited part, such as all subsets of three elements. In this case, the bit-hackery is only really useful if the modification fits in a single word. Fortunately, that's easy to test; you simply need to do the computation as above on the last word in the bitset, up to the test for last_zero being zero. (In this case, you don't need to bitand mask, and indeed you might want to choose a different way of specifying the set size.) If last_zero turns out to be zero (which will actually be pretty rare), then you need to do the transformation in some other way, but the principle is the same: find the first 0 which precedes a 1 (watch out for the case where the 0 is at the end of a word and the 1 at the beginning of the next one); change the 01 to 10, figure out how many 1s you need to move, and move them to the end.

Whirlwind answered 5/2, 2013 at 21:39 Comment(15)
Didn't read it carefully yet, but it surely helps, a lot (: I'll reply as soon as I get it understood. Also, thanks for pointing out the problem with the definition; I'll change that right away! I'll think a bit also about this, because actually I'm not sure which ordering is the, say, appropriate one ^^Pepito
would you mind to add an example of usage? I'm not very acquainted with this bit hacking mastering techniques you've used (:Pepito
There you go. Note that since the iteration is circular, starting and ending with 0, you cannot use the for-loop-style pre-test/post-increment. If that bothers use, you can adapt it to a different interface.Whirlwind
Well, apart from thank you VERY MUCH, all I can say is that you should be arrested for doing this kind of sorcery! Thanks very much indeed! (:Pepito
Everything is working perfectly! I just have two questions: the first one, that I'm really interested in, if you can help me once more, is about the number of elements: each bit represents a value that is/isn't in the set, but if I have up to thousands of values, could I use a byte array and some kind of cast to perform the operations? Should I use some big int representation? Or it isn't really possible/grows to costly in complexity?Pepito
The second one you don't have to worry about answering if it requires too much explanation, but, how did you manage to reach those expressions? I'd like to know just to be able to do different versions of what you did. Thanks in advance!Pepito
@Rubens: I sketched out the multiword procedure. But honestly, if you have thousands of values, you won't get very far in the enumeration and you might want to use a list-base approach. As for "where did I dig up those expressions", they're standards manipulations of bits; the main thing to remember is that consecutive strings of 1s transmit carries, and 0s transmit borrows. (Also, in 2's complement, -a == ~a + 1. Although gcc knows that, too, so you don't have to.)Whirlwind
I didn't read your code since I don't speak C nor any of its dialects, but do I understand this correctly, your code calculates all 2^k subsets where k is the size of the alphabet, up until the set that's actually being asked for?Whall
@G.Bach: no. you give it a subset (comb) and a set (mask) (yes, those names could have been better), and it tells you the next subset. I think you can ignore everything other than the arithmetic and the if statements, and figure out what it does. & is bitwise and; ~ is bitwise negate; +*/ have their normal meanings. Where I use -, it means 2's complement negative (the argument is unsigned, but it still means that.) I count on divide truncating in the division of mask in the second-last line.Whirlwind
Alright, so this works inductively? How do you figure out the set to start with, or do you simply start with the empty set and iterate n-1 times?Whall
@g.bach: like the comment says, it generates subsets forever. So you can start anywhere, and stop when you get back to where you started. I recommend 0 (the empty set), since that's clearly the first set in OP's desired ordering.Whirlwind
yeah that's what i thought it said; so unless he figures out a way to get a better starting point for the enumeration than the empty set, this way it'll take worst case 2^(N-1) iterations if he wants an element in the "middle" of the power set?Whall
@g.bach. Yup. It's not an answer to "find the nth item". It's an answer to "find me all the subsets in order", which was not what was asked, but may have been what was desired. (In fact, there is an O(n) algorithm, where n is the number of elements in the base set, but that's boring.)Whirlwind
oh that'd be cool, can you sketch that algorithm? cause i've been thinking about that problem (finding the nth set) and couldn't come up with anything efficient.Whall
@G.Bach see tmyklebu's answer. Instead of precomputing a table, use the following identities: C(N, k) = C(N, k - 1) * (N - k) / k (for the first part), and C(M - 1, k - 1) = C(M, k) * k / M (for the second part). Integer arithmetic will work; the divisions are exact. Looking at this, I realize that it's really more like O(N^2) once you take into account the time it takes to multiply and divide (taking N as roughly the log of the size of the multiplicand and numerator; the multiplier and dividend are not bignums).Whirlwind
P
4

Considering a list of elements L = [a, b, c], the powerset of L is given by:

P(L) = {
    [],
    [a], [b], [c],
    [a, b], [a, c], [b, c],
    [a, b, c]
}

Considering each position as a bit, you'd have the mappings:

id  | positions - integer | desired set
 0  |  [0 0 0]  -    0    |  []
 1  |  [1 0 0]  -    4    |  [a]
 2  |  [0 1 0]  -    2    |  [b]
 3  |  [0 0 1]  -    1    |  [c]
 4  |  [1 1 0]  -    6    |  [a, b]
 5  |  [1 0 1]  -    5    |  [a, c]
 6  |  [0 1 1]  -    3    |  [b, c]
 7  |  [1 1 1]  -    7    |  [a, b, c]

As you see, the id is not directly mapped to the integers. A proper mapping needs to be applied, so that you have:

id  | positions - integer |  mapped  - integer
 0  |  [0 0 0]  -    0    |  [0 0 0] -    0
 1  |  [1 0 0]  -    4    |  [0 0 1] -    1
 2  |  [0 1 0]  -    2    |  [0 1 0] -    2
 3  |  [0 0 1]  -    1    |  [0 1 1] -    3
 4  |  [1 1 0]  -    6    |  [1 0 0] -    4
 5  |  [1 0 1]  -    5    |  [1 0 1] -    5
 6  |  [0 1 1]  -    3    |  [1 1 0] -    6
 7  |  [1 1 1]  -    7    |  [1 1 1] -    7

As an attempt on solving this, I came up using a binary tree to do the mapping -- I'm posting it so that someone may see a solution from it:

                                        #
                          ______________|_____________
        a               /                             \
                  _____|_____                   _______|______
        b        /           \                 /              \
              __|__         __|__           __|__            __|__
        c    /     \       /     \         /     \          /     \
           [ ]     [c]    [b]   [b, c]    [a]   [a, c]    [a, b]  [a, b, c]
index:      0       3      2       6       1      5         4         7
Pepito answered 5/2, 2013 at 17:42 Comment(8)
For the 4th set, this will give you [1,0,0] = [a] while the third set will be [0,1,1] = [b,c]. You will need to find a way to map numbers to the lexicographical order, because the order of the numbers representing the sets using the method described in your answer will not match the lexicographical order of the sets described in your question.Whall
@G.Bach Thanks for the comment. I did the mapping but I didn't really noticed the solution was still not ready ^^. I've edited my answer.Pepito
I doubt reversing bits will give you what you want. The link you linked to - although the whole discussion there ignored that point - talks about "permutations with repetition" since the letters may occur more than once. Your problem on the other hand requires permutations without repetitions (which are simply called permutations in group theory and combinatorics) since you want subsets, and sets are simple (meaning they do not contain the same element multiple times).Whall
This leads to the problem that while there are n^k possibilities for permutations of length k with repetitions, there are only n choose k permutations, skewing the counting. Unfortunately I can only point to this since I couldn't come up with an efficient way to solve this problem.Whall
@G.Bach Geez, Bach, I thought my problems were solved, and now I'm so utterly lost ^^Pepito
Hm well you could always do it iteratively. Start out by finding out how many elements your set needs to have. To do this, go "i - (n choose k)" (starting with k = 0, increasing it after every iteration) until i < n choose k. You then know your set needs k elements. You can find an implementation for calculating binomial coefficients (the "n choose k" stuff) in the apache commons library. I didn't think about how to figure out which elements need to be in the set though.Whall
That won't always be linear, this can cost quadratically many multiplications during the calculation of the binomial coefficients though.Whall
@G.Bach I'm looking for some Karnaugh map pages, so that I can do the mapping, but, if I do recall it, it's costly to solved them maps too ^^ My original intention is to get all sets from a powerset, but one set at a time, as I can only afford to maintain an original list, and the n-th list that represents the n-th set in a powerset.Pepito
R
2

Suppose your set has size N.

So, there are (N choose k) sets of size k. You can find the right k (i.e. the size of the nth set) very quickly just by subtracting off (N choose k) from n until n is about to go negative. This reduces your problem to finding the nth k-subset of an N-set.

The first (N-1 choose k-1) k-subsets of your N-set will contain its least element. So, if n is less than (N-1 choose k-1), pick the first element and recurse on the rest of the set. Otherwise, you have one of the (N-1 choose k) other sets; throw away the first element, subtract (N-1 choose k-1) from n, and recurse.

Code:

#include <stdio.h>

int ch[88][88];
int choose(int n, int k) {
 if (n<0||k<0||k>n) return 0;
 if (!k||n==k) return 1;
 if (ch[n][k]) return ch[n][k];
 return ch[n][k] = choose(n-1,k-1) + choose(n-1,k);
}

int nthkset(int N, int n, int k) {
 if (!n) return (1<<k)-1;
 if (choose(N-1,k-1) > n) return 1 | (nthkset(N-1,n,k-1) << 1);
 return nthkset(N-1,n-choose(N-1,k-1),k)<<1;
}

int nthset(int N, int n) {
 for (int k = 0; k <= N; k++)
  if (choose(N,k) > n) return nthkset(N,n,k);
  else n -= choose(N,k);
 return -1; // not enough subsets of [N].
}

int main() {
 int N,n;
 scanf("%i %i", &N, &n);
 int a = nthset(N,n);
 for (int i=0;i<N;i++) printf("%i", !!(a&1<<i));
 printf("\n");
}
Robinett answered 5/2, 2013 at 21:16 Comment(6)
Sounds about right (didnt check the recursion thing), but i wouldnt talk about "very quickly". calculating one binomial coefficient will take O(n^2) time worst-case, and you possibly have to calculate n/2 of those so theres O(n^3) complexity in this (certainly better than the naive brute-force enumeration approach, but still). assuming large sets the power set of which will be considered, this might scale not that nicely. i cant come up with anything better myself, though.Whall
It's most certainly not cubic.Robinett
you're right, no idea why i though calculating the binomials would be in O(n^2).Whall
@G.Bach: It's slightly subtler than that. Computing a single binomial coefficient form scratch will indeed be quadratic. Computing n of them from scratch would be cubic. But I don't compute them all from scratch; I memoise. So you get overall quadratic complexity for computing the relevant binomial coefficients.Robinett
hm the way i see it, if you calculate one binomial coefficient as (n!)/(k!(n-k)!) which is unnecessarily complex since we can leave one of the factorials in the denominator and the bigger "half" of the numerator since they cancel out, you get n multiplications in the numerator and k + n -k in the denominator, so a total of 2n multiplications and one division. am i missing something?Whall
ah i understand the recursion now and the reuse of previously calculated binomials; +1 quite elegant. still not clear on the complexity thoughWhall

© 2022 - 2024 — McMap. All rights reserved.