Time complexity of this code to list all subsets of a set?
Asked Answered
C

4

7

I have found numerous solutions across the web having O(2^n) complexity. Can someone help me figure out the time complexity of the code given below. Also it involves lot of bit manipulation and I am really weak in that area so I didn't quite get the hang of the code. Would be great if someone could explain the code.

private static void findSubsets(int array[])
{
  int numOfSubsets = 1 << array.length; 

  for(int i = 0; i < numOfSubsets; i++)
  {
    int pos = array.length - 1;
    int bitmask = i;

    System.out.print("{");
    while(bitmask > 0)
    {
      if((bitmask & 1) == 1)
         System.out.print(array[pos]+",");
     {
        bitmask >>= 1;
        pos--;
     }
     System.out.print("}");
  }
}

Is this the most optimal solution?

Chalet answered 20/12, 2013 at 20:20 Comment(0)
A
8

Time Complexity:

Here is an alternative way to derive the time complexity (compared to @templatetypedef).

Let M be the total number of steps in the code. Your outer for-loop runs 2N times and the inner one runs log(i) times so we have:

enter image description here

Raise 2 to each side of the above equation and simplify:

enter image description here

Take the log of both sides of the above equation, and use Sterlings Approximation (Log(x!) ~ xLog(x) - x)

enter image description here


Bit Operations:

To address your weakness in bit manipulation you actually don't need it. It's used in three ways in your code, all of which can be replaced with less obfuscated functions:

  1. Power of 2: (1 << array.length) → (Math.pow(2, array.length))
  2. Divide by 2: (x >>= 1) → (x /= 2)
  3. modulus 2: (x & 1)(x % 2)

Code Explaination:

Also, to address the what the code is doing, it's essentially converting each number (0 to 2N) into it's binary form using the method shown here, and as @templatetypedef states, 1 means that corresponding element is in the set. Here is an example of converting 156 to binary with this method:

enter image description here

As an example with your code:

pos = array.length - 1;
bitmask = 156;                              // as an example, take bitmask = 156
while(bitmask > 0){
    if(bitmask%2 == 1)                      // odd == remainder 1, it is in the set
         System.out.print(array[pos]+",");
    bitmask /= 2;                           // divide by 2
    pos--;
}

By doing this for all bitmasks (0 to 2N) you are finding all unique possible sets.


EDIT:

Here is a look at the ratio (n2n/log2(2n!) in sterling approximation you can see that it quickly approaches unity as n gets large:

enter image description here

Acidify answered 20/12, 2013 at 21:49 Comment(5)
That is a nifty explanation! That said, the number of operations in the inner loop is at most log(i) rather than exactly log(i), so the derivation you're giving only shows an upper bound rather than a tight bound.Windup
@templatetypedef, it's the same bound as the one you give. Look at the plot I added, in fact n2^n is larger for n >= 3. But the only thing that matters is that they approach a constant (in this case 1) as n->infinity meaning they have the same big-O time complexity.Acidify
Sorry, I should clarify. I completely agree with the bound you have. My concern is that the logic you used to derive it only shows that it's an upper bound, so it's not a complete replacement for the math I did. It's a lot cleaner, though!Windup
@Windup I'm not sure why you say that though. It seems that the inner bound must be on the order of log(i) because you divide i by 2 until you get 1. At most you're off by a constant but still the tightest bound you can get is log(i).Acidify
My apologies - you're absolutety right. There's a way using bit twiddling to find all of the set bits in time O(1) each that I thought the code was using, but that's not what it's doing here. Your analysis is correct.Windup
W
8

This code works by using the connection between binary numbers with exactly n bits and subsets of a set of n elements. If you assign each element of the set to a single bit and treat "1" to mean "include the element in the subset" and "0" to mean "exclude the element from the subset," then you can easily convert between the two. For example, if the set contains a, b, and c, then 100 might correspond to the subset a, 011 to the subset bc, etc. Try reading through the code again with this insight.

In terms of efficiency, the above code is very fast both practically and theoretically. Any code that lists subsets has to spend some large amount of time just listing off those subsets. The time required is proportional to the total number of elements that have to be listed. This code spends O(1) work per item listed and therefore is asymptotically optimal (assuming, of course, that there aren't so many elements that you overflow the ints being used).

The total complexity of this code can be determined by counting how many total elements get printed. We can work out some math to solve this. Note that there are n choose 0 subsets of size 0, n choose 1 subsets of size 1, n choose 2 subsets of size 2, etc. Therefore, the total number of elements printed is given by

C = 0 × (n choose 0) + 1 × (n choose 1) + 2 × (n choose 2) + … + n × (n choose n)

Note that (n choose k) = (n choose n - k). Therefore:

C = 0 × (n choose n) + 1 × (n choose n - 1) + 2 × (n choose n - 2) + … + n × (n choose 0)

If we add these two together, we get

2C = n × (n choose 0) + n × (n choose 1) + … + n × (n choose n)

      = n × (n choose 0 + n choose 1 + … + n choose n)

The binomial theorem says that the parenthesized expression is 2n, so

2C = n2n

So

C = n2n-1

So exactly n2n-1 elements are printed, so the time complexity of this approach is Θ(n 2n).

Hope this helps!

Windup answered 20/12, 2013 at 20:31 Comment(0)
A
8

Time Complexity:

Here is an alternative way to derive the time complexity (compared to @templatetypedef).

Let M be the total number of steps in the code. Your outer for-loop runs 2N times and the inner one runs log(i) times so we have:

enter image description here

Raise 2 to each side of the above equation and simplify:

enter image description here

Take the log of both sides of the above equation, and use Sterlings Approximation (Log(x!) ~ xLog(x) - x)

enter image description here


Bit Operations:

To address your weakness in bit manipulation you actually don't need it. It's used in three ways in your code, all of which can be replaced with less obfuscated functions:

  1. Power of 2: (1 << array.length) → (Math.pow(2, array.length))
  2. Divide by 2: (x >>= 1) → (x /= 2)
  3. modulus 2: (x & 1)(x % 2)

Code Explaination:

Also, to address the what the code is doing, it's essentially converting each number (0 to 2N) into it's binary form using the method shown here, and as @templatetypedef states, 1 means that corresponding element is in the set. Here is an example of converting 156 to binary with this method:

enter image description here

As an example with your code:

pos = array.length - 1;
bitmask = 156;                              // as an example, take bitmask = 156
while(bitmask > 0){
    if(bitmask%2 == 1)                      // odd == remainder 1, it is in the set
         System.out.print(array[pos]+",");
    bitmask /= 2;                           // divide by 2
    pos--;
}

By doing this for all bitmasks (0 to 2N) you are finding all unique possible sets.


EDIT:

Here is a look at the ratio (n2n/log2(2n!) in sterling approximation you can see that it quickly approaches unity as n gets large:

enter image description here

Acidify answered 20/12, 2013 at 21:49 Comment(5)
That is a nifty explanation! That said, the number of operations in the inner loop is at most log(i) rather than exactly log(i), so the derivation you're giving only shows an upper bound rather than a tight bound.Windup
@templatetypedef, it's the same bound as the one you give. Look at the plot I added, in fact n2^n is larger for n >= 3. But the only thing that matters is that they approach a constant (in this case 1) as n->infinity meaning they have the same big-O time complexity.Acidify
Sorry, I should clarify. I completely agree with the bound you have. My concern is that the logic you used to derive it only shows that it's an upper bound, so it's not a complete replacement for the math I did. It's a lot cleaner, though!Windup
@Windup I'm not sure why you say that though. It seems that the inner bound must be on the order of log(i) because you divide i by 2 until you get 1. At most you're off by a constant but still the tightest bound you can get is log(i).Acidify
My apologies - you're absolutety right. There's a way using bit twiddling to find all of the set bits in time O(1) each that I thought the code was using, but that's not what it's doing here. Your analysis is correct.Windup
S
1

Let we say that array.length is n. This code works by selecting or excluding elements from a set based on binary representation of all numbers from 0 to 2^n which are exactly all combinations of a set.

Your complexity is O(2^n) for the outer for loop since numOfSubsets = 1 << array.length is 2^n. For inner loop you are shifting right and the worst case is for 111...1 (n bits set to 1) so you'll get O(n) complexity for the worst scenario. So total complexity will be O(n*(2^n)).

Selachian answered 20/12, 2013 at 20:26 Comment(0)
Z
1

https://github.com/yaojingguo/subsets gives two algorithms to solve subsets problem. Iter algorithm is the same as the code given in the question. Recur algorithm uses recursion to visit each possible subset. The time complexity of both algorithms is Θ(n*2^n). In the Iter algorithm, 1 statement executes n*2^n times. 2 statement executes n*2^(n-1) (based on @templatetypedef's analysis). Use a to indicate the cost of 1. And use b to indicate the cost of 2. The total cost is n*2^n*a + n*2^(n-1)*b.

if ((i & (1 << j)) > 0) // 1
    list.add(A[j]); // 2

Here is the main logic of the Recur algorithm:

result.add(new ArrayList<Integer>(list)); // 3
for (int i = pos; i < num.length; i++) { // 4
  list.add(num[i]);
  dfs(result, list, num, i + 1);
  list.remove(list.size() - 1);
}

Statement 3 has the same cost n*2^(n-1)*b as 1. The other cost is the 4 loop. Each loop iteration includes three function calls. 4 executes 2^n times in total. Use d to indicate the cost of 4. The total cost is 2^n*d + n*2^(n-1)*b. The following diagram is the recursion tree for this algorithm with set {1, 2, 3, 4}. A more precise analysis needs to handle the 2^(n-1) leaf nodes differently.

Ø --- 1 --- 2 --- 3 --- 4 | | |- 4 | |- 3 --- 4 | |- 4 |- 2 --- 3 --- 4 | |- 4 |- 3 --- 4 |- 4

To compare the complexity of these two algorithms is to compare n*2^n*a (1) with 2^n*d (2). Divide (1) by (2), we have n * a / d. If n*a is less than d, Iter is faster than Recur. I use Driver to benchmark the efficiency of these two algorithms. Here is the result of one run:

n: 16
Iter mills: 40
Recur mills: 19
n: 17
Iter mills: 78
Recur mills: 32
n: 18
Iter mills: 112
Recur mills: 10
n: 19
Iter mills: 156
Recur mills: 149
n: 20
Iter mills: 563
Recur mills: 164
n: 21
Iter mills: 2423
Recur mills: 1149
n: 22
Iter mills: 7402
Recur mills: 2211

It shows that for small n, Recur is faster than Iter.

Zymogen answered 17/2, 2017 at 2:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.