Partitioning a String of Brackets

Asked 18/11, 2012 at 2:45 Answered 7/6 at 5:13

Could anyone help me with this problem? I'll type out the problem, then give some of my thoughts/alternative solutions.

So the problem pretty much is, given a single string of brackets like this:

[[]]

We want to assign each bracket a group number (group 1 or group 2). A valid assignment means that if you ONLY look at brackets in group one, it forms a valid, balanced bracket string (which is pretty much stuff like [ ] [ [ ] ] and stuff NOT like ]]]][ ]. The same has to be true of group two. Groups don't have to be contiguous. We want to count the ways to split these brackets into 2 groups.

On the sample string above [ [ ] ], the answer would be six, here are the enumerations: (1 = group 1, 2 = group 2)

  [[]]
1.1111
2.2222
3.1221
4.2112
5.1212
6.2121

The arrangement doesn't have to include the all groups (like in arrangements 1. and 2.)

Thoughts

An obvious brute force solution that works with up to 32 brackets, rather quickly, is to have a 32 bit integer representing which brackets are part of a single group. Or we could use an array. Runtime is O(2^N) (I think), which is too slow?

From looking at the problem, I think that the original string of brackets you are given has to be pre-balanced, or else there is no way to pick a subset such that group 1 and 2 are balanced.

I also noticed that you can separate components - the string "[]" has 2 arrangements, so the string "[][]" has 4 arrangements. (You can find the the number of ways in each component and multiply them together).

I'm confused on how to get these ideas into an algorithm though. I wrote the brute force program and I checked the strings "[]", "[[]]", "[[[]]]", and "[[[[]]]]", and I don't really see a pattern.

From plugging these strings into my brute force program, I get:

"[]" = 2
"[[]]" = 6
"[[]]" = 20
"[[[[]]]]" = 70

Code:

char buf[1000];
int N;
bool isValid(int mask)
{
    int lv = 0;
    for (int i = 0; i < N; i++)
    {
        if (mask & (1 << i))
        {
            if (buf[i] == '(')
            {
                lv++;
            }
            else
            {
                lv--;
            }
            if (lv<0)
            {
                return false;
            }
        }
    }
    return lv==0;
}

int main() 
{
    scanf("%s", buf);
    N = strlen(buf);
    int ways = 0;
    for (int i = 0; i < (1 << N); i++)
    {
        if (isValid(i) && isValid(~i))
        {
            ways++;
        }
    }
    printf("Number of ways is %d\n", ways);
    return 0;
}

Calvin answered 18/11, 2012 at 2:45 Comment(4)

I have removed the earlier unhelpful comment. I have derived some partial solution to the problem. Define a function f that counts the number of valid partitions. Then if 2 balanced bracket groups A and B are placed together, then f(AB) = f(A) * f(B) - since you cannot form a matching bracket with opening from A and closing from B. The hard part is calculate f([A]) - I know that f([A]) = something + f(A) * 2 since we can put the outside paren to either groups, and the inside is f(A) ways, but haven't figured out the rest. – Nonprofessional 18/11, 2012 at 11:12

I totally agree. I noticed the separation of partitions too, but ishi found a formula for "centric" strings. – Calvin 18/11, 2012 at 15:53

I also found the formula (probably the same) for centric string, but it is actually a sub case of f([A]) case. – Nonprofessional 18/11, 2012 at 15:57

Cute problem. How did you meet it? – Jaffa 18/11, 2012 at 20:22

I give an O(n^3)-time, O(n^2)-space dynamic programming solution in C++ below. But the justification for this algorithm requires some explanation first. In the following, I use "substring" to mean an ordered subset of elements that must be contiguous, and "subsequence" to mean an ordered subset that need not be.

Generating strings that we know are valid

Define the depth of a string to be number of [s it contains minus the number of ]s.

Let's establish some rules that all valid ("balanced") bracket-strings must obey:

There must be an equal number of [s and ]s.
No prefix of the string may have negative depth, i.e. more ]s than [s.

These are plainly necessary conditions -- if a string violates either rule, it cannot be valid. But in order to be able to conveniently generate strings that we know are valid, we need to show that these conditions are also sufficient: that any string that obeys these rules must be valid. To help with this, let's introduce a lemma:

Lemma: If a nonempty string obeys conditions (1) and (2), then it must contain [] as a substring.

Proof: It must begin with a [, since otherwise the length-1 prefix will contain more ]s than [s and violate (2). Therefore it must contain at least one ], since otherwise there will be i >= 1 [s and 0 ]s, and a valid string must contain an equal number of each by (1). Therefore there must be a first occurrence of a ] at some position j > 1, and the character to its left must be a [.

Suppose we have a nonempty string x that obeys conditions (1) and (2). By the lemma, it must contain a []. Deleting this pair cannot cause either of these conditions to be violated, so the resulting string, if nonempty, must still obey conditions (1) and (2) and so must still contain a [] somewhere. Thus we can continue deleting []s until we are left with the empty string.

Inserting a [] into a valid string at any position must produce a new valid string, because the new bracket pair always match each other and don't disturb any other matched pairs. Observe that it is possible to build up our original string x by repeatedly inserting []s into the empty string (which is trivially valid) in the reverse order that we deleted them in the previous paragraph: so we have now proved that x (i.e. any string that obeys conditions (1) and (2)) is valid.

The right recursion

An equivalent way to phrase the OP's question is: "How many ways can we select a valid subsequence of character positions so that the remaining subsequence is also valid?" It's possible to solve this problem using recursion if we first generalise it to:

Given that our selected subsequence so far has depth d, and our unselected subsequence so far has depth e, how many ways can we select a valid subsequence from the suffix beginning at position k so that the remaining subsequence is also valid?

Call this function count(d, e, k). The answer to the original question is now count(0, 0, 0).

In fact we can simplify the problem further by noticing that d+e must equal the total depth after k characters, so we can determine e from d and k, and count() need only have 2 parameters.

Also, when testing whether it's possible to select an empty subsequence, we only need to test that d == 0. We don't need to bother testing that e plus the remaining suffix gets down to 0 without going below it, since if d == 0 then we have subtracted out a net depth of 0 from the original string (which must end with a depth of 0, and not go below 0, assuming it is valid).

To solve this recursively, we need to break off a first decision point from the process of searching through all possible subsequences. A subsequence of a length-n string must fall into one of the following n+1 cases: either it is empty, or it has a leftmost element, which could be any of the n characters in the string. The subsequences produced by recursing after making this first decision will all be distinct.

With the recursion working properly, memoising it is straightforward: just record the correct answer for any given call in the 2D vector memo[][], which is initially filled with -1 values. Since the function count(d, k) has 2 parameters that can range from 0 up to n/2 and from 0 up to n respectively for a length-n string, O(n^2) space is needed, and the inside of the if (memo[d][k] == -1) { block will execute at most O(n^2) times. Each time that happens, an O(n) loop runs, taking the time complexity up to O(n^3).

The actual code

#include <iostream>
#include <vector>
#include <string>

using namespace std;

class PartitionCounter {
    // Return the number of subsequences of the suffix of v beginning at position k
    // that are (a) valid, given that the initial depth of the subsequence is d (on
    // account of it being the suffix of some larger subsequence), and (b)
    // leave behind a remainder subsequence that is also valid, given that
    // the remainder sequence has initial depth depths[k]-d.
    int count(int d, int k) {
        // If a prefix of either sequence (selected or remaining) has more ']'s
        // than '['s then there can't be any completing subsequences.
        if (d < 0 || depths[k] - d < 0) {
            return 0;
        }

        // Only compute the answer if we haven't already.
        if (memo[d][k] == -1) {
            // A subsequence must either contain no elements, or a leftmost element
            // at some position.  All subsequences produced by recursion after this
            // initial choice are distinct (when considering the sequence of
            // character indices included, though not necessarily when considering
            // the sequence of characters themselves).

            // Try including no elements.  This effectively terminates the larger
            // subsequence that the selected subsequence is part of, so it can be
            // legal only if its depth is 0.  It also effectively includes all
            // remaining characters in the remainder sequence, but if the selected
            // subsequence has depth 0 and the original string does too, then it's
            // implied that the remainder must also have total depth 0, so we don't
            // need to check it.
            int n = (d == 0);

            // Try including a leftmost element at each remaining position.
            // If this would cause a remainder subsequence that has negative
            // depth, stop: any later loop iterations would also create illegal
            // remainder subsequences.
            for (int i = k; i < v.size() && depths[i] - d >= 0; ++i) {
                n += count(d + v[i], i + 1);
            }

            memo[d][k] = n;
        }

        return memo[d][k];
    }

    vector<int> v;          // 1 for '[', -1 for ']'
    vector<int> depths;     // depths[i] is the sum of the 1st i elements
    vector<vector<int> > memo;  // DP matrix.  -1 => not computed yet

public:
    PartitionCounter(string s) : memo(s.size() / 2 + 1, vector<int>(s.size() + 1, -1)) {
        depths.push_back(0);
        int total = 0;
        for (int i = 0; i < s.size(); ++i) {
            v.push_back(1 - 2 * (s[i] == ']')); // Map '[' to 1 and ']' to -1
            depths.push_back(total += v[i]);
        }
    }

    int count() {
        if (depths.back() == 0) {
            return count(0, 0);
        } else {
            return 0;       // Need to handle invalid strings specially
        }
    }
};

int main(int argc, char **argv) {
    PartitionCounter c(argv[1]);
    cout << c.count() << '\n';
}

Results

C:\>partitioncounter []
2

C:\>partitioncounter [[]]
6

C:\>partitioncounter [[[]]]
20

C:\>partitioncounter [[[[]]]]
70

C:\>stopwatch partitioncounter [][[[[[][][][][[][][]]]]]][]
10001208
stopwatch: Terminated. Elapsed time: 15ms
stopwatch: Process completed with exit code 0.

C:\>stopwatch partitioncounter [][[[[[][[[]][][][[]]][[][]]]]]][]
562547776
stopwatch: Terminated. Elapsed time: 0ms
stopwatch: Process completed with exit code 0.

You can of course use long long or whatever instead of int if you need the extra bits.

EDIT: Fixed bug pointed out by ishi. As we skip over characters to exclude from the selected subsequence, the remainder subsequence accumulates them. What was happening was that we were effectively only excluding remainder subsequences that had more ]s than [s on the entire subsequence so far -- but in order to avoid violating condition (2), we need to check that this is true for all prefixes of the string. We now do this by stopping the loop early, so that these violating remainder subsequences are never generated in the first place. As a bonus, the algorithm gets faster! :)

Sirree answered 19/11, 2012 at 18:5 Comment(6)

How did you verify that your answers are correct? From the looks of it, the 16145977 looks to be incorrect. I hope you agree that for []A, where A is any valid string, you should get result 2 * count(A). Since 16145977, it's clearly an incorrect answer. FYI, the correct one is: 10001208. – Minhminho 20/11, 2012 at 14:40

I don't have C++ compiler at work, so I'll be able to fiddle with your code at home. Could you please test it with string [[[][]]] (should give 180)... – Minhminho 20/11, 2012 at 14:46

heh... just noticed. I meant to say 'since 16145977 is odd,' – Minhminho 20/11, 2012 at 16:54

@ishi: Thanks for pointing that out, you're right about 16145977 being wrong. I was mistakenly allowing some remainder sequences that violated condition (2). The fixed code now gets 10001208 for [][[[[[][][][][[][][]]]]]][]. However I think your claim that count([[[][]]])=180 is wrong: my code now gets 68 for this, and I've verified that answer using a brute force approach (I can supply code if you want). – Sirree 20/11, 2012 at 17:36

@ishi: Also if you can't resist tinkering before you get home :) you can use ideone: ideone.com/Xd0RnQ – Sirree 20/11, 2012 at 17:38

You are right about the 180 :). I have unit-tests and '[[[]][][]]' (producing 180) is the one after '[[[][]]]'. I must have blinked :). – Minhminho 20/11, 2012 at 19:54

If it's any help, for 'centric' strings like [[[]]], you can calculate your number of ways using ways(1) = 2 and ways(n) = ways(n-1)*(4*n-2)/n (or C(2n,n) if you prefer), where n is the depth of nesting.

Nested but not 'centric' groups (like [[][]]) seem to follow similar pattern but I cannot figure out the correct formula for those.

Edit

We are running out of notation power, so I'll use texify to express math formulas. I came up with something like this:

formula

Surrounding groups (you can change the formula by this).

Minhminho answered 18/11, 2012 at 15:29 Comment(10)

Also for the centric ones, the count is the terms of the mathworld.wolfram.com/CentralBinomialCoefficient.html . I got this by googling the sequence (2, 6, 20, 70). – Calvin 18/11, 2012 at 17:1

I was following a different path. Thus far I have a 'theorem': countWays([AB]) = countWays([A])*countWays([B])/2. If this proves correct, I think we'll have solution :) – Minhminho 18/11, 2012 at 17:21

Interesting. What does [AB] represent? Something like "[[][]]" or something like "[[]] [[]]" – Calvin 18/11, 2012 at 17:36

Nice! Let me verify that with a few cases. It shouldn't be too hard to come up with a recursive solution with this. – Calvin 18/11, 2012 at 18:4

A, B and A_i in formula means any group. It seems to work :) – Minhminho 18/11, 2012 at 18:12

Heh... yes, I think we all would like to know how to calculate something like this ;). However, if you know 'how much' is [[[][]]], you can use my formula to calculate [ [[][]] B ] (where B is, again, anything... provided that you know how much is [B], that is. – Minhminho 18/11, 2012 at 18:29

But [[[][]]] has nested, except the second layer is [[][]] which is by the formula [[]][[]] divided by 2. (which is 18). but then there is another layer of [] around it. Is it trivial to propagate the divided by 2 factor up to this expression? @Minhminho – Calvin 18/11, 2012 at 18:33

B/c the answer was 68 to above comment. – Calvin 18/11, 2012 at 19:12

Your above comment says if you know what [[][]] is, which is 18 according to your formula. How do you use that number with string with an extra parentheses around it? [ [[][]] ]? – Calvin 18/11, 2012 at 19:43

let us continue this discussion in chat – Minhminho 18/11, 2012 at 19:45

Extending ishi's answer, I think it can be done in O(N^2) as d + e is equal to the depth of the prefix. The code below gets the same results.

/*
How many ways are there to split a string
of brackets into two such that both are balanced.
*/

#include <iostream>
#include <string>
using namespace std;

#define MAXN 1000
#define M 1000000007

int N, dp[MAXN][MAXN];
string s;

int recurse(int k, int i, int p) {
  if (i >= N) {
    return k == 0 ? 1 : 0;
  }
  if (k < 0 || p-k < 0) {
    return 0;
  }

  int &ans = dp[k][i];
  if (ans != -1) {
    return ans;
  }

  ans = 0;
  if (s[i] == '[') {
    ans += recurse(k+1,i+1,p+1)+recurse(k,i+1,p+1);
    return ans;
  }
  if (s[i] == ']') {
    ans += recurse(k-1,i+1,p-1)+recurse(k,i+1,p-1);
    return ans;
  }
  return 0;
}

int main() {
  cin >> s;
  N = s.size();
  for (int k = 0; k < N; k++) {
    for (int i = 0; i < N; i++) {
      dp[k][i] = -1;
    }
  }
  cout << recurse(0,0,0) << endl;
}

Chaotic answered 31/7, 2017 at 0:9 Comment(0)

Easier to understand solution.

  public int splitString(String s){
    int gOneC=0;
    int gTwoC=0;

    Map<String, Integer> memo = new HashMap<>();
    return splitStringRecur(s,0, gOneC, gTwoC, memo);
}

private int splitStringRecur(String s, int i, int gOneC, int gTwoC, Map<String, Integer> memo) {
    if(i == s.length()){
        if(gOneC==0 || gTwoC==0) return 1;
    }
    String t = i+"-"+gOneC+"-"+gTwoC+"";
    if(memo.containsKey(t)) return memo.get(t);

    int gc =0;
    int gs =0;
    if(s.charAt(i)=='('){
        gc = splitStringRecur(s, i+1, gOneC+1, gTwoC,memo);
        gs = splitStringRecur(s, i+1, gOneC, gTwoC+1,memo);
    }else {
        if(gOneC > 0){
          gc +=  splitStringRecur(s, i+1, gOneC-1, gTwoC,memo);
        }
        if( gTwoC >0){
           gs += splitStringRecur(s, i+1, gOneC, gTwoC-1,memo);
        }
    }
    memo.put(t, gc+gs);
    return gc+gs;
}

Warman answered 18/7, 2020 at 10:53 Comment(1)

This answer can be improved by adding some explanation. What makes this solution easier to understand? – Multiplicity 18/7, 2020 at 14:15

int partition(string s){
    int n = s.size();
    vector<long long>dp(n+1,0);
    dp[0] = 1;
    for(int i=0;i<n;i++){
        if(s[i]=='['){
            for(int j=n-1;j>0;j--){
                dp[j]+=dp[j-1];
            }
        }else{
            for(int j=0;j<n;j++){
                dp[j]+=dp[j+1];
            } 
        }
    }
    return dp[0];
}

Time Complexity: O(N^2) Points to Note:

Assume that a open bracket => +1 and a close bracket => -1.
A subsequence with negative sum e.g ]][ can never become a valid subsequence.

So maintain a dp of all the positive sums possible.

For every '[' encountered, the sum of the current sequence increases by one, so no of sequences possible with sum i increases by dp[i-1]
Similarily, if ']' is encountered, dp[i]+=dp[i+1]

Also note the way I am running the inner loop is different for '[' and ']'. This is to prevent overcounting and correct dp updates.

Ans = dp[0]

Copyhold answered 7/6 at 5:13 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Generating strings that we know are valid

The right recursion

The actual code

Results

Recommended topics

Hot tags