I give an O(n^3)-time, O(n^2)-space dynamic programming solution in C++ below. But the justification for this algorithm requires some explanation first. In the following, I use "substring" to mean an ordered subset of elements that must be contiguous, and "subsequence" to mean an ordered subset that need not be.
Generating strings that we know are valid
Define the depth of a string to be number of [
s it contains minus the number of ]
s.
Let's establish some rules that all valid ("balanced") bracket-strings must obey:
- There must be an equal number of
[
s and ]
s.
- No prefix of the string may have negative depth, i.e. more
]
s than [
s.
These are plainly necessary conditions -- if a string violates either rule, it cannot be valid. But in order to be able to conveniently generate strings that we know are valid, we need to show that these conditions are also sufficient: that any string that obeys these rules must be valid. To help with this, let's introduce a lemma:
Lemma: If a nonempty string obeys conditions (1) and (2), then it must contain []
as a substring.
Proof: It must begin with a [
, since otherwise the length-1 prefix will contain more ]
s than [
s and violate (2). Therefore it must contain at least one ]
, since otherwise there will be i >= 1 [
s and 0 ]
s, and a valid string must contain an equal number of each by (1). Therefore there must be a first occurrence of a ]
at some position j > 1, and the character to its left must be a [
.
Suppose we have a nonempty string x
that obeys conditions (1) and (2). By the lemma, it must contain a []
. Deleting this pair cannot cause either of these conditions to be violated, so the resulting string, if nonempty, must still obey conditions (1) and (2) and so must still contain a []
somewhere. Thus we can continue deleting []
s until we are left with the empty string.
Inserting a []
into a valid string at any position must produce a new valid string, because the new bracket pair always match each other and don't disturb any other matched pairs. Observe that it is possible to build up our original string x
by repeatedly inserting []
s into the empty string (which is trivially valid) in the reverse order that we deleted them in the previous paragraph: so we have now proved that x
(i.e. any string that obeys conditions (1) and (2)) is valid.
The right recursion
An equivalent way to phrase the OP's question is: "How many ways can we select a valid subsequence of character positions so that the remaining subsequence is also valid?" It's possible to solve this problem using recursion if we first generalise it to:
Given that our selected subsequence so far has depth d
, and our unselected subsequence so far has depth e
, how many ways can we select a valid subsequence from the suffix beginning at position k
so that the remaining subsequence is also valid?
Call this function count(d, e, k)
. The answer to the original question is now count(0, 0, 0)
.
In fact we can simplify the problem further by noticing that d+e
must equal the total depth after k
characters, so we can determine e
from d
and k
, and count()
need only have 2 parameters.
Also, when testing whether it's possible to select an empty subsequence, we only need to test that d
== 0. We don't need to bother testing that e
plus the remaining suffix gets down to 0 without going below it, since if d
== 0 then we have subtracted out a net depth of 0 from the original string (which must end with a depth of 0, and not go below 0, assuming it is valid).
To solve this recursively, we need to break off a first decision point from the process of searching through all possible subsequences. A subsequence of a length-n string must fall into one of the following n+1 cases: either it is empty, or it has a leftmost element, which could be any of the n characters in the string. The subsequences produced by recursing after making this first decision will all be distinct.
With the recursion working properly, memoising it is straightforward: just record the correct answer for any given call in the 2D vector memo[][]
, which is initially filled with -1 values. Since the function count(d, k)
has 2 parameters that can range from 0 up to n/2 and from 0 up to n respectively for a length-n string, O(n^2) space is needed, and the inside of the if (memo[d][k] == -1) {
block will execute at most O(n^2) times. Each time that happens, an O(n) loop runs, taking the time complexity up to O(n^3).
The actual code
#include <iostream>
#include <vector>
#include <string>
using namespace std;
class PartitionCounter {
// Return the number of subsequences of the suffix of v beginning at position k
// that are (a) valid, given that the initial depth of the subsequence is d (on
// account of it being the suffix of some larger subsequence), and (b)
// leave behind a remainder subsequence that is also valid, given that
// the remainder sequence has initial depth depths[k]-d.
int count(int d, int k) {
// If a prefix of either sequence (selected or remaining) has more ']'s
// than '['s then there can't be any completing subsequences.
if (d < 0 || depths[k] - d < 0) {
return 0;
}
// Only compute the answer if we haven't already.
if (memo[d][k] == -1) {
// A subsequence must either contain no elements, or a leftmost element
// at some position. All subsequences produced by recursion after this
// initial choice are distinct (when considering the sequence of
// character indices included, though not necessarily when considering
// the sequence of characters themselves).
// Try including no elements. This effectively terminates the larger
// subsequence that the selected subsequence is part of, so it can be
// legal only if its depth is 0. It also effectively includes all
// remaining characters in the remainder sequence, but if the selected
// subsequence has depth 0 and the original string does too, then it's
// implied that the remainder must also have total depth 0, so we don't
// need to check it.
int n = (d == 0);
// Try including a leftmost element at each remaining position.
// If this would cause a remainder subsequence that has negative
// depth, stop: any later loop iterations would also create illegal
// remainder subsequences.
for (int i = k; i < v.size() && depths[i] - d >= 0; ++i) {
n += count(d + v[i], i + 1);
}
memo[d][k] = n;
}
return memo[d][k];
}
vector<int> v; // 1 for '[', -1 for ']'
vector<int> depths; // depths[i] is the sum of the 1st i elements
vector<vector<int> > memo; // DP matrix. -1 => not computed yet
public:
PartitionCounter(string s) : memo(s.size() / 2 + 1, vector<int>(s.size() + 1, -1)) {
depths.push_back(0);
int total = 0;
for (int i = 0; i < s.size(); ++i) {
v.push_back(1 - 2 * (s[i] == ']')); // Map '[' to 1 and ']' to -1
depths.push_back(total += v[i]);
}
}
int count() {
if (depths.back() == 0) {
return count(0, 0);
} else {
return 0; // Need to handle invalid strings specially
}
}
};
int main(int argc, char **argv) {
PartitionCounter c(argv[1]);
cout << c.count() << '\n';
}
Results
C:\>partitioncounter []
2
C:\>partitioncounter [[]]
6
C:\>partitioncounter [[[]]]
20
C:\>partitioncounter [[[[]]]]
70
C:\>stopwatch partitioncounter [][[[[[][][][][[][][]]]]]][]
10001208
stopwatch: Terminated. Elapsed time: 15ms
stopwatch: Process completed with exit code 0.
C:\>stopwatch partitioncounter [][[[[[][[[]][][][[]]][[][]]]]]][]
562547776
stopwatch: Terminated. Elapsed time: 0ms
stopwatch: Process completed with exit code 0.
You can of course use long long
or whatever instead of int
if you need the extra bits.
EDIT: Fixed bug pointed out by ishi. As we skip over characters to exclude from the selected subsequence, the remainder subsequence accumulates them. What was happening was that we were effectively only excluding remainder subsequences that had more ]
s than [
s on the entire subsequence so far -- but in order to avoid violating condition (2), we need to check that this is true for all prefixes of the string. We now do this by stopping the loop early, so that these violating remainder subsequences are never generated in the first place. As a bonus, the algorithm gets faster! :)
f(AB) = f(A) * f(B)
- since you cannot form a matching bracket with opening from A and closing from B. The hard part is calculatef([A])
- I know thatf([A]) = something + f(A) * 2
since we can put the outside paren to either groups, and the inside isf(A)
ways, but haven't figured out the rest. – Nonprofessionalf([A])
case. – Nonprofessional