Algorithm improvement for enumerating binary trees

Asked 6/3, 2017 at 13:43 Answered 6/3, 2017 at 20:26

Solved algorithm language-agnostic binary-tree oeis motzkin-numbers

Currently I can enumerate rooted planar unlabeled binary trees using the following brute-force Prolog code.

e --> u | b | t.
u --> ['[op(u),['], e, [']]'].
b --> ['[op(b),['], e, [','], e, [']]'].
t --> ['number(n)'].

Note: See output listings below.

and output them in increasing size using

es(S) :-
    length(Ls, _),
    phrase(e, Ls),     % or e(Ls,[]),
    atomic_list_concat(Ls,S).

However this is inefficient brute-force algorithm.

Is there a more efficient algorithm for enumerating rooted planar unlabeled binary trees?

Note: The trees can be generated by using the trees from the previous two iterations, think Fibonacci numbers, and adding either a unary branch or a binary branch, however this leads to duplicate trees. I myself could do that version, what I am looking for is an algorithm that generates the trees in an efficient manner the first time without duplicates.

Note: A binary tree is also known as a binary expression tree or a K-ary tree with K <=2.

Supplement

Results

My brute-force version for M(15) took 1 hr and 27 minutes, while the efficient version for M(15) took about 2 seconds.

Obviously the efficient algorithm is just that, much more efficient and why I asked the question.

Motzkin numbers

The number of trees that have N nodes for a rooted planar unlabeled binary trees is given by Motzkin numbers. See: OEIS A001006

Nodes  Trees
1      1
2      1
3      2
4      4
5      9

The number of trees that have N internal nodes for a rooted planar unlabeled binary tree is given by Catalan numbers. There is a more efficient algorithm for generating rooted planar binary trees using Catalan numbers.

Note:
The number of trees based on Catalan numbers do not have unary branches and count only internal nodes.

while

the number of trees based on Motzkin numbers do have unary branches and count all nodes.

See:
OEIS A000108
Catalan Numbers by Tom Davis

Relating Prolog list elements to Motzkin number

% M is Motzkin number, N is number of  list elements passed to atomic_list_concat\2
m_to_n(1,1).
m_to_n(2,3).
m_to_n(M,N) :-
    M > 2,
    N is (M*2)-1.

es_m(M,S) :-
    m_to_n(M,N),
    length(Ls,N),
    e(Ls,[]),
    atomic_list_concat(Ls,S).

Stats with inefficient brute-force version

es_c(M,Count) :-
    aggregate_all(count, es_m(M,_), Count).

?- time(es_c(1,Count)).
% 57 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
Count = 1.

?- time(es_c(2,Count)).
% 141 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
Count = 1.

?- time(es_c(3,Count)).
% 571 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
Count = 2.

?- time(es_c(4,Count)).
% 2,740 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
Count = 4.

?- time(es_c(5,Count)).
% 13,780 inferences, 0.000 CPU in 0.001 seconds (0% CPU, Infinite Lips)
Count = 9.

?- time(es_c(6,Count)).
% 70,072 inferences, 0.000 CPU in 0.002 seconds (0% CPU, Infinite Lips)
Count = 21.

?- time(es_c(7,Count)).
% 357,358 inferences, 0.016 CPU in 0.012 seconds (136% CPU, 22870912 Lips)
Count = 51.

?- time(es_c(8,Count)).
% 1,824,082 inferences, 0.063 CPU in 0.056 seconds (111% CPU, 29185312 Lips)
Count = 127.

?- time(es_c(9,Count)).
% 9,313,720 inferences, 0.297 CPU in 0.290 seconds (102% CPU, 31372531 Lips)
Count = 323.

?- time(es_c(10,Count)).
% 47,561,878 inferences, 1.469 CPU in 1.467 seconds (100% CPU, 32382555 Lips)
Count = 835.

?- time(es_c(11,Count)).
% 242,896,160 inferences, 7.672 CPU in 7.665 seconds (100% CPU, 31660599 Lips)
Count = 2188.

?- time(es_c(12,Count)).
% 1,240,493,974 inferences, 38.797 CPU in 38.841 seconds (100% CPU, 31974069 Lips)
Count = 5798.

?- time(es_c(13,Count)).
% 6,335,410,822 inferences, 206.047 CPU in 213.116 seconds (97% CPU, 30747425 Lips)
Count = 15511.

?- time(es_c(14,Count)).
% 32,356,235,848 inferences, 1016.156 CPU in 1018.955 seconds (100% CPU, 31841792 Lips)
Count = 41835.

?- time(es_c(15,Count)).
% 165,250,501,417 inferences, 5231.766 CPU in 5268.363 seconds (99% CPU, 31585991 Lips)
Count = 113634.

References

A free downloadable book as a pdf that might help is "Analytic Combinatorics" by Philippe Flajolet and Robert Sedgewick

Also see the references in the Catalan tag.

Motzkin numbers

BNF

<expression> ::= 
      <unary expression>
    | <binary expression>
    | <terminal>

<unary expression> ::= 
    "(u" <expression> ")"

<binary expression> ::= 
    "(b" <expression> " " <expression> ")"

<terminal> ::= 
    "t"

Using answer by David Eisenstat

Think of these as notes, or breadcrumbs, for me in case I need to use this again in many months after I forget it.

To test the answer I used WSL (Windows Subsystem for Linux) with Python 3 installed

Using Windows 10 I created a file named motzkin.py in the directory

C:\Users\Eric\Documents\Prolog

with the Python code

def ubtrees(n):
    if n == 1:
        yield 't'
    elif n > 1:
        for t in ubtrees(n - 1):
            yield '(u {})'.format(t)
        for i in range(1, n - 1):
            for t1 in ubtrees(i):
                for t2 in ubtrees(n - 1 - i):
                    yield '(b {} {})'.format(t1, t2)

then in WSL I created a symbolic link to the Windows Prolog directory

$ ln -s "/mnt/c/Users/Eric/Documents/Prolog" /home/eric/Prolog

and changed to the WSL Prolog directory

$ cd Prolog

then started Python3

~/Prolog$ python3

and imported the Python code

>>> import motzkin

and ran the following with the argument to ubtrees being the Motzkin number

>>> for value in ubtrees(1):
...   print(value)
...
t

>>> for value in ubtrees(2):
...   print(value)
...
(u t)

>>> for value in ubtrees(3):
...   print(value)
...
(u (u t))
(b t t)

>>> for value in ubtrees(4):
...   print(value)
...
(u (u (u t)))
(u (b t t))
(b t (u t))
(b (u t) t)

>>> for value in ubtrees(5):
...   print(value)
...
(u (u (u (u t))))
(u (u (b t t)))
(u (b t (u t)))
(u (b (u t) t))
(b t (u (u t)))
(b t (b t t))
(b (u t) (u t))
(b (u (u t)) t)
(b (b t t) t)

and to check the Motzkin numbers

def m_count(m):
    count = sum(1 for x in ubtrees(m))
    print("Count: ", count)

>>> m_count(1)
Count:  1
>>> m_count(2)
Count:  1
>>> m_count(3)
Count:  2
>>> m_count(4)
Count:  4
>>> m_count(5)
Count:  9
>>> m_count(6)
Count:  21
>>> m_count(7)
Count:  51
>>> m_count(8)
Count:  127
>>> m_count(9)
Count:  323
>>> m_count(10)
Count:  835
>>> m_count(11)
Count:  2188
>>> m_count(12)
Count:  5798
>>> m_count(13)
Count:  15511
>>> m_count(14)
Count:  41835
>>> m_count(15)
Count:  113634

To exit interactive Python use

quit()

Notes on duplicates

The way I learned about Motzkin numbers was by hand enumerating the trees with pen and paper and finding a duplicate by using the method of adding a unary branch to the preceding trees M(N-1) and binary branches to the preceding preceding M(N-2) trees.

This one tree was generated twice for M(5) from M(4) trees

(b (u t) (u t))

the first by adding a unary branch to

(b (u t) t)

and second by adding a unary branch to

(b t (u t))

After doing this I had the sequence of numbers 1,2,4,9,21 which I used with a OEIS search and the top result was A001006 for Motzkin numbers. Once I had the larger list of Motzkin numbers I used the Prolog code to generate the counts for the larger input values and they all agreed. Now you can add OEIS to your programming tool box with a valid example to demonstrate to others.

The bigger picture

If you have read this far then you probably see that this is part of a bigger problem which is building a system first in Prolog that can use term rewriting to solve math expressions up to basic calculus but more importantly show the steps taken. So this gets one part of the way toward generating binary expression trees to be used as test cases. The next step is to be able to set individually the number of nodes that are unary and binary instead of having them fixed by the Motzkin number. I only used the Motzkin numbers to verify that I was generating a subset of the combinations correctly. Now that I have the pattern for that I can modify it to accept one argument for the number of unary nodes and one for the binary nodes. See: How to enumerate combinations using DCGs with CLP(FD) and multiple constraints

Only when I get stuck will I ask a questions related to this, so don't expect to see all of the necessary bread crumbs.

Prolog output

?- length(Ls, N), phrase(e, Ls).
Ls = ['number(0)'],
N = 1 ;
Ls = ['[op(u),[', 'number(0)', ']]'],
N = 3 ;
Ls = ['[op(u),[', '[op(u),[', 'number(0)', ']]', ']]'],
N = 5 ;
Ls = ['[op(b),[', 'number(0)', ',', 'number(0)', ']]'],
N = 5 ;
Ls = ['[op(u),[', '[op(u),[', '[op(u),[', 'number(0)', ']]', ']]', ']]'],
N = 7 ;
Ls = ['[op(u),[', '[op(b),[', 'number(0)', ',', 'number(0)', ']]', ']]'],
N = 7 ;
Ls = ['[op(b),[', '[op(u),[', 'number(0)', ']]', ',', 'number(0)', ']]'],
N = 7 ;
Ls = ['[op(b),[', 'number(0)', ',', '[op(u),[', 'number(0)', ']]', ']]'],
N = 7 ;

?- es(S).
S = 'number(0)' ;
S = '[op(u),[number(0)]]' ;
S = '[op(u),[[op(u),[number(0)]]]]' ;
S = '[op(b),[number(0),number(0)]]' ;
S = '[op(u),[[op(u),[[op(u),[number(0)]]]]]]' ;
S = '[op(u),[[op(b),[number(0),number(0)]]]]' ;
S = '[op(b),[[op(u),[number(0)]],number(0)]]' ;
S = '[op(b),[number(0),[op(u),[number(0)]]]]' ;
S = '[op(u),[[op(u),[[op(u),[[op(u),[number(0)]]]]]]]]' ;
S = '[op(u),[[op(u),[[op(b),[number(0),number(0)]]]]]]' ;
S = '[op(u),[[op(b),[[op(u),[number(0)]],number(0)]]]]' ;
S = '[op(u),[[op(b),[number(0),[op(u),[number(0)]]]]]]' ;
S = '[op(b),[[op(u),[[op(u),[number(0)]]]],number(0)]]' ;
S = '[op(b),[[op(u),[number(0)]],[op(u),[number(0)]]]]' ;
S = '[op(b),[[op(b),[number(0),number(0)]],number(0)]]' ;
S = '[op(b),[number(0),[op(u),[[op(u),[number(0)]]]]]]' ;
S = '[op(b),[number(0),[op(b),[number(0),number(0)]]]]' ;

?- es_m(1,E).
E = 'number(n)' ;
false.

?- es_m(2,E).
E = '[op(u),[number(n)]]' ;
false.

?- es_m(3,E).
E = '[op(u),[[op(u),[number(n)]]]]' ;
E = '[op(b),[number(n),number(n)]]' ;
false.

?- es_m(4,E).
E = '[op(u),[[op(u),[[op(u),[number(n)]]]]]]' ;
E = '[op(u),[[op(b),[number(n),number(n)]]]]' ;
E = '[op(b),[[op(u),[number(n)]],number(n)]]' ;
E = '[op(b),[number(n),[op(u),[number(n)]]]]' ;
false.

?- es_m(5,E).
E = '[op(u),[[op(u),[[op(u),[[op(u),[number(n)]]]]]]]]' ;
E = '[op(u),[[op(u),[[op(b),[number(n),number(n)]]]]]]' ;
E = '[op(u),[[op(b),[[op(u),[number(n)]],number(n)]]]]' ;
E = '[op(u),[[op(b),[number(n),[op(u),[number(n)]]]]]]' ;
E = '[op(b),[[op(u),[[op(u),[number(n)]]]],number(n)]]' ;
E = '[op(b),[[op(u),[number(n)]],[op(u),[number(n)]]]]' ;
E = '[op(b),[[op(b),[number(n),number(n)]],number(n)]]' ;
E = '[op(b),[number(n),[op(u),[[op(u),[number(n)]]]]]]' ;
E = '[op(b),[number(n),[op(b),[number(n),number(n)]]]]' ;
false.

Aeroscope answered 6/3, 2017 at 13:43 Comment(0)

In Python 3:

def ubtrees(n):
    if n == 1:
        yield 't'
    elif n > 1:
        for t in ubtrees(n - 1):
            yield '(u {})'.format(t)
        for i in range(1, n - 1):
            for t1 in ubtrees(i):
                for t2 in ubtrees(n - 1 - i):
                    yield '(b {} {})'.format(t1, t2)

This recursive enumeration procedure corresponds to the recurrence

M_1 = 1
M_n = M_{n-1} + sum_{i=1}^{n-2} M_i M_{n-1-i},

which is shifted from the recurrence given in Wikipedia by one.

Levity answered 6/3, 2017 at 15:30 Comment(0)

In addition to the solution that is already posted, I would like to present the following Prolog solution for the task.

First, here is a declarative description of such trees:

e(number)        --> [].
e(u(Arg))        --> [_], e(Arg).
e(b(Left,Right)) --> [_,_], e(Left), e(Right).

I am also using a dcg. However, I am using it for a different purpose than in the question: In my case, I only describe a list in order to limit the depth of solutions. Think of the nonterminals as stating how many "tokens" will definitely be consumed by applying a rule. Note also that I am using compound terms to naturally describe such trees.

Sample query, using iterative deepening:

?- length(Ls, _), phrase(e(E), Ls).
Ls = [],
E = number ;
Ls = [_5710],
E = u(number) ;
Ls = [_5710, _5716],
E = u(u(number)) ;
Ls = [_5710, _5716],
E = b(number, number) ;
Ls = [_5710, _5716, _5722],
E = u(u(u(number))) .

I can now count solutions as follows:

es_count(M, Count) :-
        length([_|Ls], M),
        findall(., phrase(e(_), Ls), Sols),
        length(Sols, Count).

Here are a few more solutions:

?- length(_, M),
   time(es_count(M, Count)),
   portray_clause(m_count(M,Count)),
   false.
% 7 inferences, 0.000 CPU in 0.000 seconds (64% CPU, 636364 Lips)
% 28 inferences, 0.000 CPU in 0.000 seconds (81% CPU, 1120000 Lips)
m_count(1, 1).
% 29 inferences, 0.000 CPU in 0.000 seconds (31% CPU, 1318182 Lips)
m_count(2, 1).
% 33 inferences, 0.000 CPU in 0.000 seconds (76% CPU, 1736842 Lips)
m_count(3, 2).
% 41 inferences, 0.000 CPU in 0.000 seconds (81% CPU, 1952381 Lips)
m_count(4, 4).
% 61 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 2178571 Lips)
m_count(5, 9).
% 109 inferences, 0.000 CPU in 0.000 seconds (91% CPU, 2595238 Lips)
m_count(6, 21).
% 230 inferences, 0.000 CPU in 0.000 seconds (93% CPU, 2948718 Lips)
m_count(7, 51).
% 538 inferences, 0.000 CPU in 0.000 seconds (97% CPU, 3221557 Lips)
m_count(8, 127).
% 1,337 inferences, 0.000 CPU in 0.000 seconds (99% CPU, 3293103 Lips)
m_count(9, 323).
% 3,434 inferences, 0.001 CPU in 0.001 seconds (99% CPU, 3400000 Lips)
m_count(10, 835).
% 9,000 inferences, 0.003 CPU in 0.003 seconds (94% CPU, 3301541 Lips)
m_count(11, 2188).
% 23,908 inferences, 0.007 CPU in 0.024 seconds (31% CPU, 3300387 Lips)
m_count(12, 5798).
% 64,158 inferences, 0.019 CPU in 0.024 seconds (79% CPU, 3387792 Lips)
m_count(13, 15511).
% 173,579 inferences, 0.051 CPU in 0.062 seconds (83% CPU, 3397448 Lips)
m_count(14, 41835).
% 472,853 inferences, 0.139 CPU in 0.152 seconds (92% CPU, 3393690 Lips)
m_count(15, 113634).

Prolog is a nice language for generating solutions exhaustively based on declarative descriptions!

Coastal answered 6/3, 2017 at 19:2 Comment(6)

Awesome!!! I tried that approach but my Prolog is still not comparable with yours. – Aeroscope 6/3, 2017 at 19:11

I think you were quite close: Only the many operations on atoms have slowed this all down considerably. Trees are naturally represented as compound terms, and this lets us reuse the basic shapes while filling in subordinate components with different terms. Thank you for the algorithm tip, I will take a look at this tag. If you suspect an efficient Prolog solution, please continue to tag any pertaining questions with prolog also in the future, to make sure that Prolog programmers notice it. – Coastal 6/3, 2017 at 19:19

You are welcome and I hope to see more algorithm tag Prolog answers pop up with new and older questions. The nice thing about the algorithm tag is it is like proofs in math, new proofs are always welcome to old problems if they shine a new perspective on the theorem. So you and other advanced Prolog programmers can have a cottage industry giving new answers to old questions. – Aeroscope 6/3, 2017 at 19:24

Adding deleted comment back since mat made a reference to it. The nice thing about the algorithm tag is that it is common to have multiple answers in different languages and even multiple answers in the same language because the answers are generally expected to focus on giving insight into how to solve a problem and not be centered on using a specific language. Thus if there are many answers in many languages with the algorithm tag that is not only customary it is appreciated. That is why I added the algorithm tag. – Aeroscope 6/3, 2017 at 19:30

what elegance ! – Anthropomorphic 6/3, 2017 at 20:30

Thank you! Your tabling solution to calculate the number of solutions is also very nice! – Coastal 6/3, 2017 at 20:33

Coding in Prolog the recurrence relation, as suggested by @DavidEisenstat:

motzkin_numbers(0, 1).
motzkin_numbers(1, 1).
motzkin_numbers(N, M) :-
    N>1,
    N1 is N-1,
    motzkin_numbers(N1, M1),
    N2 is N-2,
    aggregate_all(sum(MiP), (
              between(0, N2, I),
              motzkin_numbers(I, M_i),
              I_n1i is N-2-I,
              motzkin_numbers(I_n1i, M_n1i),
              MiP is M_i * M_n1i), Ms),
       M is M1 + Ms.

we get

?- length(_,N), time(motzkin_numbers(N,M)).
...
N = 14,
M = 113634 ;
% 4 inferences, 0.000 CPU in 0.000 seconds (89% CPU, 115724 Lips)
% 1,863,722 inferences, 1.107 CPU in 1.107 seconds (100% CPU, 1683529 Lips)
N = 15,
M = 310572 ;
% 4 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 129232 Lips)
% 4,499,430 inferences, 2.645 CPU in 2.646 seconds (100% CPU, 1700821 Lips)
N = 16,
M = 853467 
...

but SWI-Prolog has added tabling recently: just adding this declaration

:- table motzkin_numbers/2.

we get

...
% 310 inferences, 0.001 CPU in 0.001 seconds (99% CPU, 591031 Lips)
N = 14,
M = 113634 ;
% 331 inferences, 0.001 CPU in 0.001 seconds (100% CPU, 457731 Lips)
N = 15,
M = 310572 ;
% 352 inferences, 0.001 CPU in 0.001 seconds (100% CPU, 310880 Lips)
N = 16,
M = 853467 ;
% 373 inferences, 0.001 CPU in 0.001 seconds (100% CPU, 703349 Lips)
N = 17,
M = 2356779 ;
...

Anthropomorphic answered 6/3, 2017 at 20:26 Comment(2)

I had to read this twice to see that you only did the recurrence relation; when I first saw this I was expecting to see the trees generated. I know, it is an exercise for me. :( :) – Aeroscope 6/3, 2017 at 20:31

you're right: I got misguided by the accepted answer - just recoded it in Prolog. – Anthropomorphic 6/3, 2017 at 20:34

In Python 3:

def ubtrees(n):
    if n == 1:
        yield 't'
    elif n > 1:
        for t in ubtrees(n - 1):
            yield '(u {})'.format(t)
        for i in range(1, n - 1):
            for t1 in ubtrees(i):
                for t2 in ubtrees(n - 1 - i):
                    yield '(b {} {})'.format(t1, t2)

This recursive enumeration procedure corresponds to the recurrence

M_1 = 1
M_n = M_{n-1} + sum_{i=1}^{n-2} M_i M_{n-1-i},

which is shifted from the recurrence given in Wikipedia by one.

Levity answered 6/3, 2017 at 15:30 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Supplement

Results

Motzkin numbers

Relating Prolog list elements to Motzkin number

Stats with inefficient brute-force version

References

BNF

Using answer by David Eisenstat

Notes on duplicates

The bigger picture

Prolog output

Recommended topics

Hot tags