Will this be tail call optimized in SWI-Prolog
Asked Answered
T

2

5
step_n(0, I, I).
step_n(N, In, Out) :-
    N > 0, plus(N1, 1, N), phase_step(In, T),
    step_n(N1, T, Out).

phase_step is a function that transforms data.

Will this step_n run in almost the same memory as phase_step? If not, how should I rewrite it to do so? Will this depend on phase_step having a single solution?

EDIT: After some debugging using prolog_current_frame, I found out that if phase_step is a simple function like Out is In + 1, then optimization happens but not in my use case.

Why is TCO dependent on phase_step predicate?

Tideland answered 30/10, 2020 at 8:38 Comment(8)
I am running out of stack space. This also happened in other instances where I tried iteration, written as tail call recursion.Tideland
Of interest: Is this doing TCO, and if so how?Andri
@GuyCoder : You mean turn off tail call optimization (when debugging, tracing)?Tideland
So the best I can find related to that predicate is the way Boris did it in this post using prolog_current_frame/1 and format/2Andri
Be wared that using the debugger and tracing turn off (thanks @rajashekar) tail call optimization so don't rely on that for a definitive answer.Andri
So if the prolog_current_frame gives the same value for all instances of recursion, then it is optimized? I checked for a simple double function and my step_n, double is giving same value and my step_n is not.Tideland
SWI-Prolog has wrap_predicate/4 which might be a more practical way to use, e.g. write a predicate that wraps the predicate in question and check the frames. I don't plan on doing it for this question or even writing an answer. Others are free to explore and answer.Andri
So if the prolog_current_frame gives the same value for all instances of recursion, then it is optimized? Yes, the stack is not growing which is why TCO is done. It allows the compiler to pass the values without pushing a new entry (frame) on the stackAndri
F
6

Will this depend on phase_step having a single solution?

Kind of, but a bit stronger still: It depends on phase_step being deterministic, which means, not leaving any "choice points". A choice point is a future path to be explored; not necessarily one that will produce a further solution, but still something Prolog needs to check.

For example, this is deterministic:

phase_step_det(X, X).

It has a single solution, and Prolog does not prompt us for more:

?- phase_step_det(42, Out).
Out = 42.

The following has a single solution, but it is not deterministic:

phase_step_extrafailure(X, X).
phase_step_extrafailure(_X, _Y) :-
    false.

After seeing the solution, there is still something Prolog needs to check. Even if we can tell by looking at the code that that something (the second clause) will fail:

?- phase_step_extrafailure(42, Out).
Out = 42 ;
false.

The following has more than one solution, so it is not deterministic:

phase_step_twosolutions(X, X).
phase_step_twosolutions(X, Y) :-
    plus(X, 1, Y).

?- phase_step_twosolutions(42, Out).
Out = 42 ;
Out = 43.

Why is TCO dependent on phase_step predicate?

If there are further paths to be explored, then there must be data about those paths stored somewhere. That "somewhere" is some sort of stack data structure, and for every future path there must be a frame on the stack. This is why your memory usage grows. And with it, the computation time (the following uses copies of your step_n with my corresponding phase_step variants from above):

?- time(step_n_det(100_000, 42, Out)).
% 400,002 inferences, 0.017 CPU in 0.017 seconds (100% CPU, 24008702 Lips)
Out = 42 ;
% 7 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 260059 Lips)
false.

?- time(step_n_extrafailure(100_000, 42, Out)).
% 400,000 inferences, 4.288 CPU in 4.288 seconds (100% CPU, 93282 Lips)
Out = 42 ;
% 100,005 inferences, 0.007 CPU in 0.007 seconds (100% CPU, 13932371 Lips)
false.

?- time(step_n_twosolutions(100_000, 42, Out)).
% 400,000 inferences, 4.231 CPU in 4.231 seconds (100% CPU, 94546 Lips)
Out = 42 ;
% 4 inferences, 0.007 CPU in 0.007 seconds (100% CPU, 548 Lips)
Out = 43 ;
% 8 inferences, 0.005 CPU in 0.005 seconds (100% CPU, 1612 Lips)
Out = 43 ;
% 4 inferences, 0.008 CPU in 0.008 seconds (100% CPU, 489 Lips)
Out = 44 ;
% 12 inferences, 0.003 CPU in 0.003 seconds (100% CPU, 4396 Lips)
Out = 43 ;
% 4 inferences, 0.009 CPU in 0.009 seconds (100% CPU, 451 Lips)
Out = 44 .  % many further solutions

One way to explore this is using the SWI-Prolog debugger, which has a way of showing you alternatives (= choice points = future paths to be explored):

?- trace, step_n_det(5, 42, Out).
   Call: (9) step_n_det(5, 42, _1496) ? skip           % I typed 's' here.
   Exit: (9) step_n_det(5, 42, 42) ? alternatives      % I typed 'A' here.
 [14] step_n_det(0, 42, 42)
   Exit: (9) step_n_det(5, 42, 42) ? no debug          % I typed 'n' here.
Out = 42 ;
false.

?- trace, step_n_extrafailure(5, 42, Out).
   Call: (9) step_n_extrafailure(5, 42, _1500) ? skip
   Exit: (9) step_n_extrafailure(5, 42, 42) ? alternatives
 [14] step_n_extrafailure(0, 42, 42)
 [14] phase_step_extrafailure(42, 42)
 [13] phase_step_extrafailure(42, 42)
 [12] phase_step_extrafailure(42, 42)
 [11] phase_step_extrafailure(42, 42)
 [10] phase_step_extrafailure(42, 42)
   Exit: (9) step_n_extrafailure(5, 42, 42) ? no debug
Out = 42 ;
false.

All of those alternatives correspond to extra interpreter frames. If you use SWI-Prolog's visual debugger, it will also show you a graph representation of your stack, including all open choice points (though I've always found that hard to make sense of).

So if you want TCO and not grow the stack, you need your phase step to execute deterministically. You can do that by making the phase_step predicate itself deterministic. You can also put a cut after the phase_step call inside step_n.

Here are the calls from above with a cut after each phase_step:

?- time(step_n_det(100_000, 42, Out)).
% 400,001 inferences, 0.017 CPU in 0.017 seconds (100% CPU, 24204529 Lips)
Out = 42 ;
% 7 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 737075 Lips)
false.

?- time(step_n_extrafailure(100_000, 42, Out)).
% 400,000 inferences, 0.023 CPU in 0.023 seconds (100% CPU, 17573422 Lips)
Out = 42 ;
% 5 inferences, 0.000 CPU in 0.000 seconds (93% CPU, 220760 Lips)
false.

?- time(step_n_twosolutions(100_000, 42, Out)).
% 400,000 inferences, 0.023 CPU in 0.023 seconds (100% CPU, 17732727 Lips)
Out = 42 ;
% 5 inferences, 0.000 CPU in 0.000 seconds (94% CPU, 219742 Lips)
false.

Do not place cuts blindly, only once you understand where and why you really need them. Note how in the extrafailure case the cut only removes failures, but in the twosolutions case it removes actual solutions.

Fixity answered 30/10, 2020 at 10:57 Comment(1)
Thank you, non-determinism is the cause. (gist.github.com/k3ut0i/0f4609385d838d1de1c6db3bc1d513e3)Tideland
T
2

One helpful tool to understood code performance issues, notably unwanted non-determinism, is a ports profiler tool as found on e.g. ECLiPSe and Logtalk. The Logtalk ports_profiler tool is portable so we can use it here. We start by wrapping your code (from your gist link):

:- use_module(library(lists), []).


:- object(step).

    :- public(step_n/3).
    :- use_module(lists, [reverse/2]).

    % pattern for the nth digit mth-coeffcient
    digit_m(N, M, D) :-
        divmod(M, N, Q, _),  divmod(Q, 4, _, C),
        (C = 0, D = 0; C = 1, D = 1; C = 2, D = 0; C = 3, D = -1).
    
    calculate_digit_n(N, In, D) :-
        calculate_digit_n_(N, In, D, 1, 0).
    calculate_digit_n_(_, [], D, _, Acc) :- D1 is abs(Acc), divmod(D1, 10, _, D).
    calculate_digit_n_(N, [I | Is], D, M, Acc) :-
        digit_m(N, M, C), P is C*I, M1 is M+1, Acc1 is Acc+P,
        calculate_digit_n_(N, Is, D, M1, Acc1).
    
    phase_step(In, Out) :-
        length(In, L), L1 is L + 1, phase_step_(In, Out, L1, 1, []).
    phase_step_(_, Out, L, L, Acc) :- reverse(Out, Acc).
    phase_step_(In, Out, L, N, Acc) :-
        N < L, calculate_digit_n(N, In, D), N1 is N + 1,
        phase_step_(In, Out, L, N1, [D | Acc]).
    
    step_n(0, I, I).
    step_n(N, In, Out) :-
        prolog_current_frame(Fr), format('~w ', Fr),
        N > 0, N1 is N - 1, phase_step(In, T),
        step_n(N1, T, Out).

:- end_object.

%:- step_n(10, [1, 2, 3, 4, 5, 6, 7, 8], X).

And then (using SWI-Prolog as the backend as that is the Prolog system you told us you're using):

$ swilgt
...
?- {ports_profiler(loader)}.
% [ /Users/pmoura/logtalk/tools/ports_profiler/ports_profiler.lgt loaded ]
% [ /Users/pmoura/logtalk/tools/ports_profiler/loader.lgt loaded ]
% (0 warnings)
true.

?- logtalk_load(step, [debug(on), source_data(on)]).
% [ /Users/pmoura/step.pl loaded ]
% (0 warnings)
true.

?- step::step_n(10, [1, 2, 3, 4, 5, 6, 7, 8], X).
340 15578 30816 46054 61292 76530 91768 107006 122244 137482 
X = [3, 6, 4, 4, 0, 6, 7, 8] .

?- ports_profiler::data.
------------------------------------------------------------------------------
Entity  Predicate               Fact  Rule  Call  Exit *Exit  Fail  Redo Error
------------------------------------------------------------------------------
step    calculate_digit_n/3        0    80    80     0    80     0     0     0
step    calculate_digit_n_/5       0   720   720     0   720     0     0     0
step    digit_m/3                  0   640   640    40   600     0     0     0
step    phase_step/2               0    10    10     0    10     0     0     0
step    phase_step_/5              0    90    90     0    90     0     0     0
step    step_n/3                   1    10    11     0    11     0     0     0
------------------------------------------------------------------------------
true.

The *Exit column is for non-deterministic exists from the procedure box. For help with the tool and with interpreting the table results, see https://logtalk.org/manuals/devtools/ports_profiler.html But is clear by just a glance to the table that both phase_step/2 and step_n/3 are non-deterministic.

Update

Note that tail call optimization (TCO) doesn't mean or require the predicate to be deterministic. In your case, TCO can be applied by a Prolog compiler as the last call in the rule for the step_n/3 predicate is call to itself. That means that a stack frame can be saved on that specific recursive call. It doesn't mean that there are no choice-points being created by what precedes the recursive call. Using once/1 (as you mention on the comments) simply discards the choice-point created when phase_step/2 is called as that predicate itself is non-deterministic. That's what the table shows. The step_n/3 predicate is also non-deterministic and thus calling it creates a choice-point when the first argument is 0, which happens when you call the predicate with a zero on the first argument or when the proof for the query reaches the base case on this recursive definition.

Teresitateressa answered 30/10, 2020 at 17:11 Comment(5)
I am trying out you logtalk program and could not understand why using once/1 is not making the program tail call optimizable. In normal swi-prolog it is being optimized <gist.github.com/k3ut0i/8e0db10915d5a59d13d79b5a3e5a2005>. The corresponding program in logtalk is still showing different stack frames and *Exit column <gist.github.com/k3ut0i/14d2e25b513a6b4645eb1782c8955436>, Why?Tideland
@Tideland Updated my answer to explain what's happening when you use once/1.Teresitateressa
I understand that phase_step/2 itself is non-deterministic. I am saying that swi-prolog and logtalk versions are non behaving similarly. Using once/1 I have TCO with swi-prolog but I am not getting TCO with logtalk.Tideland
You do get TCO with the Logtalk version. Simply load the object (without the debug(on) flag) and try the query: {step}, step::step_n(10, 1, X). You will get the same stack frame printed as expected for TCO. But using the ports_profiler tool the code is instrumented and, as with most debuggers (including Prolog ones), that prevents TCO.Teresitateressa
Thank you, I forgot about debug(on). And yes with debug(off), I do have TCO.Tideland

© 2022 - 2024 — McMap. All rights reserved.