What occurs-check optimizations is SWI Prolog using?

Asked 6/1, 2021 at 17:12 Answered 3/2, 2021 at 12:23

The usual mathematical theory behind Logic Programming forbids the creation of cyclic terms, dictating that an occurs-check should be done each time a variable is unified with a term. Unfortunately, an occurs-check would be so expensive as to render Prolog impractical as a programming language.

However, I ran these benchmarks (The Prolog ones) and saw fairly minor differences (less than 20%) in SWI Prolog between the occurs check (OC) being on and off:

OC is off: :- set_prolog_flag(occurs_check, false). in .swiplrc (restarted)

?- run_interleaved(10).
% 768,486,984 inferences, 91.483 CPU in 91.483 seconds (100% CPU, 8400298 Lips)
true.

?- run(1).
'Program'            Time     GC
================================
boyer               0.453  0.059
browse              0.395  0.000
chat_parser         0.693  0.000
crypt               0.481  0.000
fast_mu             0.628  0.000
flatten             0.584  0.000
meta_qsort          0.457  0.000
mu                  0.523  0.000
nreverse            0.406  0.000
poly_10             0.512  0.000
prover              0.625  0.000
qsort               0.574  0.000
queens_8            0.473  0.000
query               0.494  0.000
reducer             0.595  0.000
sendmore            0.619  0.000
simple_analyzer     0.620  0.000
tak                 0.486  0.000
zebra               0.529  0.000
           average  0.534  0.003
true.

OC is on: :- set_prolog_flag(occurs_check, true). in .swiplrc (restarted)

?- run_interleaved(10).
% 853,189,814 inferences, 105.545 CPU in 105.580 seconds (100% CPU, 8083669 Lips)
true.

?- run(1).
'Program'            Time     GC
================================
boyer               0.572  0.060
browse              0.618  0.000
chat_parser         0.753  0.000
crypt               0.480  0.000
fast_mu             0.684  0.000
flatten             0.767  0.000
meta_qsort          0.659  0.000
mu                  0.607  0.000
nreverse            0.547  0.000
poly_10             0.541  0.000
prover              0.705  0.000
qsort               0.660  0.000
queens_8            0.491  0.000
query               0.492  0.000
reducer             0.867  0.000
sendmore            0.629  0.000
simple_analyzer     0.757  0.000
tak                 0.550  0.000
zebra               0.663  0.000
           average  0.634  0.003
true.

Are these benchmarks not representative of real-world usage? (I remember reading somewhere that they were specifically chosen to be "fairly representative") Is SWI Prolog implementing some optimization tricks, that SICStus people aren't aware of, that make the cost of OC minor? If so, are they published? (I found a bunch of papers about this from the '90s, but I don't know if they are state-of-the-art)

Lagena answered 6/1, 2021 at 17:12 Comment(10)

While I do not have a direct answer to your question, we've had the pleasure of getting pretty technical with Jan Wielemaker (creator of SWI Prolog) over on the SWI-Prolog Discourse (swi-prolog.discourse.group) :-) He's rather active so I can recommend reaching out there as well. – Blamable 6/1, 2021 at 17:20

This is cross posted here – Drees 6/1, 2021 at 19:33

@MostowskiCollapse My version is 7.6.4 (Linux). current_prolog_flag(occurs_check, X) agrees with whatever's in ~/.swiplrc. – Lagena 9/1, 2021 at 20:25

@MostowskiCollapse "Try this: ?- freeze(A,true),freeze(B,true),A-B=s(A)-n.. It will succeed." I see an error message in 7.6.4 (OC set to "error"). – Lagena 9/1, 2021 at 20:48

@MostowskiCollapse That succeeds. I wouldn't get hung up on 15% vs 40% (I'm using a Xeon from 6 years ago also). 40% is still not the difference between "practical" and "impractical" language. – Lagena 9/1, 2021 at 22:43

@oguz, why did you remove the tag occurs-check in so many posts? That is a sensible tag. – Paraph 3/2, 2021 at 9:11

@Paraph It doesn't have an excerpt and wiki, and was created and added to ~10 questions by the same user. Besides it doesn't mean anything in isolation, it is only meaningful when used with the prolog tag. – Weirdo 3/2, 2021 at 9:27

@oguzismail: What you are doing here is vandalizing a small tag. And no, it is not "only meaningful" to Prolog. It is related to unification which appears in many areas. – Paraph 3/2, 2021 at 9:33

@Paraph Not vandalizing, rather getting rid of it. At least add a tag excerpt, so that no one else attempts to delete the tag again. – Weirdo 3/2, 2021 at 9:36

Now I got a simple test case where occurs check goes +100% for SWI-Prolog, see revised answer. – Refract 21/2, 2021 at 15:29

The major optimization makes unification of local variables a constant operation.

Many abstract machines like the PLM, ZIP, WAM, VAM provide a special case for logic variables that cannot be subterm of some structure (called local variables). Unification with such variables does not require any occurs-check at all and thus can be constant. In this manner large terms can be "passed back" without an extra occurs-check required.

Without this optimization, the handling of grammars (for parsing a given list) gets an overhead quadratic in the number of tokens. Every time the "input list" is handed back (so, graphically speaking, every time you are crossing a comma after a non-terminal in the grammar body), the remaining input list needs to be scanned for the occurrence of that local variable. (It's better than quadratic in the number of characters, since regular expressions are mostly encoded tail-recursively).

This optimization was introduced 2007 in 5.6.39. It is surprising that your measurements show overheads even in cases like tak, where not a single structure is built at all. As far as I can remember, occurs-check unification in SWI 5.6.39 ran a tiny bit faster than rational tree unification (for simple cases) as (at that time) no extra setup was needed.

There is still ample room for many further occurs-check optimizations. But those will only happen, if people do use this feature. As for SWI, not much happened in the last 13 years.

But think-of-it: The very first Prolog, called Prolog 0 did support the occurs-check by default. But from Prolog I ("Marseille Prolog") on, only excuses (such as those you cite) were made. And at least, the standard did not rule out occurs-check unification as default by only defining NSTO executions and requiring unify_with_occurs_check/2 and acyclic_term/1. And now, Prologs like SWI, Tau, and Scryer provide it optionally via a flag.

A further optimization into the same direction is Joachim Beer's NEW_UNBOUND tag which avoids additionally occurs-checks of some heap-variables at the expense of a more complex scheme. See The Occur-Check Problem Revisited. JLP 5(3) 1988. And LNCS 404.

Paraph answered 3/2, 2021 at 12:23 Comment(7)

That flag (later with its name shortened) was proposed for standardization by ISO back in 2009 in the context of WG17 work on the Core standard revision. Hopefully more Prolog systems will eventually implement it. – Agnosia 3/2, 2021 at 15:48

@PauloMoura: There is quite some reservation against a flag as it is pretty error prone if code that relies on occurs-check unification and code that relies on rational trees are mixed. One such example is CHR in SWI, which (in its current implementation since 10 years or so) relies on creating infinite trees and thus cannot be run together with that flag turned on. A possible (incomplete) solution might be call_with_occurs_check/1. – Paraph 3/2, 2021 at 17:35

I'm aware of those issues. Coinduction is another example of code that breaks with such a flag turned on. But the implementation of the flag would/should allow to turn it on just for the computations that require the occurs check (for example, using a setup_call_cleanup/3 goal). This would useful notably for testing. – Agnosia 3/2, 2021 at 17:47

Unfortunately setup_call_cleanup/3 for this purpose is too coarse. One would need effectively to maintain that flag for each and every invocation. But that means to impose some general overhead. And, there is not a single implementation to show that it is not that difficult. – Paraph 3/2, 2021 at 17:53

Is this "major optimization" separate from the hashtable one? Do modern Prologs use the latter? – Lagena 4/2, 2021 at 6:22

@MaxB: It has nothing to do with those cases. And seriously, difference lists are a major and not a "major" point of Prolog. Those other bad cases for occurs-check which have been mentioned in various posts all rely on the tree vs graph representation of terms where a naive traversal gets an exponential overhead. In early LISP parlance, these are blam lists. There, hashtables might be of interest. But not for a simple traversal, there, marking is much cheaper. And in general, factoring (basically the same problem as minimization of finite automata) will always speed up the occurs-check. – Paraph 4/2, 2021 at 8:54

SWI-Prolog is especially stupid with pattern matching: github.com/SWI-Prolog/swipl-devel/issues/789 – Refract 22/2, 2021 at 12:57

Here is a test case where the occurs check doubles the time to execute a query. Take this code here, to compute a negation normal form. Since the (=)/2 is at the end of the rule, the visited compounds becomes quadratic:

/* Variant 1 */
norm(A, R) :- var(A), !, R = pos(A).
norm((A+B), R) :- !, norm(A, C), norm(B, D), R = or(C,D).
norm((A*B), R) :- !, norm(A, C), norm(B, D), R = and(C,D).
Etc..

We can compare with this variant where the (=)/2 is done first while the compound is not yet instantiated:

/* Variant 2 */
norm(A, R) :- var(A), !, R = pos(A).
norm((A+B), R) :- !, R = or(C,D), norm(A, C), norm(B, D).
norm((A*B), R) :- !, R = and(C,D), norm(A, C), norm(B, D).
Etc..

Here are some measurements for SWI-Prolog 8.3.19. For variant 1 setting the occurs check flag to true doubles the time needed to convert some propositional formulas from the principia mathematica:

/* Variant 1 */
/* occurs_check=false */
?- time((between(1,1000,_),test,fail;true)).
% 3,646,000 inferences, 0.469 CPU in 0.468 seconds (100% CPU, 7778133 Lips)
true.

/* occurs_check=true */
?- time((between(1,1000,_),test,fail;true)).
% 6,547,000 inferences, 0.984 CPU in 0.983 seconds (100% CPU, 6650921 Lips)
true.

On the other hand, moving (=)/2 to the front changes the picture favorably:

/* Variant 2 */
/* occurs_check=false */
?- time((between(1,1000,_),test,fail;true)).
% 3,646,000 inferences, 0.453 CPU in 0.456 seconds (99% CPU, 8046345 Lips)
true.

/* occurs_check=true */
?- time((between(1,1000,_),test,fail;true)).
% 6,547,000 inferences, 0.703 CPU in 0.688 seconds (102% CPU, 9311289 Lips)
true.

Open Source (pages no longer available, nor on archive.org):

Negation Normal Form, Non-Tail Recursive
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-norm-pl

Negation Normal Form, Tail Recursive
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-norm2-pl

193 propositional logic test cases from Principia.
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-principia-pl

Refract answered 7/1, 2021 at 21:42 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags