What occurs-check optimizations is SWI Prolog using?
Asked Answered
L

2

6

To quote the SICStus Prolog manual:

The usual mathematical theory behind Logic Programming forbids the creation of cyclic terms, dictating that an occurs-check should be done each time a variable is unified with a term. Unfortunately, an occurs-check would be so expensive as to render Prolog impractical as a programming language.

However, I ran these benchmarks (The Prolog ones) and saw fairly minor differences (less than 20%) in SWI Prolog between the occurs check (OC) being on and off:

OC is off: :- set_prolog_flag(occurs_check, false). in .swiplrc (restarted)

?- run_interleaved(10).
% 768,486,984 inferences, 91.483 CPU in 91.483 seconds (100% CPU, 8400298 Lips)
true.

?- run(1).
'Program'            Time     GC
================================
boyer               0.453  0.059
browse              0.395  0.000
chat_parser         0.693  0.000
crypt               0.481  0.000
fast_mu             0.628  0.000
flatten             0.584  0.000
meta_qsort          0.457  0.000
mu                  0.523  0.000
nreverse            0.406  0.000
poly_10             0.512  0.000
prover              0.625  0.000
qsort               0.574  0.000
queens_8            0.473  0.000
query               0.494  0.000
reducer             0.595  0.000
sendmore            0.619  0.000
simple_analyzer     0.620  0.000
tak                 0.486  0.000
zebra               0.529  0.000
           average  0.534  0.003
true.

OC is on: :- set_prolog_flag(occurs_check, true). in .swiplrc (restarted)

?- run_interleaved(10).
% 853,189,814 inferences, 105.545 CPU in 105.580 seconds (100% CPU, 8083669 Lips)
true.

?- run(1).
'Program'            Time     GC
================================
boyer               0.572  0.060
browse              0.618  0.000
chat_parser         0.753  0.000
crypt               0.480  0.000
fast_mu             0.684  0.000
flatten             0.767  0.000
meta_qsort          0.659  0.000
mu                  0.607  0.000
nreverse            0.547  0.000
poly_10             0.541  0.000
prover              0.705  0.000
qsort               0.660  0.000
queens_8            0.491  0.000
query               0.492  0.000
reducer             0.867  0.000
sendmore            0.629  0.000
simple_analyzer     0.757  0.000
tak                 0.550  0.000
zebra               0.663  0.000
           average  0.634  0.003
true.

Are these benchmarks not representative of real-world usage? (I remember reading somewhere that they were specifically chosen to be "fairly representative") Is SWI Prolog implementing some optimization tricks, that SICStus people aren't aware of, that make the cost of OC minor? If so, are they published? (I found a bunch of papers about this from the '90s, but I don't know if they are state-of-the-art)

Lagena answered 6/1, 2021 at 17:12 Comment(10)
While I do not have a direct answer to your question, we've had the pleasure of getting pretty technical with Jan Wielemaker (creator of SWI Prolog) over on the SWI-Prolog Discourse (swi-prolog.discourse.group) :-) He's rather active so I can recommend reaching out there as well.Blamable
This is cross posted hereDrees
@MostowskiCollapse My version is 7.6.4 (Linux). current_prolog_flag(occurs_check, X) agrees with whatever's in ~/.swiplrc.Lagena
@MostowskiCollapse "Try this: ?- freeze(A,true),freeze(B,true),A-B=s(A)-n.. It will succeed." I see an error message in 7.6.4 (OC set to "error").Lagena
@MostowskiCollapse That succeeds. I wouldn't get hung up on 15% vs 40% (I'm using a Xeon from 6 years ago also). 40% is still not the difference between "practical" and "impractical" language.Lagena
@oguz, why did you remove the tag occurs-check in so many posts? That is a sensible tag.Paraph
@Paraph It doesn't have an excerpt and wiki, and was created and added to ~10 questions by the same user. Besides it doesn't mean anything in isolation, it is only meaningful when used with the prolog tag.Weirdo
@oguzismail: What you are doing here is vandalizing a small tag. And no, it is not "only meaningful" to Prolog. It is related to unification which appears in many areas.Paraph
@Paraph Not vandalizing, rather getting rid of it. At least add a tag excerpt, so that no one else attempts to delete the tag again.Weirdo
Now I got a simple test case where occurs check goes +100% for SWI-Prolog, see revised answer.Refract
P
5

The major optimization makes unification of local variables a constant operation.

Many abstract machines like the PLM, ZIP, WAM, VAM provide a special case for logic variables that cannot be subterm of some structure (called local variables). Unification with such variables does not require any occurs-check at all and thus can be constant. In this manner large terms can be "passed back" without an extra occurs-check required.

Without this optimization, the handling of grammars (for parsing a given list) gets an overhead quadratic in the number of tokens. Every time the "input list" is handed back (so, graphically speaking, every time you are crossing a comma after a non-terminal in the grammar body), the remaining input list needs to be scanned for the occurrence of that local variable. (It's better than quadratic in the number of characters, since regular expressions are mostly encoded tail-recursively).

This optimization was introduced 2007 in 5.6.39. It is surprising that your measurements show overheads even in cases like tak, where not a single structure is built at all. As far as I can remember, occurs-check unification in SWI 5.6.39 ran a tiny bit faster than rational tree unification (for simple cases) as (at that time) no extra setup was needed.

There is still ample room for many further occurs-check optimizations. But those will only happen, if people do use this feature. As for SWI, not much happened in the last 13 years.

But think-of-it: The very first Prolog, called Prolog 0 did support the occurs-check by default. But from Prolog I ("Marseille Prolog") on, only excuses (such as those you cite) were made. And at least, the standard did not rule out occurs-check unification as default by only defining NSTO executions and requiring unify_with_occurs_check/2 and acyclic_term/1. And now, Prologs like SWI, Tau, and Scryer provide it optionally via a flag.

A further optimization into the same direction is Joachim Beer's NEW_UNBOUND tag which avoids additionally occurs-checks of some heap-variables at the expense of a more complex scheme. See The Occur-Check Problem Revisited. JLP 5(3) 1988. And LNCS 404.

Paraph answered 3/2, 2021 at 12:23 Comment(7)
That flag (later with its name shortened) was proposed for standardization by ISO back in 2009 in the context of WG17 work on the Core standard revision. Hopefully more Prolog systems will eventually implement it.Agnosia
@PauloMoura: There is quite some reservation against a flag as it is pretty error prone if code that relies on occurs-check unification and code that relies on rational trees are mixed. One such example is CHR in SWI, which (in its current implementation since 10 years or so) relies on creating infinite trees and thus cannot be run together with that flag turned on. A possible (incomplete) solution might be call_with_occurs_check/1.Paraph
I'm aware of those issues. Coinduction is another example of code that breaks with such a flag turned on. But the implementation of the flag would/should allow to turn it on just for the computations that require the occurs check (for example, using a setup_call_cleanup/3 goal). This would useful notably for testing.Agnosia
Unfortunately setup_call_cleanup/3 for this purpose is too coarse. One would need effectively to maintain that flag for each and every invocation. But that means to impose some general overhead. And, there is not a single implementation to show that it is not that difficult.Paraph
Is this "major optimization" separate from the hashtable one? Do modern Prologs use the latter?Lagena
@MaxB: It has nothing to do with those cases. And seriously, difference lists are a major and not a "major" point of Prolog. Those other bad cases for occurs-check which have been mentioned in various posts all rely on the tree vs graph representation of terms where a naive traversal gets an exponential overhead. In early LISP parlance, these are blam lists. There, hashtables might be of interest. But not for a simple traversal, there, marking is much cheaper. And in general, factoring (basically the same problem as minimization of finite automata) will always speed up the occurs-check.Paraph
SWI-Prolog is especially stupid with pattern matching: github.com/SWI-Prolog/swipl-devel/issues/789Refract
R
3

Here is a test case where the occurs check doubles the time to execute a query. Take this code here, to compute a negation normal form. Since the (=)/2 is at the end of the rule, the visited compounds becomes quadratic:

/* Variant 1 */
norm(A, R) :- var(A), !, R = pos(A).
norm((A+B), R) :- !, norm(A, C), norm(B, D), R = or(C,D).
norm((A*B), R) :- !, norm(A, C), norm(B, D), R = and(C,D).
Etc..

We can compare with this variant where the (=)/2 is done first while the compound is not yet instantiated:

/* Variant 2 */
norm(A, R) :- var(A), !, R = pos(A).
norm((A+B), R) :- !, R = or(C,D), norm(A, C), norm(B, D).
norm((A*B), R) :- !, R = and(C,D), norm(A, C), norm(B, D).
Etc..

Here are some measurements for SWI-Prolog 8.3.19. For variant 1 setting the occurs check flag to true doubles the time needed to convert some propositional formulas from the principia mathematica:

/* Variant 1 */
/* occurs_check=false */
?- time((between(1,1000,_),test,fail;true)).
% 3,646,000 inferences, 0.469 CPU in 0.468 seconds (100% CPU, 7778133 Lips)
true.

/* occurs_check=true */
?- time((between(1,1000,_),test,fail;true)).
% 6,547,000 inferences, 0.984 CPU in 0.983 seconds (100% CPU, 6650921 Lips)
true.

On the other hand, moving (=)/2 to the front changes the picture favorably:

/* Variant 2 */
/* occurs_check=false */
?- time((between(1,1000,_),test,fail;true)).
% 3,646,000 inferences, 0.453 CPU in 0.456 seconds (99% CPU, 8046345 Lips)
true.

/* occurs_check=true */
?- time((between(1,1000,_),test,fail;true)).
% 6,547,000 inferences, 0.703 CPU in 0.688 seconds (102% CPU, 9311289 Lips)
true.

Open Source (pages no longer available, nor on archive.org):

Negation Normal Form, Non-Tail Recursive
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-norm-pl

Negation Normal Form, Tail Recursive
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-norm2-pl

193 propositional logic test cases from Principia.
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-principia-pl

Refract answered 7/1, 2021 at 21:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.