Implementing XOR function with Prolog CLPFD for 32-bit numbers

Asked 17/5, 2014 at 21:39 Answered 3/4, 2021 at 22:30

Solved prolog swi-prolog clpfd bitwise-xor

I try to implement efficient exclusive-or (XOR) in Prolog CLPFD. This should be simple predicate like:

xor(A, B, AxorB).

A, B, AxorB are natural numbers (with 0) and AxorB is a result of A xor B.

My main problem is with efficiency. Firstly, I wasn't able to find any way to XOR two number without breaking those numbers into separate parts that could be further processed/constrained, and the process of breaking those numbers (creating proper constraints and then resolving them) is taking some processing time. Secondly, I wans't able to come up with any efficient way to "simulate" XOR functions on natural numbers other than presented in the second code below.

Lets start from my first code. This is the most simple XOR implementation possible and it works only for 1 bit values (0 and 1):

xor_1bit_values(A, B, AxorB) :-
    AxorB #= (A + B) mod 2.

To use it for numbers larger than 1, numbers must be broken into bits:

xor_number(A, B, Result, Bits) :-
    Count is Bits - 1,
    xor_number(A, B, Result, Count, 0).
xor_number(A, B, Result, 0, Sum) :-
    xor_1bit_values(A, B, Xor),
    Result #= Xor + Sum.
xor_number(A, B, Result, Count, Sum) :-
    P is 2^Count,
    X #= A / P,
    Y #= B / P,
    xor_1bit_values(X, Y, Tmp),
    NewSum #= Sum + P*Tmp,
    NewCount is Count - 1,
    xor_number(A, B, Result, NewCount, NewSum).

Sample input and output:

?- time(xor_number(123456789, 987654321, R, 32)).
% 943 inferences, 0.000 CPU in 0.001 seconds (0% CPU, Infinite Lips)
R = 1032168868

Now, this is too slow for my purposes, as in my code I have sometimes need to guess A and B when I have AxorB where all of these should be 32-bit numbers. And for numbers that need more than 10 bits this goes into literal millions of inferences that seem to increase expotentially. And I use the best labeling strategies, XOR arguments swapping and other tricks to speed up calculations.

So, I tried to do some maths. What I devised is XOR function for 2-bit values (0, 1, 2, 3):

xor_2bit_values(A, B, Result) :-
    Result #= ((A + B*((-1)^A)) mod 4).

To use it in numbers larger than 3 there is code similar to what I presented before:

xor_number2(A, B, Result, Bits) :-
    Count is (Bits / 2) - 1,
    xor_number2(A, B, Result, Count, 0).
xor_number2(A, B, Result, 0, Sum) :-
    xor_2bit_values(A, B, Xor),
    Result #= Xor + Sum,
    !.
xor_number2(A, B, Result, Count, Sum) :-
    P is 4^Count,
    X #= A / P,
    Y #= B / P,
    xor_2bit_values(X, Y, Tmp),
    NewSum #= Sum + P*Tmp,
    NewCount is Count - 1,
    xor_number2(A, B, Result, NewCount, NewSum).

This seems to work nearly 50% faster than the first code. But still, two-fold difference is still too small for me.

So, my question for you is this: how can I implement efficient XOR for 32-bit numbers? If this is not possible on modern machines and you can prove it by some sort of calcucation then it is also a nice answer to my question. Eventually, how can I best improve my code? Maybe you have some ideas how to deal with numbers without breaking them apart or how to XOR numbers in other way?

Additional info: If it happens to you to try my code to guess two from three arguments or XOR, then because of possibility to freely swap arguments of that functions (which comes from its mathematical properties) I recommend setting A as bound variable and B and AxorB as unbound. CLPFD seems to work fastest that way. Also, the best labeling strategy would be labeling([bisect], [B,AxorB].

Institutionalism answered 17/5, 2014 at 21:39 Comment(1)

Study the source: There are instructions how to make such extensions efficiently. – Crotch 17/5, 2014 at 21:45

Maybe it wasn't available then but now, we can do this:

Y in 0..5, X #= Y xor 1, label([Y]).

From the docs, it's written that:

The bitwise operations ()/1, (/)/2, (/)/2, (>>)/2, (<<)/2, lsb/1, msb/1, popcount/1 and (xor)/2 are also supported.

See if you can adapt this for your purposes.

Sexennial answered 3/4, 2021 at 22:30 Comment(0)

I think I would try to precompute some table of 'bit chunks', and then, using modulo and division (both supported operations), would do N lookups into the table. The idea it's that a lookup could work faster than the (huge!) arithmetic expansion performed by the library. This is the usual 'trade space for time' trick.

/** <module> bits_clpfd
 *
 *  naive implementation of basic bit operations on constrained variables
 *  --------
 *
 *  source file /home/carlo/prolog/bits_clpfd.pl
 *  created at dom mag 18 07:57:03 2014
 *
 *  @author carlo
 *  @version 0.9.9
 *  @copyright carlo
 *  @license LGPL v2.1
 */

:- module(bits_clpfd,
    [bits_clpfd_prepare_lut/2
    ]).

:- use_module(library(clpfd)).

:- dynamic lut_and_or_xor/5.
:- dynamic chunk_size/2.

%% bits_clpfd_prepare_lut(Bits, Max) is det.
%
%  setup the lookup table for basic most operations on constrained variables
%   the cost is mainly controlled by Bits, being the LUT size 2^(Bits*2)
%
%  @arg Bits how many bits to store 
%  @arg Max describe Max
%
bits_clpfd_prepare_lut(Bits, BMax) :-
    ( nonvar(Bits) ; Bits = 4 ),
    ( nonvar(BMax) ; BMax = 32 ),

    retractall(chunk_size(_, _)),
    Max is 1 << BMax,
    assert(chunk_size(Bits, Max)),

    retractall(lut_and_or_xor(_,_, _,_,_)),
    N is (1 << Bits) - 1,
    forall((between(0, N, A), between(0, N, B)), (
        And is A /\ B,
        Or  is A \/ B,
        Xor is A xor B,
        assertz(lut_and_or_xor(A,B, And,Or,Xor))
    )).

%% xor_clpfd(A, B, C) is nondet.
%
%  naive constraint A xor B #= C
%
%  @arg A constrained variable
%  @arg B constrained variable
%  @arg C constrained variable
%
xor_clpfd(A, B, C) :-
    maplist(check_domain_range, [A,B,C]),
    split_apply_xor(1, A, B, C).

split_apply_xor(L, A, B, C) :-
    chunk_size(NBits, Max),
    (   L < Max
    ->  Mod is (2 << NBits),
        Am #= A mod Mod,
        Bm #= B mod Mod,
        Cm #= C mod Mod,
        lut_and_or_xor(Am, Bm, _, _, Cm),
        Ad #= A / Mod,
        Bd #= B / Mod,
        Cd #= C / Mod,
        M is L << NBits,
        split_apply_xor(M, Ad, Bd, Cd)
    ;   true
    ).

check_domain_range(V) :-
    chunk_size(_, Max),
    assertion((fd_dom(V, Inf .. Sup), Inf>=0, Sup < Max)).

:- begin_tests(bits_clpfd).

test(1) :-
    bits_clpfd_prepare_lut(2, 4),
    Vs = [A,B,C], Vs ins 0..15,
    A #= 1, B #= 1, C #= 0,
    xor_clpfd(A, B, C).

:- end_tests(bits_clpfd).

test

?- run_tests(bits_clpfd).
% PL-Unit: bits_clpfd 
Warning: /home/carlo/prolog/bits_clpfd.pl:83:
    PL-Unit: Test 1: Test succeeded with choicepoint
 done
% test passed
true.

anyway, this is a naive approach, the right one should be to compile your own run_propagator/2. But I've never done it...

Group answered 17/5, 2014 at 22:49 Comment(0)