Prolog - handling binary data with DCGs
Asked Answered
P

4

10

It seems to me one ought to be able to process binary data with DCGs on a list of bytes. To make it work generally, though, one has to use bitwise operations which means is/2 is involved, which means instantiation order is an issue, which may confound using DCGs to both parse and generate. The idea here is to serialize/deserialize binary data but I think this example is simple enough to illustrate the problem.

Let me illustrate with some code. Suppose I have a binary protocol. I want to read two 4-bit integers from a byte. My naive self tries this:

two_four_bit_ints(High, Low) -->
  [B],
  {
    High is B >> 4,
    Low  is B /\ 0xF
  }.

This seems to work for parsing:

?- phrase(two_four_bit_ints(H,L), [255]).
H = L, L = 15.

?- phrase(two_four_bit_ints(H,L), [0]).
H = L, L = 0.

?- phrase(two_four_bit_ints(H,L), [15]).
H = 0,
L = 15.

?- phrase(two_four_bit_ints(H,L), [240]).
H = 15,
L = 0.

But this is not going to generate:

?- phrase(two_four_bit_ints(15,15), [X]).
ERROR: is/2: Arguments are not sufficiently instantiated

?- phrase(two_four_bit_ints(15,15), X).
ERROR: is/2: Arguments are not sufficiently instantiated

Not sure what to do. I'm bracing for someone to shout "Use clpfd" but it does not appear to support bit shift operations, and I'd be concerned about the performance effect of invoking such a powerful system in the middle of low-level code.

Since I don't see a lot of helpers for binary around, is there some other more preferred way of doing binary extraction/encoding in Prolog? I'm only using SWI for now so I'm happy to accept suggestions that are not portable to ISO, but if it is portable that's nice too. I was rather hoping to find someone had ported something like Erlang's bit syntax but didn't have any luck searching.

Podophyllin answered 31/10, 2014 at 21:35 Comment(5)
On the one hand you accurately complain that (is)/2 is moded, but on the other hand you are "concerned about the performance effect of invoking such a powerful system [as clpfd] in the middle of low-level code". You will have to choose. Shifting is related to multiplication, division and exponentiation...Murk
is/2 doesn't work for unbound terms on the right side. Just imagine that if it would, prolog would have to define (more than questionable) inverse functions for all arithmetic operations. Especially for operators like >>/2 this seems odd (What should A >> B generate? and in what order?).Sifuentes
@PatrickJ.S. I am well aware of is/2's limitations. I am not saying I require is/2. I am saying I want to parse and generate and that I suspect clpfd would be too heavy in practice. I'm happy to be proven wrong.Podophyllin
@lurker: While your approach might work, it opens up a wide field for "unrelational" errors. Contrast this to clpfd, where such errors cannot happen at all.Murk
I have seen in library code a few places where the reading and the generating DCGs are separate for situations as the one you have. In other words, you have one one DCG which first maps and then converts (parsing), and another which first converts and then maps (writing).Inflexible
M
5

Better support for binary data would be a very nice to have feature in Prolog. However, the relational nature of Prolog makes a general solution quite difficult. So you are faced with a serious decision: Either, you map some library of other languages directly into Prolog, thereby ignoring Prolog's relational nature (and ideally warding off all borders with clean instantiation errors) or you opt for a more relational approach.

When opting for a more relational solution you might either use existing libraries as library(clfd) or implement the entire constraint mechanism yourself. With some clever restrictions you might get away with a much simpler approach, but I doubt that this will work out. The tradeoffs are in areas of correctness and efficiency. Note that the clpfd systems in SICStus or SWI needed literally decades to reach their level of quality.

No matter which way you will take, some remarks:

Efficiency of library(clpfd)

library(clpfd) in SWI-Prolog has been specifically optimized to be (in some cases) comparable in performance to the traditional (is)/2. To see this, compile the rule:

list_len([_|Es], N0) :- N0 #> 0, N1 #= N0-1, list_len(Es, N1).

and look at the generated code with listing(list_len):

list_len([_|C], A) :-
    (   integer(A)
    ->  A>=0+1
    ;   clpfd:clpfd_geq(A, 1)
    ),
    (   integer(A)
    ->  B is A+ -1
    ;   clpfd:clpfd_equal(B, A-1)
    ),
    list_len(C, B).

Effectively, the built-ins for evaluable expressions like (is)/2 and (>=)/2 are used for those cases that correspond to those primitive operations directly.

To simulate bitshift operations fully, you would, however, need (div)/2 which is currently only supported by SICStus' library(clpfd) but not SWI's. So some extra headache would await you here. But as long as you are using unsigned nonnegative values, no problem should arise. For general shifts you will need (^)/2 which is supported by SWI - but not by SICStus.

And here is the CLPFD-version:

two_four_bit_ints(High, Low) -->
  [B],
  { B in 0..255,
    Low in 0..15,
    High in 0..15,
    B #= Low + High*16
  }.

Note that your original program unintentionally defines behavior in rather unintended situations, like B = -1234, B = 1+1. You could add between(0, 255, B) but then you would get combinatorial enumerations (read: explosions) easily.

The current implementations of library(clpfd) might be improved significantly for such cases even further, but in order to improve them, one must use them!

I/O and pio

ISO Prolog supports elementary I/O operations on

  • bytes (get_byte/1),
  • codes (get_code/1), and
  • characters (get_char/1).

If you want to use DCGs, you will certainly want to use library(pio). Currently, SWI's library(pio) only supports codes.

Murk answered 31/10, 2014 at 23:14 Comment(9)
Last I checked I couldn't use library(pio) with streams, so I couldn't use pipes. Do you know a good work-around for this?Inflexible
This answer makes me want to try out CLP(FD) right away, thanks!Stay
@Boris For parsing there is phrase_from_stream/2 in library pio. For generating you need to first do phrase/2 to codes and then with_output_to/2 to stream. Since a DCG can backtrack it cannot generate to stream directly.Stay
@Boris: There were plans for streams and output in SWI. Unfortunately SWI took another turn.Murk
@WouterBeek: phrase_from_stream/2 requires a seekable device. And for output: There was even a version doing this transparently. Backtracking does not complicate things too much.Murk
@WouterBeek Yes, phrase_from_stream does not work on standard input, for example.Inflexible
@false: Fair points! Is the more advanced library still available somewhere?Stay
@WouterBeek: This intermediary version of 2008-07-15 pure_output should be re-exported by pio. Be warned, it's pre-meta_predicate and depends on whatever. There are a lot of things to make this cleaner, more conforming, with better update semantics etc. If you are interested, why not ask a question? As long as it fits into SO. Some more on pio - I did not expect that this would be the last thing for some time...Murk
@WouterBeek: The slides contain many of the TODOs.Murk
V
4

In SWI-Prolog, many bitwise operations are now supported in CLP(FD).

Please try the following with the latest git version. It suffices to replace (is)/2 by (#=)/2 in your code:

two_four_bit_ints(High, Low) -->
  [B],
  {
    High #= B >> 4,
    Low  #= B /\ 0xF
  }.

Your first 4 example queries work exactly as previously, and should be acceptably efficient:

?- phrase(two_four_bit_ints(H,L), [255]).
H = L, L = 15.

?- phrase(two_four_bit_ints(H,L), [0]).
H = L, L = 0.

?- phrase(two_four_bit_ints(H,L), [15]).
H = 0,
L = 15.

?- phrase(two_four_bit_ints(H,L), [240]).
H = 15,
L = 0.

Note that CLP(FD) constraints are directly compiled to low-level predicates if they are used in modes that are also supported by primitive arithmetic.

One benefit of using CLP(FD) constraints is that the other 2 queries now also work:

?- phrase(two_four_bit_ints(15,15), [X]).
15#=X/\15,
15#=X>>4.

?- phrase(two_four_bit_ints(15,15), X).
X = [_G1048],
15#=_G1048/\15,
15#=_G1048>>4.

At least you can use this for further reasoning.

In fact, the most general query now also works:

?- phrase(two_four_bit_ints(A,B), X).
X = [_G1270],
A#=_G1270>>4,
B#=_G1270/\15.

It may be possible to perform more powerful propagations in some cases. I will look into this as soon as the need arises.

Vacuva answered 19/6, 2016 at 17:32 Comment(0)
I
1

To be a bit more explicit than I can in a comment, taking as an example parsing an integer and converting an integer back to a string, you could say:

foo_parse(Number) -->
    digits(Ds),
    { number_codes(Number, Ds) }.

foo_generate(Number) -->
    { number_codes(Number, Ds) },
    Ds.

You could avoid this by having a var(Number) "guard" for the first of the two clauses, complete with its own cut etc, but I am not sure it gets much easier to write, read, or use. The two DCGs are going to be probably called from different contexts anyway.

So for your case the generate would be something like:

fourbit_fourbit_generate(High, Low) -->
    { D is (High << 4) + Low },
    [D].

Just my opinion.

Inflexible answered 31/10, 2014 at 22:50 Comment(0)
R
1

this can work

two_four_bit_ints(High, Low) -->
 [B],
  {
     integer(B) % suggestion by @false, instead of nonvar(B)
  -> High is B >> 4,
     Low  is B /\ 0xF
  ;  B is (High << 4) \/ Low
  }.

remember, a DCG is just plain Prolog, but you can see them like executable semantic grammars.

Rubirubia answered 31/10, 2014 at 23:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.