Minimize a DFA with don't care transitions
Asked Answered
F

2

7

I have a DFA (Q, Σ, δ, q0, F) with some “don't care transitions.” These transitions model symbols which are known not to appear in the input in some situations. If any such transition is taken, it doesn't matter whether the resulting string is accepted or not.

Is there an algorithm to compute an equivalent DFA with a minimal amount of states? Normal DFA minimisation algorithms cannot be used as they don't know about “don't care” transitions and there doesn't seem to be an obvious way to extend the algorithms.

Frisco answered 3/7, 2019 at 12:48 Comment(13)
If you don't care about the "don't care" transitions, then why can't you just remove them?Subternatural
@MattTimmermans Because a DFA has a transition for every combination of state and symbol. You can't have “no transition” and the minimization algorithms expect this to be the case.Frisco
If the symbol never occurs(in a state) you can remove the transition for that {state,symbol}(or:you could replace al the unreachable states by one or more error states)Erg
@Erg And replace it with what? Recall, the transition function is complete and must contain an entry for each combination of state and symbol. If I was to change that, the question how to adapt DFA minimization algorithms stays.Frisco
The transition can use a smaller domain than the Carthesian product {states X symbols}Erg
Doesn't Brzozowski's algorithm still work?Pyosis
@Erg Then the Myhill-Nerode relation is either no longer an equivalence relation (if we take “no transition” as equivalent to any other transition) or it marks states with a missing transition as distinct from all states where this transition is present, negating the whole point of “don't care” transitions. How do you fix this?Frisco
@harold Actually, on second thought, I don't think Brzozowski's algorithm is going to work. How would you model “don't care” transitions for it to be used?Frisco
@Frisco well I thought, for a don't-care edge, don't add its reversal to the NFA. It's still an NFA with that, so it doesn't immediately violate a precondition.. I'm not entirely sure what it does to the result thoughPyosis
@harold That would be like turning all “don't care” transitions into “reject” transitions. For the DFA I have, I know that it cannot be minimised any further if I do exactly this. I thought about this, too, when I said that it might be doable, but on second thought I think it doesn't work.Frisco
(Sorry if this comment is naive, I studied DFA 40 years ago) I don't see the difference with what you call "a DFA which doesn't care some transitions" and an incomplete DFA. The usual way to complete a DFA is to add a sink state corresponding to the missing transitions. You can minimize such a completed DFA. Then you can remove the sink state from the resulting minimized DFA. In your previous comment, you mentioned it is not optimal for your DFA. I am surprised. Could you provide a DFA example where such a simple method will fail?Nonsectarian
@Nonsectarian This is equal to rejecting all "don't care" strings. I'd be surprised if this among all possible choices would be best. For an example: consider a DFA matching a single word of length n and noot caring about all others. The "sink state" approach generates a DFA with n + 2 states while accepting all don't care strings results in a DFA with a single state that just accepts everything.Frisco
@Nonsectarian Note that simply accepting all don't care strings doesn't work either: suppose you have a DFA that rejects all strings except for one word of n characters. If the word is matched, a “don't care” transition occurs. If we add an accepting sink to the automaton and minimize it, the result has n + 1 states. If we instead reject all words from this don't care, the automaton has a single state, rejecting all strings.Frisco
Q
5

I think this problem is NP-hard (more on that in a bit). This is what I'd try.

  1. (Optional) Preprocess the input via the usual minimization algorithm with accept/reject/don't care as separate outcomes. (Since don't care is not equivalent to accept or reject, we get the Myhill–Nerode equivalence relation back, allowing a variant of the usual algorithm.)

  2. Generate a conflict graph as follows. Start with all edges between accepting and rejecting states. Take the closure where we iteratively add edges q1—q2 such that there exists a symbol s for which there exists an edge σ(q1, s)—σ(q2, s).

  3. Color this graph with as few colors as possible. (Or approximate.) Lots and lots of coloring algorithms out there. PartialCol is a good starting point.

  4. Merge each color class into a single node. This potentially makes the new transition function multi-valued, but we can choose arbitrarily.

With access to an alphabet of arbitrary size, it seems easy enough to make this reduction to coloring run the other way, proving NP-hardness. The open question for me is whether having a fixed-size alphabet constrains the conflict graph in such a way as to make the resulting coloring instance easier somehow. Alas, I don't have time to research this.

Quentinquercetin answered 6/7, 2019 at 13:3 Comment(3)
There are potentially many DC (or sink) states, one for each dont care case. Could you detail how these DC states are handled in step 2? Besides, it would be nice if you could illustrate your answer with a small exampleNonsectarian
@Nonsectarian DC states exist initially as isolated vertices, but they end up with an edge to each other vertex whenever there exists a string that can distinguish the two.Quentinquercetin
Thanks for pointing this out. This is very similar to the NP-completeness proof I found in literature.Frisco
P
1

I believe a slight variation of the normal Moore algorithm works. Here's a statement of the algorithm.

Let S be the set of states.
Let P be the set of all unordered pairs drawn from S.
Let M be a subset of P of "marked" pairs.
Initially set M to all pairs where one state is accepting and the other isn't. 
Let T(x, c) be the transition function from state x on character c.
Do
  For each pair z = <a, b> in P - M
    For each character c in the alphabet
      If <T(a, c), T(b, c)> is in M
        Add z to M and continue with the next z
Until no new additions to M

The final set P - M is a pairwise description of an equivalence relation on states. From it you can create a minimum DFA by merging states and transitions of the original.

I believe don't care transitions can be handled by never marking (adding to M) pairs based on them. That is, we change one line:

      If T(a, c) != DC and T(b, c) != DC and <T(a, c), T(b, c)> is in M   

(Actually in an implementation, no real algorithm change is needed if DC is a reserved value of type state that's not a state in the original FA.)

I don't have time to think about a formal proof right now, but this makes intuitive sense to me. We skip splitting equivalence classes of states based on transitions that we know will never occur.

The thing I still need to prove to myself is whether the set P - M is still a pairwise description of an equivalence relation. I.e., can we end up with <a,b> and <b,c> but not <a,c>? If so, is there a fixup?

Phobe answered 4/7, 2019 at 0:41 Comment(3)
It's no longer an equivalence relation: consider a language matching xab, not caring about yab and rejecting zab. Both transitions from xa to xab and from za to zab relate to ya to yab, but they do ot relate to each other. This is what I meant when I said that the Myhill-Nerode relation is no longer an equivalence relation if you extend it to don't cares the obvious way (i.e. the way you chose, too).Frisco
@Frisco Haha. Thanks. So the "fixup" would seem to be finding a partition into maximal cliques, which is NP hard, isn't it? Yikes.Phobe
Yeah. I found an NP-completeness proof by reduction to graph-colouring in the meanwhile.Frisco

© 2022 - 2024 — McMap. All rights reserved.