Why does the second version of this run in exponential time?

I am writing a program to determine if the Levenshtein distance between two strings is exactly 2 in linear time.

I have an algorithm which does this. I use the naive recursive approach: scan the strings, and when a discrepancy is found, branch into 3 options, trying either a delete, an insert, or a replace. However, I make one modification: if our total exceeds 2, we just give up on that branch. This limits the total number of branches to 9 and makes the algorithm linear.

Here's my initial implementation:

diffIs2 :: Eq a => [a] -> [a] -> Bool
diffIs2 x y = 2 == go 0 x y
  where
    -- 'total' counts edits spent so far; stop branching once it hits 2.
    go total xs@(x1:xs') ys@(y1:ys')
      | x1 == y1  = go total xs' ys'
      | total < 2 = minimum $ zipWith (go $ total + 1)
                      [ xs', xs , xs' ]   -- delete, insert, replace
                      [ ys , ys', ys' ]
    go total xs ys = total + length xs + length ys

Testing confirms that this runs in linear time.
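
For illustration, a check along these lines (the inputs here are arbitrary examples) completes quickly even on long strings:

main :: IO ()
main = do
  print $ diffIs2 "kitten" "sitten"     -- distance 1: False
  print $ diffIs2 "kitten" "sittin"     -- distance 2: True
  let xs = replicate 1000000 'a'
  print $ diffIs2 ('b' : 'b' : xs) xs   -- two insertions: True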

I also have a second, similar program which, as far as I can tell, should be linear as well.

The idea here is to use short-circuiting and lazy evaluation to limit the number of branches evaluated instead. We always allow branching; however, when we branch, we take the minimum of each branch's result and 3. That way, if a branch's total exceeds 3, the short circuit should prevent further evaluation.

We have to implement a lazy natural number type so that results can be compared after only partial evaluation.

-- A lazy Peano natural: comparisons can be decided after forcing
-- only finitely many Succ constructors.
data Nat
  = Zero
  | Succ Nat
  deriving (Eq, Ord)

natLength :: [a] -> Nat
natLength []      = Zero
natLength (_ : b) = Succ (natLength b)

nat3 :: Nat
nat3 = Succ $ Succ $ Succ Zero
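
As a quick sanity check that Nat can be compared after only partial evaluation (an arbitrary example), a comparison against the length of an infinite list terminates:

ghci> nat3 < natLength (repeat ())
True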

diffIs2 :: Eq a => [a] -> [a] -> Bool
diffIs2 x y = Succ (Succ Zero) == go x y
  where
    go xs@(x1:xs') ys@(y1:ys')
      | x1 == y1  = go xs' ys'
      | otherwise = Succ $
          minimum $
            map (min nat3) $   -- cap each branch's result at 3
              zipWith go
                [ xs', xs , xs' ]   -- delete, insert, replace
                [ ys , ys', ys' ]
    go xs ys = natLength $ xs ++ ys

Testing this reveals it doesn't run in linear time. Something makes it exponential, but I can't figure out what. The short-circuiting itself does work as expected. We can show this with the following program, which halts in finite time because of the short-circuiting of min:

f = Succ f

main =
  print $ (Succ Zero) == min (Succ Zero) f

but when put together in the algorithm it doesn't seem to work as I expect.

Why is the second code exponential? What is my misconception?

Winterize asked 2/9, 2021 at 12:32

Comments (5):
Antirrhinum: f = Succ f is an infinite recursion of Succ, i.e. Succ (Succ (Succ (Succ (...)))).

Winterize: @WillemVanOnsem Yes, that is the intention. Because of lazy evaluation, that program halts in finite time, while if it were strict it would loop forever.

Bourassa: I am unsure, but I'd guess minimum is not lazy enough to enable short-circuiting. minimum ys will fully scan ys in all cases. Most importantly, minimum [Succ f, Succ f] does not reduce to Succ f (i.e., to f itself), where f is your infinity value. You might need a custom minimum which returns a Succ early, before digging deep into the values, when possible (i.e. when no Zero is found).

Chrisman: Succ Zero `min` Succ undefined diverges, even though we know it should be Succ Zero. Could it be that your nat3 actually sets a branching limit of 4, giving you a big constant-factor slowdown, or have you actually, positively, detected exponential growth?

Winterize: @WillNess I can't be certain of much with asymptotic complexity, but the growth is more consistent with exponential than with linear complexity, and I can say that, having run a version of algorithm 1 with a branching limit of 4, it is much faster than the second algorithm.

Answer:

While the default min is lazy enough for your simple example at the end of your question, it is not as lazy as we would hope:

ghci> let inf = Succ inf
ghci> inf `min` inf `min` Zero == Zero
^CInterrupted.
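
The reason is the default definition of min from the Ord class, which cannot return anything until the whole comparison has finished:

min :: Ord a => a -> a -> a
min x y = if x <= y then x else y   -- no constructor available before (<=) completes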

To fix it we need a lazier min:

min' :: Nat -> Nat -> Nat
min' Zero _ = Zero
min' _ Zero = Zero
min' (Succ x) (Succ y) = Succ (min' x y)

The big difference is that now we can get a partial result before evaluating the arguments completely:

min (Succ undefined) (Succ undefined) === undefined
min' (Succ undefined) (Succ undefined) === Succ (min' undefined undefined)
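
And the diverging example from above now terminates (assuming the same ghci session):

ghci> inf `min'` inf `min'` Zero == Zero
True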

Now we can use it as follows:

diffIs2 :: Eq a => [a] -> [a] -> Bool
diffIs2 x y = Succ (Succ Zero) == go x y
  where
    go xs@(x1:xs') ys@(y1:ys')
      | x1 == y1 = go xs' ys'
      | otherwise = Succ $ go xs' ys `min'` go xs ys' `min'` go xs' ys'
    go xs ys = natLength $ xs ++ ys

Note that you do not even really need the extra min' nat3 cap, because the top-level comparison will only force the first three Succs anyway.
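
To sketch why your version is exponential while this one is linear (an informal demand trace; the details get messy, but the shape is as follows):

-- Default min: demand on the result becomes demand on a finished
-- comparison of both arguments:
--   min x y            ~>  if x <= y then x else y
--   min nat3 (Succ m)  forces  Succ (Succ Zero) <= m
-- and since m is itself  minimum (map (min nat3) ...),  the budget of
-- three is reset at every level of branching, so the demand only stops
-- at the end of the input, in every branch of the tree.
--
-- min': demand on the result yields a constructor after forcing each
-- argument only to weak head normal form:
--   min' (Succ a) (Succ b)  ~>  Succ (min' a b)
-- so the top-level comparison against Succ (Succ Zero) forces at most
-- three levels of branching and the rest is never evaluated.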

Aircondition answered 2/9, 2021 at 14:25

Comments (2):
Winterize: I see how min is not as lazy as it could be, and I see how the fix works, but I am still struggling to understand exactly why my code doesn't work. In the provided code, since min nat3 is mapped across the list, min is never taken in any case where both inputs might be longer than 3. As demonstrated by the example, min nat3 won't evaluate the entire expression.

Aircondition: @ÉamonnOlive That is a good question. I believe it is because each branch will itself branch, and in that branch the limit of 3 is reset. And the Succs are not immediately propagated: only a single Succ is propagated, and only by a single level. So it will only reach the limit of 3 once it reaches the end of the input. It would be nice to show how this evaluates in practice, but it gets very messy very quickly.
