I am writing a program to determine if the Levenshtein distance between two strings is exactly 2 in linear time.
I have an algorithm which does this. I use the naive recursive approach which scans the string and when it finds a discrepancy it branches into 3 options trying either to delete, insert or replace. However I make a modification that if we exceed 2 as our total we just give up on that branch. This limits the total number of branches to 9 and makes the algorithm linear.
Here's my initial implementation:
diffIs2 x y =
2 == go 0 x y
where
go total xs@(x1:xs') ys@(y1:ys')
| x1 == y1
=
go total xs' ys'
| total < 2
=
minimum $ zipWith (go $ total + 1)
[ xs', xs , xs' ]
[ ys , ys', ys' ]
go total xs ys =
total + length xs + length ys
Testing confirms that this runs in linear time.
I also have a second similar program which as far as I can tell should be linear as well.
The idea here is to use short circuiting and lazy evaluation to limit the number of branches evaluated instead. We always allow a branching however, when we branch we take the minimum of the result of each and 3. That way if the branch total exceeds 3 the short circuit should prevent further evaluation.
We have to implement a natural number type for partial evaluation.
data Nat
= Zero
| Succ Nat
deriving
( Eq
, Ord
)
natLength :: [a] -> Nat
natLength [] =
Zero
natLength (a : b) =
Succ (natLength b)
nat3 =
Succ $ Succ $ Succ Zero
diffIs2 x y =
Succ (Succ Zero) == go x y
where
go xs@(x1:xs') ys@(y1:ys')
| x1 == y1
=
go xs' ys'
| otherwise
=
Succ $
minimum $
map (min nat3) $
zipWith go
[ xs', xs , xs' ]
[ ys , ys', ys' ]
go xs ys =
natLength $ xs ++ ys
Testing this reveals it doesn't run in linear time. Something makes it exponential but I can't figure out what. The short circuiting does work as expected. We can show this with the following program which halts in finite time because of the short circuiting of min
:
f = Succ f
main =
print $ (Succ Zero) == min (Succ Zero) f
but when put together in the algorithm it doesn't seem to work as I expect.
Why is the second code exponential? What is my misconception?
f = Succ f
is an infinite recursion ofSucc f
, soSucc (Succ (Succ (Succ (...))))))
– Antirrhinumminimum
is not lazy enough to enable short-circuiting.minimum ys
will fully scanys
in all cases. Most importantlyminimum [Succ f, Succ f]
is notSucc f
(i.e.,f
), wheref
is your infinity value. You might need a customminimum
which will returnSucc
early, before digging deep into the values, when possible (i.e. when noZero
is found). – BourassaSucc Zero `min` (Succ undefined)
diverges, even though we know it should beSucc Zero
. could it be that yournat3
actually sets the branching factor of 4? could it be that, giving you this big constant factor slowdown, or have you actually absolutely positively detected the exponential growth? – Chrisman