Proof of correctness of the dynamic programming approach to minimum edit distance

To calculate the minimum edit distance (the minimum number of insertions, deletions and substitutions required to transform one word into another), a dynamic programming solution is based on a recurrence relation in which the last character of both strings is examined. The details are in https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm.
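For concreteness, the recurrence under discussion can be written as a direct, unmemoized recursion (this is just a sketch of the relation, not an efficient implementation; the function name is mine):

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance via the recurrence in question (no memoization)."""
    if not a:
        return len(b)  # only insertions remain
    if not b:
        return len(a)  # only deletions remain
    sub = 0 if a[-1] == b[-1] else 1
    return min(
        edit_distance(a[:-1], b[:-1]) + sub,  # match or substitute last chars
        edit_distance(a[:-1], b) + 1,         # delete last char of a
        edit_distance(a, b[:-1]) + 1,         # insert last char of b
    )
```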

The description of this algorithm is everywhere on the Internet when it comes to edit distance, but every source just asserts its correctness without proof. By the definition of edit distance, you can insert, delete or substitute characters in the middle of a string, not just at the end. So how do you prove that this recurrence relation actually holds?

Earsplitting answered 7/3, 2016 at 3:55 Comment(1)
You can view DP as a way to speed up recursion, and the easiest way to prove a recursive algorithm correct is nearly always by induction: Show that it's correct on some small base case(s), and then show that, assuming it is correct for a problem of size n, it is also correct for a problem of size n+1. This way the proof closely mirrors the recursive structure.Durrace

Use Induction to Prove Recursive Algorithms Correct

First, as I said in the comment, you can view dynamic programming as a way to speed up recursion, and the easiest way to prove a recursive algorithm correct is nearly always by induction: Show that it's correct on some small base case(s), and then show that, assuming it is correct for a problem of size n, it is also correct for a problem of size n+1. This way the proof closely mirrors the recursive structure.

The usual recursion for this problem breaks the problem "Find the minimum cost to edit string A into string B" into the (|A|+1)(|B|+1) subproblems "Find the minimum cost to edit the first i characters of string A into the first j characters of string B", for all 0 <= i <= |A| and 0 <= j <= |B|.

Choosing Base Cases

Usually with induction, we can pick a small number of simple base cases (perhaps just one), show that we can easily compute the correct answers for them, and it's obvious how the correctness of all other cases will be implied by the correctness of the base cases, because regardless of what case we start with, there will be just a single "chain" of assumptions that needs to be satisfied, and this chain clearly must end at one of our base cases.

However for this particular problem, to show that the algorithm solves the (i, j) subproblem optimally, we need to first assume that it solves the (i-1, j), (i, j-1) and (i-1, j-1) subproblems optimally (since if any of the answers to those subproblems was incorrect, it could easily compute a totally wrong answer for the (i, j) subproblem). This will require a more complicated induction than usual: instead of a single "chain" of assumptions that need to be satisfied, we now have a branching "tree" of assumptions, with (up to) 3 children at each point. We need to pick base cases in such a way that for any (i, j), this entire tree of assumptions eventually "stops", i.e., every path in it eventually hits a base case where its assumptions are satisfied.

To recap: To prove our solution to (i, j) optimal, we must assume we have optimal solutions for (i-1, j), (i, j-1), and (i-1, j-1); to satisfy that assumption for, say, (i-1, j) (that is, to prove our solution to (i-1, j) is optimal), we need to assume we have optimal solutions for (i-2, j), (i-1, j-1) and (i-2, j-1), etc., etc. How to choose base case(s) that will work? While traversing down this tree, i.e., in going from proving our solution to the subproblem (i, j) correct to proving our solution to any one of its "child" subproblems (i', j') correct, we notice that:

  1. At least one of i' < i or j' < j holds.
  2. We never "skip over" subproblems -- that is, we never have i-i' >= 2, or j-j' >= 2.

Basically, if we take a single step down this tree, at least one of our two "subproblem co-ordinates" (i or j) decreases, but never by more than 1. What that means is that if we keep descending through the tree, then regardless of which particular "children" subproblems we pick on the way down, we must eventually hit a subproblem with a 0 for (at least) one of its co-ordinates -- i.e. a (i'', 0) subproblem for some 0 <= i'' <= |A| or a (0, j'') subproblem for some 0 <= j'' <= |B|. And what that means is that if we make those subproblems our base cases, we can ensure that every path in the induction tree hits a base case where its assumptions are satisfied and can therefore stop.

Luckily, these base cases are indeed easy to compute optimal answers to. Consider a problem (i, 0): This problem asks for the minimum cost required to change the first i characters of string A into the first 0 characters of string B. Clearly the best (only!) way to do this is to delete all i characters in the prefix of A, for a cost of i. Likewise the problem (0, j) asks for the minimum cost required to change the first 0 characters of A into the first j characters of B: just as clearly, the best way to do this is to simply insert all j characters in this prefix of B, for a cost of j.
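These base cases translate directly into the first row and column of the DP table. A minimal sketch of just the initialization (function and variable names are mine):

```python
def init_table(a: str, b: str) -> list[list[int]]:
    """Build the DP table with only the base cases filled in.

    d[i][0] = i: delete all i characters of the prefix of a.
    d[0][j] = j: insert all j characters of the prefix of b.
    """
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    return d
```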

Inductive Step

All that remains is the inductive step: Showing that we compute the answer to the (i, j) subproblem correctly, under the assumption that we have computed the answers to the (i-1, j), (i, j-1) and (i-1, j-1) subproblems correctly. The trick is to see that in every possible way of editing the first i characters of A into the first j characters of B, there are actually only 3 possible things that we could do with the last character in each of these prefixes (that is, the i-th character in A and the j-th character in B):

  • Pair A[i] with B[j]. Either they match (cost 0) or not (cost 1), but either way, the total cost of this pairing must be that cost (either 0 or 1), plus the smallest possible cost of editing the rest of the prefix of A (A[1 .. i-1]) into the rest of the prefix of B (B[1 .. j-1]) -- which, by assumption, we have already calculated correctly!
  • Delete A[i]. This costs 1, so the total cost of doing this must be 1 plus the smallest possible cost of editing the rest of the prefix of A (A[1 .. i-1]) into the entire prefix of B (B[1 .. j]) -- which, by assumption, we have already calculated correctly!
  • Insert B[j]. This costs 1, so the total cost of doing this must be 1 plus the smallest possible cost of editing the entire prefix of A (A[1 .. i]) into the rest of the prefix of B (B[1 .. j-1]) -- which, by assumption, we have already calculated correctly!

Since those 3 things are the only possible things that we could do, and for each of the 3 we have calculated the overall lowest cost of doing that thing, the overall best thing to do must be the best of the 3 of them. This proves that we correctly calculate the lowest cost needed to edit the first i characters of A into the first j characters of B, for any i and j -- so in particular, it's true for i = |A| and j = |B|, that is, for editing the complete string A into the complete string B.
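Putting the base cases and the inductive step together gives the usual bottom-up table computation; the three arguments to `min` below correspond one-to-one to the three bullet points above (names are mine, and indices are shifted by one because Python strings are 0-based):

```python
def min_edit_distance(a: str, b: str) -> int:
    """Bottom-up Wagner–Fischer: d[i][j] = cost of editing a[:i] into b[:j]."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i                       # base case: delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j                       # base case: insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            pair = d[i-1][j-1] + (0 if a[i-1] == b[j-1] else 1)  # pair A[i] with B[j]
            delete = d[i-1][j] + 1                               # delete A[i]
            insert = d[i][j-1] + 1                               # insert B[j]
            d[i][j] = min(pair, delete, insert)
    return d[len(a)][len(b)]
```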

Durrace answered 7/3, 2016 at 10:0 Comment(10)
In the induction step, there are more than three possible ways to do it. You can insert, delete or change in the middle of the prefix to transform A[:i] to B[:j]. You must prove that these changes are equivalent to one of the three cases.Earsplitting
Read carefully: "there are actually only 3 possible things that we could do with the last character in each of these prefixes". There are no other possibilities.Durrace
Compare: There are many ways to permute n numbers, but we can build every such permutation by permuting the first n-1 numbers somehow, and then choosing a position to insert the nth number. There are n! permutations, but there are only n possibilities for that final decision of where to insert the nth number.Durrace
Maybe another way to see it: You certainly can insert, delete or change in the middle of the prefix to transform A[:i] into B[:j]. You can do this many times. But regardless of how many such "middle transformations" you perform, in any edit script that transforms A[:i] into B[:j], the relationship that exists between A[i] and B[j] will be in one of the 3 possible cases I listed.Durrace
The claim that the relationship between A[i] and B[j] falls into only one of three cases is not clear in my opinion. The key point is the "minimal". It is not clear why such induction keeps the number of transformations minimal.Earsplitting
OK. I think part of the problem is that an "edit script" (i.e., a sequence of insertions, deletions and substitutions, e.g. "Change 3rd character to Q, delete 5th character, insert P after 10th character") doesn't have an intuitive representation. But actually there is a nice representation, which makes that case distinction much clearer: as an alignment. Are you familiar with this representation, and do you accept the one-to-one correspondence between edit scripts and alignments?Durrace
"do you accept the one-to-one correspondence between edit scripts and alignments" I do accept it.Earsplitting
Cool, that makes things a bit easier :) I suggest first of all picking a concrete example problem (e.g. A = BEAT, B = ART) and concrete values of i and j (e.g. i = 3, j = 2). For this subproblem, we are trying to find the optimal way to edit A[:3] = BEA into B[:2] = AR. If we represent any such edit script as an alignment, then the final column of that alignment must look like one of the following: (a) A above R; (b) A above -; (c) - above R. Do you agree? If not, what other possibilities do you see?Durrace
Sorry to dig this up. But as someone coming here to find precisely this answer, and reading through the proof, I had the very same views of the above proof as OP. Namely, "there are actually only 3 possible things that we could do with the last character in each of these prefixes". I have a hard time saying what is wrong with this, but my math intuition was left unsatisfied. That being said, the 1-1 correspondence from edit scripts <-> alignments made things quite clear, and I think should be included in the original answer. Good job @DurraceRogers
I agree that this answer needs some expansion. You can transform *a into *b without deleting a, inserting b, or substituting a for b, e.g. *a -> *ax -> *ab = *b. I don't know what an "alignment" is or what "pair A[i] with B[j]" means.Supplicant

If you are looking for some "insights" rather than a rigorous proof, I can share some thoughts. I figured this out after looking at some code-generated data and worked on the proof for a while. The intuition behind my "proof" turns out to be fairly simple in the end. It takes only a bit of thought on top of the following lemma.

Lemma: dp[i][j] >= dp[i-1][j-1].

This lemma guarantees that editing s[0..i) to t[0..j) is indeed a sub-problem to editing s[0..i] to t[0..j]. Then we only have to prove it is also a subproblem that can be used to construct an optimal solution. Note that dp[i][j] might be strictly less than dp[i-1][j] or dp[i][j-1]. So proving the lemma might not be as trivial as it seems. For now, let's just move on.

Then, we prove that dp[i][j] == dp[i-1][j-1] if and only if s[i]==t[j]. This is also intuitively correct, so I omit the formal proof here.

Finally, dp[i][j] is at most dp[i-1][j-1]+1 when s[i]!=t[j], since we can do Insert(i, t[j]) at the beginning of the sequence corresponding to dp[i-1][j-1] to edit s[0..i] to t[0..j].

For the proof of the lemma, I tried many ways to make a short and elegant proof, but I just have not found one yet. I see that induction can do it, but it just does not satisfy me. Still, the lemma itself seems so bluntly correct, so I'm just moving forward as a software programmer.
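The lemma can at least be checked empirically against the standard Wagner–Fischer table. This is a brute-force sanity check, not a proof; the function names are mine:

```python
import random

def dp_table(s: str, t: str) -> list[list[int]]:
    """Standard Wagner–Fischer table: d[i][j] = distance(s[:i], t[:j])."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i
    for j in range(len(t) + 1):
        d[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(d[i-1][j-1] + (s[i-1] != t[j-1]),
                          d[i-1][j] + 1,
                          d[i][j-1] + 1)
    return d

def lemma_holds(s: str, t: str) -> bool:
    """Check the lemma dp[i][j] >= dp[i-1][j-1] for all valid i, j."""
    d = dp_table(s, t)
    return all(d[i][j] >= d[i-1][j-1]
               for i in range(1, len(s) + 1)
               for j in range(1, len(t) + 1))

# Exhaustively check many random string pairs.
random.seed(0)
for _ in range(500):
    s = "".join(random.choice("abc") for _ in range(random.randrange(7)))
    t = "".join(random.choice("abc") for _ in range(random.randrange(7)))
    assert lemma_holds(s, t)
```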

Gallivant answered 8/3, 2024 at 0:57 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Planetesimal

I couldn't find any satisfying proof, so I made one. None of the proofs I've read actually shows that the cases are collectively exhaustive.

(A) There always exists at least one optimal series of edits; let it be called Eo.

This is trivial.

(B) In Eo there are characters that never get inserted or changed. Let the last such character be called the pivot.

If there isn't any, we can use the start of the string as pivot. In Eo, this common subsequence never gets changed from the beginning to the end. We are not assuming it to be the longest common subsequence or anything.

e.g.) #KITTEN, #SITTING → pivot : '#ITTN'[-1] = 'N'

There are a few properties of this pivot which make the problem much easier.

  1. The left side and the right side of the pivot are independent of each other. Whatever edits happen on one side cannot affect the other.
  2. By definition (B), all characters to the right of the pivot in the target string must be produced by either replacement or addition.

Because of (1), we only need to consider the right side of the pivot in both the original string and the target string. Let's just assume the left side is optimal (greedy); its number of edits will be added to the total.

e.g.) #weofanbmcdepmqu -> #eopasbctdewni
only consider pmqu → wni

Using (2), this subproblem can be solved as follows.

  • len(original) > len(target), e.g. asdfgh → qwer:
    Remove len(original) - len(target) times to fit the length, and replace len(target) times to fit the characters. This can be done in any order, since all orders are equivalent distance-wise. Removing the last character as the last edit is one such solution, which equals dp(original[:-1] → target) + 1.
  • len(original) < len(target), e.g. asdf → qwerty:
    Add len(target) - len(original) times to fit the length, and replace len(original) times to fit the characters. Adding the last character as the last edit equals dp(original → target[:-1]) + 1.
  • len(original) == len(target) != 0, e.g. asdf → qwer:
    Replace along the whole length. This is equivalent to replacing the last character as the last edit: dp(original[:-1] → target[:-1]) + 1.
  • len(original) == len(target) == 0, i.e. '' → '':
    The last character is the pivot. This happens when the last characters of the two strings are the same. You don't edit the pivot, so this is the same as dp(original[:-1] → target[:-1]).
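The case analysis above can be sanity-checked with a reference DP implementation: when the two substrings right of the pivot share no characters, nothing can be kept, so the distance is exactly max(len(original), len(target)). The `edit` helper below is my own reference implementation, not part of this answer:

```python
def edit(a: str, b: str) -> int:
    """Reference Wagner–Fischer edit distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i-1][j-1] + (a[i-1] != b[j-1]),
                          d[i-1][j] + 1,
                          d[i][j-1] + 1)
    return d[len(a)][len(b)]

# The answer's cases, with disjoint alphabets (no character can survive):
assert edit("asdfgh", "qwer") == 6   # len(original) > len(target): max = 6
assert edit("asdf", "qwerty") == 6   # len(original) < len(target): max = 6
assert edit("asdf", "qwer") == 4     # equal lengths: replace everything
assert edit("pmqu", "wni") == 4      # the example right-of-pivot subproblem
```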

Daina answered 6/1, 2021 at 12:55 Comment(0)

This is my proof:

To get from string A to B we perform an optimal set of operations from left to right. There are 4 operations: EQ (keep the character), SWAP (change the character), INS (insert a character), DEL (delete a character). This is the definition of the edit distance, with cost of EQ == 0.

Define the length of A as a, and the length of B as b.

Define d[a,b] as the edit distance between the first a characters of A and the first b characters of B, i.e. the minimum number of operations (besides EQ, which is free) required to turn that prefix of A into that prefix of B.

If we look at an optimal series of operations (there must be one), there are 4 options for the last operation. We don’t know the last operation, so we check all 4 options and take the best one (best means MINIMUM edit distance).

If the last operation was EQ, that means that A[a]==B[b] and that the edit distance equals d[a-1, b-1], because EQ is not counted as an "edit distance" cost.

If the last operation was SWAP, that means that A[a]!=B[b] and that the edit distance equals d[a-1, b-1] + 1.

If the operation was INS, it means that the edit distance is d[a, b-1] + INS(to B). If the operation was DEL, it means that the edit distance is d[a-1, b] + DEL(from A).

We simply try all combinations of the 4 operations from LAST to FIRST until we find the best path. Actually it's 3 operations at every step, because we can decide whether to check EQ or SWAP depending on whether the current characters are equal (if they are, there is no need to check the SWAP operation, and vice versa).

Since we try all possible operations, the recursion formula must be true.
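This last-to-first search is naturally written as a memoized recursion; a sketch in Python (names are mine), where caching each (prefix of A, prefix of B) pair keeps the total work at O(a·b):

```python
from functools import lru_cache

def edit_distance(a: str, b: str) -> int:
    """Top-down version: try each possible last operation, take the minimum."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:       # only INS operations remain
            return j
        if j == 0:       # only DEL operations remain
            return i
        # EQ is free when the last characters match; otherwise SWAP costs 1.
        eq_or_swap = d(i - 1, j - 1) + (0 if a[i - 1] == b[j - 1] else 1)
        return min(eq_or_swap,
                   d(i - 1, j) + 1,   # DEL from A
                   d(i, j - 1) + 1)   # INS into B
    return d(len(a), len(b))
```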

Linstock answered 9/1, 2021 at 14:34 Comment(0)

Say we are editing a string A of length m into a string B of length n using a minimal edit sequence.

A key fact to notice is that in a minimal edit sequence, each of the characters in A and B is operated on at most once. (This is easy to show; if a character is operated on twice, we can combine those operations into a single equivalent operation.)

It follows that we can partition the characters in A and B as:

  • Characters in A that are deleted (and are not in B)
  • Characters in B that are inserted (and are not in A)
  • Pairs of a character from A and a character from B which are either:
    • Substituted (and are therefore different)
    • Left alone (and are therefore the same)

Consider A[m] and B[n], the last characters of A and B. The following are possible cases for the minimal edit sequence; we claim that at least one of them is true:

  1. A[m] is deleted
  2. B[n] is inserted
  3. A[m] is paired with B[n]

If 1) is false, then A[m] must be paired with a character A[m]' in B, and if 2) is also false, then B[n] must be paired with a character B[n]' in A.
Note that all edit operations preserve the order of the characters. So we cannot have A[m]' before B[n] and B[n]' before A[m] -- otherwise the pairs have changed order.
Thus, A[m]' is B[n] and B[n]' is A[m] -- that is, A[m] is paired with B[n], and the claim is demonstrated.

What remains is simple recursion on each of the three possible cases. Let A[..m-1] represent all of A except its last character A[m], and B[..n-1] represent all of B except its last character B[n].

In case 1), to edit A into B we must delete A[m] and edit A[..m-1] into B, which in total can be done in a minimum of 1 + distance(A[..m-1], B) operations.

In case 2), to edit A into B we must edit A into B[..n-1] and insert B[n], which in total can be done in a minimum of 1 + distance(A, B[..n-1]) operations.

And in case 3), we must either substitute A[m] into B[n] if it has changed, or leave it alone if it has not, and then edit A[..m-1] into B[..n-1], which in total can be done in a minimum of 1 + distance(A[..m-1], B[..n-1]) operations or distance(A[..m-1], B[..n-1]) operations respectively.

As these are the only possible cases, choosing a case for which this minimum number of operations is smallest yields an edit sequence with the smallest possible number of operations.
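The partition described above can be made concrete by backtracking through the DP table: any minimal script it recovers operates on each character of A and of B at most once. The code below is my own illustration of this, not taken from the sources:

```python
def edit_ops(a: str, b: str) -> list[tuple]:
    """Backtrack through the DP table to recover one minimal edit script.

    Each op is ("keep"|"sub", A-char, B-char), ("del", A-char, None),
    or ("ins", None, B-char) -- so every character appears exactly once.
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j-1] + (a[i-1] != b[j-1]),
                          d[i-1][j] + 1,
                          d[i][j-1] + 1)
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (a[i-1] != b[j-1]):
            ops.append(("keep" if a[i-1] == b[j-1] else "sub", a[i-1], b[j-1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i-1][j] + 1:
            ops.append(("del", a[i-1], None))
            i -= 1
        else:
            ops.append(("ins", None, b[j-1]))
            j -= 1
    return list(reversed(ops))
```

In the recovered script, the characters of A are exactly those in "keep", "sub" and "del" ops, and the characters of B are exactly those in "keep", "sub" and "ins" ops, matching the partition above.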

I found these sources useful:
https://blog.cykerway.com/posts/2021/04/10/minimum-edit-distance.html
https://cstheory.stackexchange.com/questions/10391/proof-of-levenshtein-distance/48178#48178

Supplicant answered 30/1, 2022 at 22:8 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.