What are the main differences between the Knuth-Morris-Pratt and Boyer-Moore search algorithms?
Asked Answered
P

3

65

What are the main differences between the Knuth-Morris-Pratt search algorithm and the Boyer-Moore search algorithm?

I know KMP searches for Y in X, trying to define a pattern in Y, and saves the pattern in a vector. I also know that BM works better for small words, like DNA (ACTG).

What are the main differences in how they work? Which one is faster? Which one is less computer-greedy? In which cases?

Pooka answered 29/9, 2012 at 20:20 Comment(1)
BM works better on "natural text" instead of small setsUredo
T
46

Moore's UTexas webpage walks through both algorithms in a step-by-step fashion (he also provides various technical sources):

According to the man himself,

The classic Boyer-Moore algorithm suffers from the phenomenon that it tends not to work so efficiently on small alphabets like DNA. The skip distance tends to stop growing with the pattern length because substrings re-occur frequently. By remembering more of what has already been matched, one can get larger skips through the text. One can even arrange ``perfect memory'' and thus look at each character at most once, whereas the Boyer-Moore algorithm, while linear, may inspect a character from the text multiple times. This idea of remembering more has been explored in the literature by others. It suffers from the need for very large tables or state machines.

However, there have been some modifications of BM that have made small-alphabet searching viable.

Transpacific answered 29/9, 2012 at 20:30 Comment(0)
U
45

In an rough explanation

Boyer-Moore's approach is to try to match the last character of the pattern instead of the first one with the assumption that if there's not match at the end no need to try to match at the beginning. This allows for "big jumps" therefore BM works better when the pattern and the text you are searching resemble "natural text" (i.e. English)

Knuth-Morris-Pratt searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. (Source: Wiki)

This means KMP is better suited for small sets like DNA (ACTG)

Uredo answered 29/9, 2012 at 20:27 Comment(8)
I don't get why it'd be an improvement to match the last characters first. If it fails, you still have to move forward by a single character, no?Sirkin
@ThomasAhle Here is an example: Word: guitar Text: I love guitars. Then you'll try to match the "r" of guitar (6th character) vs the 6th character of the text... the "e" of "love'... since they don't match... no need to check against "I love" since they will never be a match.. so you jump all that part...Uredo
Right, and then you jump to check 'r' vs ' ', but that still only moved you forward 1 step. If you had checked 'g' vs 'l' you would have had the same result. No?Sirkin
@ThomasAhle probably this will help: utdallas.edu/~besp/demo/John2010/boyer-moore.htm And try my example case: Text: "I love guitars" Pattern: "guitar"Uredo
Nice example to show the idea behind BM. Unfortunately, the implementation is buggy. Try "HERE IS A SIMPIEXAMPLE" and "EXAMPLE" as input. This will result in no match, even though the search string occurs in the searched text.Frum
@ThomasAhle I think the idea is that you can figure out where the character can match in the substring. For example, if the character isn't in the substring, you can skip by the length of the substring.Bullis
@Uredo No, Boyer-Moore's doesn't skip like that. You need to shift your alignment so that you find the closest 'e' to the left or 'r' in "guitar" and align that 'e' in the pattern with the 'e' causing a mismatch in text (since there is no 'e' in "guitar", you skip all the way to the next character of 'e' in text). Watch this amazing video.Hedden
I found a nice blog which explains Boyer-Moore technique. hereSuffocate
C
5

Boyer-Moore technique match the characters from right to left, works well on long patterns. knuth moris pratt match the characters from left to right, works fast on short patterns.

Crucifixion answered 14/1, 2014 at 9:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.