Aho-Corasick-like algorithm for use in anti-malware code
Asked Answered
G

1

6

Is there an algorithm like Aho-Corasick, which can match a set of patterns simultaneously and is applicable to be used in anti-malware comparison? Do all known commercial antivirus software use the Aho-Corasick algorithm?

What are the advantages of the Aho-Corasick algorithm over Boyer-Moore?

Gremial answered 4/11, 2011 at 18:32 Comment(5)
Keep in mind most commercial Anti Malware tools probably use something more than just exact string matching, in which case neither algorithm is the right answer.Satsuma
Yes, I mean the standard comparison process without Heuristic & AI Techniques.Gremial
But Aho-Corasick, being a finite-state method, can be extended to fuzzy matching with some basic automata theory. Determining how to weight the dictionary is the hard part.Seay
@larsmans Do you have references about how it can be extended to fuzzy matching? Do you know about ClamAV if it extended the algorithm?Gremial
@Adban: Aho-Corasick constructs a finite-state machine on which the entire algebra of FA operations applies. The references that come to mind are Kornai's Extended finite state models of language, Mehryar Mohri's papers and Jurafsky & Martin's Speech and Language Processing, first few chapters.Seay
S
7

Boyer-Moore: For searching one string in another target string
Aho-Corasick: For searching multiple patterns simultaneously

So the advantage being that Aho-Corasick is optimal if you want to search lot of patterns simultaneously in one pass.

Rabin-Karp string search can also match multiple patterns.

Sopher answered 4/11, 2011 at 18:38 Comment(1)
Is it commonly believed that Aho Corasick is optimal? I mean surely it is if you only do one query, but if you do many, might there not be a more efficient datastructure?Soricine

© 2022 - 2024 — McMap. All rights reserved.