Is using a finite state machine a good design for general text parsing?
Asked Answered
B

2

2

I am reading a file that is filled with hex numbers. I have to identify a particular pattern, say "aaad" (without quotes) from it. Every time I see the pattern, I generate some data to some other file.

This would be a very common case in designing programs - parsing and looking for a particular pattern.

I have designed it as a Finite State Machine and structured structured it in C using switch-case to change states. This was the first implementation that occured to me.

  • DESIGN: Are there some better designs possible?
  • IMPLEMENTATION: Do you see some problems with using a switch case as I mentioned?
Binetta answered 5/5, 2010 at 20:10 Comment(1)
@Pierreten: never used regex before in C code. are there some regex libraries for C?Binetta
S
1

A hand-rolled FSM can work well for simple situations, but they tend to get unwieldy as the number of states and inputs grows.

There is probably no reason to change what you have already designed/implemented, but if you are interested in general-purpose text parsing techniques, you should probably look at things like regular expressions, Flex, Bison, and ANTLR.

Subcartilaginous answered 5/5, 2010 at 20:14 Comment(1)
+1: Regex is an FSM. So, for that matter are parsers generated by tools like Flex, Bison and ANTLR. I like this: Yes, use an FSM; no, don't roll your own.Lib
C
1

For embarrassingly simple cases couple of if's or switch'es are sufficient. For parsing a string on POSIX systems, man regex (3). For fully featured parsing of whole files (e.g. complex configs) use Lex/Flex and Yacc/Bison.

When writing in C++, look at Boost Regex for the simpler case and Boost Spirit for more complex one. Flex & Bison work with C++ too.

Cyclic answered 5/5, 2010 at 21:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.