What is the Pythonic way to implement a simple FSM?

G

7

14

Yesterday I had to parse a very simple binary data file - the rule is, look for two bytes in a row that are both 0xAA, then the next byte will be a length byte, then skip 9 bytes and output the given amount of data from there. Repeat to the end of the file.

My solution did work, and was very quick to put together (even though I am a C programmer at heart, I still think it was quicker for me to write this in Python than it would have been in C) - BUT, it is clearly not at all Pythonic and it reads like a C program (and not a very good one at that!)

What would be a better / more Pythonic approach to this? Is a simple FSM like this even still the right choice in Python?

My solution:

#! /usr/bin/python

import sys

f = open(sys.argv[1], "rb")

state = 0

if f:
    for byte in f.read():
        a = ord(byte)       
        if state == 0:
            if a == 0xAA:
                state = 1
        elif state == 1:
            if a  == 0xAA:
                state = 2
            else: 
                state = 0
        elif state == 2:
            count = a;
            skip = 9
            state = 3
        elif state == 3:
            skip = skip -1
            if skip == 0:
                state = 4
        elif state == 4:
             print "%02x" %a
             count = count -1 
             if count == 0:
                 state = 0
                 print "\r\n"

Gower answered 26/5, 2010 at 19:38 Comment(3)

Looks fine to me. That's how I write 'em. – Chlorohydrin 26/5, 2010 at 19:41

Well, first you need a beer volcano, then preferably a pirate outfit... Hold on, which FSM are we talking about here? – Mis 26/5, 2010 at 20:7

I must be tired; I keep reading FSM as Flying Spaghetti Monster. – Xanthochroism 26/5, 2010 at 20:8

V

7

You could give your states constant names instead of using 0, 1, 2, etc. for improved readability.

You could use a dictionary to map (current_state, input) -> (next_state), but that doesn't really let you do any additional processing during the transitions. Unless you include some "transition function" too to do extra processing.

Or you could do a non-FSM approach. I think this will work as long as 0xAA 0xAA only appears when it indicates a "start" (doesn't appear in data).

with open(sys.argv[1], 'rb') as f:
    contents = f.read()
    for chunk in contents.split('\xaa\xaa')[1:]:
        length = ord(chunk[0])
        data = chunk[10:10+length]
        print data

If it does appear in data, you can instead use string.find('\xaa\xaa', start) to scan through the string, setting the start argument to begin looking where the last data block ended. Repeat until it returns -1.

Vive answered 26/5, 2010 at 19:58 Comment(3)

Brilliant - thank you! I knew there'd be some way like this but couldn't quite get my head round it. – Gower 27/5, 2010 at 8:41

Oh - 0xAA 0xAA does not appear in valid data. It might conceivably appear in the junk between chunks of valid data, in which case neither your solution nor mine (nor any other solution AFAICT) can handle it. – Gower 27/5, 2010 at 8:42

I think this more pythonic than most overly OOP-ish ways to do this. github.com/kashifrazzaqui/themstates – Ailment 6/9, 2019 at 18:15

A

9

The coolest way I've seen to implement FSMs in Python has to be via generators and coroutines. See this Charming Python post for an example. Eli Bendersky also has an excellent treatment of the subject.

If coroutines aren't familiar territory, David Beazley's A Curious Course on Coroutines and Concurrency is a stellar introduction.

Avalon answered 26/5, 2010 at 19:56 Comment(3)

Wow! Thanks---I've started reading the Beazley reference, and it's going to take most of the evening, but it has already been worth it. – Anti 26/5, 2010 at 21:4

It's phenomenal. I come from a heavily (OK, entirely) imperative background, and it completely blew my mind the first time I read it. – Avalon 26/5, 2010 at 21:36

Interesting - thank you! Probably overkill for my specific case, but very cool nonetheless. – Gower 27/5, 2010 at 8:44

V

7

You could give your states constant names instead of using 0, 1, 2, etc. for improved readability.

You could use a dictionary to map (current_state, input) -> (next_state), but that doesn't really let you do any additional processing during the transitions. Unless you include some "transition function" too to do extra processing.

Or you could do a non-FSM approach. I think this will work as long as 0xAA 0xAA only appears when it indicates a "start" (doesn't appear in data).

with open(sys.argv[1], 'rb') as f:
    contents = f.read()
    for chunk in contents.split('\xaa\xaa')[1:]:
        length = ord(chunk[0])
        data = chunk[10:10+length]
        print data

If it does appear in data, you can instead use string.find('\xaa\xaa', start) to scan through the string, setting the start argument to begin looking where the last data block ended. Repeat until it returns -1.

Vive answered 26/5, 2010 at 19:58 Comment(3)

Brilliant - thank you! I knew there'd be some way like this but couldn't quite get my head round it. – Gower 27/5, 2010 at 8:41

Oh - 0xAA 0xAA does not appear in valid data. It might conceivably appear in the junk between chunks of valid data, in which case neither your solution nor mine (nor any other solution AFAICT) can handle it. – Gower 27/5, 2010 at 8:42

I think this more pythonic than most overly OOP-ish ways to do this. github.com/kashifrazzaqui/themstates – Ailment 6/9, 2019 at 18:15

A

3

I am a little apprehensive about telling anyone what's Pythonic, but here goes. First, keep in mind that in python functions are just objects. Transitions can be defined with a dictionary that has the (input, current_state) as the key and the tuple (next_state, action) as the value. Action is just a function that does whatever is necessary to transition from the current state to the next state.

There's a nice looking example of doing this at http://code.activestate.com/recipes/146262-finite-state-machine-fsm. I haven't used it, but from a quick read it seems like it covers everything.

A similar question was asked/answered here a couple of months ago: Python state-machine design. You might find looking at those responses useful as well.

Anti answered 26/5, 2010 at 20:4 Comment(0)

B

1

I think your solution looks fine, except you should replace count = count - 1 with count -= 1.

This is one of those times where fancy code-show-offs will come up ways of have dicts mapping states to callables, with a small driver function, but it isn't better, just fancier, and using more obscure language features.

Butte answered 26/5, 2010 at 19:56 Comment(0)

U

1

I suggest checking out chapter 4 of Text Processing in Python by David Mertz. He implements a state machine class in Python that is very elegant.

Undulant answered 26/5, 2010 at 19:57 Comment(0)

I

1

I think the most pythonic way would by like what FogleBird suggested, but mapping from (current state, input) to a function which would handle the processing and transition.

Irreparable answered 26/5, 2010 at 20:1 Comment(0)

O

1

You can use regexps. Something like this code will find the first block of data. Then it's just a case of starting the next search from after the previous match.

find_header = re.compile('\xaa\xaa(.).{9}', re.DOTALL)
m = find_header.search(input_text)
if m:
    length = chr(find_header.group(1))
    data = input_text[m.end():m.end() + length]

Overcharge answered 26/5, 2010 at 20:6 Comment(0)

Recommended topics

Hot tags