Use of Haskell state monad a code smell?

E

8

62

God I hate the term "code smell", but I can't think of anything more accurate.

I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).

During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.

I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).

I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?

Ezraezri answered 3/3, 2009 at 19:44 Comment(8)

Wtf man, whitespace... – Issacissachar 3/3, 2009 at 20:9

Should I be writing a compiler for mips or x86 asm? That would be quite a bit more complicated. – Ezraezri 3/3, 2009 at 20:17

C++, Cybis. The 1999 specification. And we want it by friday. – Bourque 3/3, 2009 at 22:10

@Rob - lol, this is way overdue but maybe a subset of a language would suffice? Whitespace is just, I mean... why not go with brainfuck muppetlabs.com/~breadbox/bf or shakespeare shakespearelang.sourceforge.net/report/shakespeare ? – Issacissachar 25/1, 2010 at 12:45

LOL @ John. You are kinda late. I finished that high-level language/compiler back in May '09 (it's inspired by Haskell, Python, and Lisp - hence I named it HaPyLi). Google for 'Sudoku Whitespace' - the first few matches will be a sudoku solver I wrote in this language and compiled to Whitespace. – Ezraezri 7/2, 2010 at 23:58

compile = tokenize . parse . optimize . generate – Segment 17/8, 2019 at 21:56

@Ezraezri Any chance the HaPyLi compiler is still available? Your HaPyLi homepage is offline and the web archive doesn't have any of the linked files. – Hueston 11/9, 2019 at 4:7

@Segment rather compile = generate . optimize . parse . tokenize, right? – Asparagine 5/3, 2021 at 13:36

C

43

I'd say that state in general is not a code smell, so long as it's kept small and well controlled.

This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.

An fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:

VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed

Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.

The key point here was that the the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.

All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).

When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.

One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten in to it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.

Cluj answered 30/6, 2009 at 7:10 Comment(4)

Very nice answer, though I'm not sure it's proper to change the "accepted" answer two months later. Much thanks regardless. – Ezraezri 4/7, 2009 at 5:6

I've added a new paragraph to show why I think the accepted answer is, on further reflection, actually the wrong answer. I think you should change it; I'm always annoyed by wrong answers sitting at the top of the list. – Cluj 5/7, 2009 at 4:7

I accepted based on the paper he linked to, not the answer itself (I agree the answer alone was quite poor). Now that I've thought about it again, that probably wasn't a good enough reason. Proper use of monads is about both modularity (your answer) and abstraction (Norman Ramsey's answer). If I accept this, both will be at the top. That would be best. – Ezraezri 5/7, 2009 at 18:12

There's always the indexed state monad: #28690948 – Synergism 5/6, 2017 at 21:55

T

47

I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.

Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:

empyGraph    :: Graph
mkLabel      :: Label -> Graph
mkAssignment :: Assignment -> Graph  -- modify a register or memory
mkTransfer   :: ControlTransfer -> Graph   -- any control transfer
(<*>)        :: Graph -> Graph -> Graph

But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:

withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph   -- code in the 'then' branch
             -> Graph   -- code in the 'else' branch 
             -> Graph   -- resulting if-then-else construct

The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.

The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:

type Graph = RealGraph -> [Label] -> (RealGraph, [Label])

or a bit more accurately

type Graph = RealGraph -> State [Label] RealGraph
  -- a Graph is a monadic function from a successor RealGraph to a new RealGraph

With the state monad hidden behind a layer of abstraction, it's not smelly at all!

Tarkany answered 4/3, 2009 at 4:23 Comment(0)

C

43

I'd say that state in general is not a code smell, so long as it's kept small and well controlled.

This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.

An fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:

VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed

Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.

The key point here was that the the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.

All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).

When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.

One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten in to it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.

Cluj answered 30/6, 2009 at 7:10 Comment(4)

Very nice answer, though I'm not sure it's proper to change the "accepted" answer two months later. Much thanks regardless. – Ezraezri 4/7, 2009 at 5:6

I've added a new paragraph to show why I think the accepted answer is, on further reflection, actually the wrong answer. I think you should change it; I'm always annoyed by wrong answers sitting at the top of the list. – Cluj 5/7, 2009 at 4:7

I accepted based on the paper he linked to, not the answer itself (I agree the answer alone was quite poor). Now that I've thought about it again, that probably wasn't a good enough reason. Proper use of monads is about both modularity (your answer) and abstraction (Norman Ramsey's answer). If I accept this, both will be at the top. That would be best. – Ezraezri 5/7, 2009 at 18:12

There's always the indexed state monad: #28690948 – Synergism 5/6, 2017 at 21:55

V

6

Have you looked at Attribute grammars (AG)? (More info on wikipedia and an article in the Monad Reader)?

With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.

Synthesized attributes are things you generate (or synthesize) from your syntax tree, this could be the generated code, or all comments, or whatever else your interested in.

Inherited attributes are input to your syntax tree, this could be the environment, or a list of labels to use during code generation.

At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates haskell code (.hs files) from the provided .ag files.

Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.

In that case, you could manually write the sort of code that attributes grammars generate for you, for example:

data AbstractSyntax = Literal Int | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _      = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                             in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
                             in (code, s : comments')

generateCode :: Int -> Code
labelCode :: Label -> Code -> Code

Vinosity answered 3/3, 2009 at 20:41 Comment(2)

Certainly something to keep in mind, thanks. My language really isn't all that complex though - it's a lot like lisp but without macros, lists, or higher-order functions (these would be hard to translate to Whitespace). Using attribute grammars might be a little overkill. – Ezraezri 3/3, 2009 at 21:8

In that case you could definitely use the manual AG pattern. It would eliminate the need for the State monad. – Vinosity 3/3, 2009 at 21:14

S

3

It's possible that you may want an applicative functor instead of a monad:

http://www.haskell.org/haskellwiki/Applicative_functor

I think the original paper explains it better than the wiki, however:

http://www.soi.city.ac.uk/~ross/papers/Applicative.html

Scherer answered 4/3, 2009 at 11:56 Comment(0)

A

2

I don't think using the State Monad is a code smell when it used to model state.

If you need to thread state through your functions, you can do this explicitly, taking the the state as an argument and returning it in each function. The State Monad offers a good abstraction: it passes the state along for you and provides lots of useful function to combine functions that require state. In this case, using the State Monad (or Applicatives) is not a code smell.

However, if you use the State Monad to emulate an imperative style of programming while a functional solution would suffice, you are just making things complicated.

Alexine answered 3/5, 2013 at 12:52 Comment(0)

D

0

In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, especially tree traversal code can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.

Drain answered 4/3, 2009 at 15:3 Comment(0)

I

-5

Well, don't use monads. The power of functional programming is function purity and their reuse. There's this paper a professor of mine once wrote and he's one of the guys who helped build Haskell.

The paper is called "Why functional programming matters", I suggest you read through it. It's a good read.

Issacissachar answered 3/3, 2009 at 20:12 Comment(9)

I thought overuse of monads was a code smell. There are plenty of online Haskell tutorials, but very little about good Haskell programming. Much thanks for the paper. – Ezraezri 3/3, 2009 at 20:23

Only the IO monad is in any way impure. – Marguritemargy 10/3, 2009 at 2:7

Note that John Huges, the author of "Why Functional Programming Matters," is also the author of "Generalizing Monads to Arrows," which is also a way of handling state. – Cluj 25/1, 2010 at 6:28

Terrible answer. When handling state, the alternative to using monads is threading the state through every single function. Monads are a very handy mechanism, and the state monad is in no way unpure! – Leena 8/10, 2010 at 17:14

Let's just say that maybe I didn't really understand exactly what a monad was but it was all-around and noisy, monads being the ultimate solution. I got a little irritated by that. Now that I understand how simple and elegant monads are, I also know that I program using monads all the time. I just didn't know, that what I was writing was called a monad. – Issacissachar 4/3, 2012 at 11:3

I don't see how this is relevant at all to the question. You say not to use monads, then you link to a random paper on why functional programming matters, and talk about using pure code. This ignores the fact that monads are pure constructs... – Rawinsonde 18/10, 2013 at 16:59

@Rawinsonde I'm saying you don't have to use monads. The paper I linked to does not even mention the word monad, instead it talks about functional composition. Not as an alternative to monads but as one way to write complex programs using simple functions. – Issacissachar 23/10, 2013 at 11:8

But there are some composition patterns that are inherently monadic, regardless of whether or not you actually use an instance of the Monad type class and those things occur a lot. You can do lots of case matches on a Maybe value instead of using >>= but you might still essentially be doing the same thing. Also, when you are using a specific Monad instance, you can think of return and bind being specialized helper functions that just happen to share a name with other specialized helper functions (>>= for lists is concatMap, >>= for Maybe binds Justs, etc). – Aidoneus 6/7, 2014 at 21:16

What I'm getting at is that you will probably use monads if you write a good amount of Haskell code whether or not you call them "monads". Using the Monad type class is likely the nicest way of doing this. You could keep reinventing the wheel, but it will still be essentially monadic in nature. Also, when you generalize this pattern to the Monad type class you get useful helper functions that work on all Monad instances (like join, replicateM, etc). This actually seems like a really good example of code reuse to me. – Aidoneus 6/7, 2014 at 21:21

B

-14

let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and change them.

Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.

So, yes, it's probably a code smell.

Bearded answered 3/3, 2009 at 20:14 Comment(3)

Why does that myth gets repeated? Most of the traditional monads are neither "unsafe", nor "un-functional". Even IO, which implementation /implementation/ is actually un-functional, as it is handled specifically by compiler, could be considered such. – Shultz 17/2, 2010 at 17:21

Well, primarily because, by the definition, it's true. Even in the most sophisticated compiler, the I/O component must, by necessity, have side-effects: the state of the underlying device must change, or else no I/O has occurred. – Bearded 5/3, 2010 at 18:37

The IO monad (and IIRC the ST monad) is internally un-functional for the purpose of performance. However, it doesn't have to be. The runtime environment (C code) can simply execute the monad without the Haskell code ever doing anything unsafe or un-functional. All the other monads are not at all unsafe or un-functional. – Transduction 19/6, 2010 at 19:32

Recommended topics

Hot tags