Of which things should I take care if I'm using unboxed type (like Int#) in Haskell / GHC?

Y

2

7

I'm trying to write a small script which parses and executes Brainfuck code, to understand the GHC options of optimization, I'm trying to optimize the code in order to be a bit faster and to understand what's going on there.

On of the parts is the internal represantation of BF-code, I use a special datatype for this. Here's the sourcecode, included the two functions which are doing the conversions:

data BFinstruction
  = AdjustValue Int
  | MovePointer Int
  | GetChar
  | PutChar
  | Loop BFcode
  deriving (Eq)

type BFcode = [BFinstruction]

unsafeCompileBrainfuck :: String -> BFcode
unsafeCompileBrainfuck = fst . parse [] where
  -- arguments: input string, built code; output: output code, rest of input
  parse :: BFcode -> String -> (BFcode,String)
  parse c ('+':s) = parse (AdjustValue   1 :c) s
  parse c ('-':s) = parse (AdjustValue (-1):c) s
  parse c ('>':s) = parse (MovePointer   1 :c) s
  parse c ('<':s) = parse (MovePointer (-1):c) s
  parse c ('.':s) = parse (PutChar         :c) s
  parse c (',':s) = parse (GetChar         :c) s
  parse c (']':s) = (reverse c, s)
  parse c ('[':s) = parse (Loop l          :c) s' where (l,s') = parse [] s
  parse c []      = (reverse c ,"")
  parse c ( _ :s) = parse                   c  s

simplifyBrainfuck :: BFcode -> BFcode
simplifyBrainfuck ((AdjustValue x):(AdjustValue y):zs) = if x + y /= 0
  then simplifyBrainfuck (AdjustValue (x + y):zs)
  else simplifyBrainfuck zs
simplifyBrainfuck ((MovePointer x):(MovePointer y):zs) = if x + y /= 0
  then simplifyBrainfuck (MovePointer (x + y):zs)
  else simplifyBrainfuck zs
simplifyBrainfuck (x                              :zs) = x: simplifyBrainfuck zs
simplifyBrainfuck []                                   = []

The idea is, that the code will be read from some input (string), preparsed and simplified by the above code and then executed by some other functions. (It is assumed, that the input is valid).

In order to optimize this example, I tried to unbox the Int params of the MovePointer and AdjustValue constructors by doing domething like this:

data BFinstruction -- BangPatterns
  = AdjustValue {-# UNPACK #-} !Int
  | MovePointer {-# UNPACK #-} !Int
  | GetChar
  | PutChar
  | Loop BFcode
  deriving (Eq)

This will turn the boxed Int type into an unboxed, raw Int# type, which is an implementation detail of GHc. As I read, this option is only good in a few cases, so I want to ask, of which things I have to pay attention if I want to perform this kind of optimization. My goal is to allow the execution of BF-code using the benefits of Haskell - laziness (I want to archieve, that the code may only be kept as needed in memory) and easiness.

Yettie answered 4/9, 2010 at 9:5 Comment(1)

Just wondering how many people will tag this offensive, even though BrainF**k is an actual language... – Horizontal 4/9, 2010 at 9:9

Y

2

This really looks like a premature optimization to me. UNPACK is mostly useful when you have a very large number of BFInstructions sitting around. I doubt you'll ever have enough brainf**k code to make it worthwhile. I agree with Gian, it should be simple enough to test, so do that first.

In any case, the thing to be aware of with UNPACK'ed values is that you don't want the compiler to have to rebox them. You should be ok with numerical ops, but other than that you'd have to look carefully at the functions you're using to see if you ever use the unpacked values as a non-strict argument. The only way to be sure is to look at the core, or the interface files, to see what parameters are strict and which aren't. As always, be sure to compile with at least "-O".

Yellows answered 4/9, 2010 at 16:44 Comment(0)

U

3

Is this really necessary? Are you encountering performance problems with this code that you think are the result of boxed values? If not, don't bother.

If you do believe this is the case, then this page in the GHC manual seems to provide the necessary restrictions in a convenient list format.

The main points appear to be that any kind of interaction between polymorphic functions or names and unboxed types that is not rejected by the compiler could still lead to nasty space leaks. Also, without trying it, I suspect that you will not get exceptions thrown in the case of an overflow, for example, so presumably you should detect this kind of thing yourself. A simple test could verify if this is actually the case or not.

Unreasonable answered 4/9, 2010 at 15:34 Comment(1)

+1. If I need unboxed values, it's because I have a lot of them. And if I have a lot of them, they'll be in a vector. And then I'll just use Data.Vector.Unboxed. – Reboant 15/9, 2010 at 16:4

Y

2

This really looks like a premature optimization to me. UNPACK is mostly useful when you have a very large number of BFInstructions sitting around. I doubt you'll ever have enough brainf**k code to make it worthwhile. I agree with Gian, it should be simple enough to test, so do that first.

In any case, the thing to be aware of with UNPACK'ed values is that you don't want the compiler to have to rebox them. You should be ok with numerical ops, but other than that you'd have to look carefully at the functions you're using to see if you ever use the unpacked values as a non-strict argument. The only way to be sure is to look at the core, or the interface files, to see what parameters are strict and which aren't. As always, be sure to compile with at least "-O".

Yellows answered 4/9, 2010 at 16:44 Comment(0)

Recommended topics

Hot tags