How to find out GHC's memory representations of data types?

Recently, blog entries such as Computing the Size of a Hashmap explained how to reason about space complexities of commonly used container types. Now I'm facing the question of how to actually "see" which memory layout my GHC version chooses (depending on compile flags and target architecture) for weird data types (constructors) such as

data BitVec257 = BitVec257 {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Bool
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64

data BitVec514 = BitVec514 {-# UNPACK #-} !BitVec257
                           {-# UNPACK #-} !BitVec257

In C, there are the sizeof operator and the offsetof macro, which let me "see" what size and alignment were chosen for the fields of a C struct.

I've tried looking at GHC Core in the hope of finding some hint there, but I didn't know what to look for. Can somebody point me in the right direction?

Supremacy answered 4/7, 2011 at 17:28 Comment(2)
What's your motivation? Pure curiosity, or you're trying to interface with another language, or something else? - Beeeater
Yes, mostly curiosity. I want to be able to verify whether GHC really does what I expect/assume it does... or whether I need to fix my assumptions... :-) - Supremacy

My first idea was to use this neat little function, due to Simon Marlow:

{-# LANGUAGE MagicHash,UnboxedTuples #-}
module Size where

import GHC.Exts
import Foreign

unsafeSizeof :: a -> Int
unsafeSizeof a =
  case unpackClosure# a of
    (# x, ptrs, nptrs #) ->
      sizeOf (undefined::Int) + -- one word for the header
        I# (sizeofByteArray# (unsafeCoerce# ptrs)
             +# sizeofByteArray# nptrs)

Using it:

Prelude> :!ghc -c Size.hs

Size.hs:15:18:
    Warning: Ignoring unusable UNPACK pragma on the
             third argument of `BitVec257'
    In the definition of data constructor `BitVec257'
    In the data type declaration for `BitVec257'
Prelude Size> unsafeSizeof $! BitVec514 (BitVec257 1 2 True 3 4) (BitVec257 1 2 True 3 4)
74

(Note that GHC is telling you that it cannot unbox Bool since it's a sum type.)

The above function claims that your data type uses 74 bytes on a 64-bit machine. I find that hard to believe. I'd expect the data type to use 11 words = 88 bytes: one word for the header plus one word per field. Even Bools take one word, as they are pointers to (statically allocated) constructors. I'm not quite sure what's going on here.

As for alignment, I believe every field should be word-aligned.
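The expected figure can be checked with a little arithmetic. This is only a sketch, assuming a 64-bit target, a one-word header, and one word per field once the UNPACK pragmas have been applied. The Field type and the helper names are hypothetical: they model the layout on paper and do not inspect GHC's heap.

```haskell
module ExpectedSize where

-- Hypothetical model of a constructor's fields; it does not look at the real heap.
data Field
  = RawWord   -- an unpacked Word64 stored inline as a non-pointer word
  | Pointer   -- a pointer to another closure, such as the Bool field
  deriving (Eq, Show)

-- Assumption: 64-bit target, so one machine word is 8 bytes.
wordBytes :: Int
wordBytes = 8

-- Fields of BitVec257 after unpacking. The UNPACK pragma on Bool is
-- ignored, so that field stays a pointer.
bitVec257Fields :: [Field]
bitVec257Fields = [RawWord, RawWord, Pointer, RawWord, RawWord]

-- BitVec514 unpacks both BitVec257 values into its own constructor.
bitVec514Fields :: [Field]
bitVec514Fields = bitVec257Fields ++ bitVec257Fields

-- One header word plus one word per field.
closureWords :: [Field] -> Int
closureWords fields = 1 + length fields

closureBytes :: [Field] -> Int
closureBytes fields = wordBytes * closureWords fields
```

Loaded in GHCi, closureWords bitVec514Fields evaluates to 11 and closureBytes bitVec514Fields to 88, matching the estimate above.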

Umpteen answered 4/7, 2011 at 19:4 Comment(2)
Ah, so I think that function has a bug since we changed the representation of ByteArray# (it now has a length in bytes instead of words), and so sizeofByteArray# does not multiply by the word size. You need to multiply the result of the first sizeofByteArray# by the word size. - Drawplate
btw, what do ptrs and nptrs actually represent? - Supremacy
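On that last question: going by the pattern match in the answer, ptrs appears to hold the pointer fields of the closure and nptrs its non-pointer bytes. Under that reading, which is inferred from the numbers in this thread rather than confirmed by it, both the reported 74 bytes and the expected 88 bytes can be reproduced. The sketch below assumes a 64-bit target and uses hypothetical names; it replays the arithmetic only and does not call unpackClosure#.

```haskell
module SizeArithmetic where

-- Assumption: 64-bit target, so one machine word is 8 bytes.
wordBytes :: Int
wordBytes = 8

-- BitVec514 after unpacking: two pointer fields (the Bools) ...
pointerCount :: Int
pointerCount = 2

-- ... and eight Word64 fields stored inline as raw bytes.
nonPointerBytes :: Int
nonPointerBytes = 8 * wordBytes

-- What the posted function seems to add up: the size taken from ptrs
-- is an element count, but it is added as if it were a byte count.
reportedSize :: Int
reportedSize = wordBytes + pointerCount + nonPointerBytes

-- With the suggested fix: scale the pointer count by the word size.
correctedSize :: Int
correctedSize = wordBytes + pointerCount * wordBytes + nonPointerBytes
```

Here reportedSize is 74 and correctedSize is 88, which agrees with the 11-word estimate in the answer.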
