Using low bitsize integral types like `Int8` and what they are for
Recently I learned that CPUs perform their computations on machine words, which on most contemporary processors and OSes are either 32-bit or 64-bit. So what are the benefits of using smaller fixed-size values like Int16, Int8, or Word8? What exactly are they for? Is it only storage reduction?

I'm writing a complex calculation program which consists of several modules but is interfaced through a single function returning a Word64 value, so the whole program produces a Word64. I'm interested in the answer to this question because inside this program I found myself using a lot of different integral types, like Word16 and Word8, to represent small entities, and seeing that they quite often got converted with fromIntegral got me thinking: was I making a mistake there, and what was the exact benefit of those types that I was blindly attracted to without understanding them? Did it make sense at all to use the other integral types and eventually convert them with fromIntegral, or should I have just used Word64 everywhere?

Doctor answered 22/1, 2012 at 19:26 Comment(0)

These smaller types give you a memory reduction only when you store them in unboxed arrays or similar. There, each will take as many bits as indicated by the type suffix.

In general use, they all take exactly as much storage as an Int or Word. The main difference is that values are automatically narrowed to the appropriate bit size when using fixed-width types, and there are (still) more optimisations (mainly in the form of rewrite rules) for Int and Word than for Int8 etc., so some operations will be slower with the smaller types.
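A small sketch of the automatic narrowing described above (the values are arbitrary, chosen only to show the wrap-around):

```haskell
import Data.Int (Int8)
import Data.Word (Word8)

-- Arithmetic on fixed-width types wraps automatically, modulo 2^bits:
wrap8 :: Word8
wrap8 = 200 + 100                     -- 300 mod 256 = 44

-- Converting down with fromIntegral keeps only the low bits:
narrow :: Int8
narrow = fromIntegral (1000 :: Int)   -- low 8 bits of 1000, read as signed: -24

main :: IO ()
main = print (wrap8, narrow)          -- (44,-24)
```

In an unboxed array of Word8 each element really does occupy one byte, which is where the storage saving mentioned above materialises.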

Concerning the question whether to use Word64 throughout or to use smaller types, that depends. On a 64-bit system, when compiling with optimisations, the performance of Word and Word64 should mostly be the same, since where it matters both should be unpacked and the work done on the raw machine Word#. But there probably are still a few rewrite rules for Word that have no Word64 counterpart yet, so perhaps there is a difference after all. On a 32-bit system, most operations on Word64 are implemented via C calls, so on such systems operations on Word64 are much slower than operations on Word.

So depending on what is more important, simplicity of code or performance on different systems, either

  1. use Word64 throughout: simple code, good performance on 64-bit systems
  2. use Word as long as your values are guaranteed to fit into 32 bits and transform to Word64 at the latest safe moment: more complicated code, but better performance on 32-bit systems.
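A minimal sketch of option 2, with innerStep standing in as a made-up placeholder for the real computation:

```haskell
import Data.Word (Word, Word64)

-- Hypothetical inner computation kept in Word, which is fast on both
-- 32-bit and 64-bit GHC:
innerStep :: Word -> Word
innerStep x = x * x + 1

-- Only the final result is widened. On a 64-bit system, where Word and
-- Word64 have the same width, this fromIntegral costs nothing at runtime.
resultAsWord64 :: Word -> Word64
resultAsWord64 = fromIntegral . innerStep

main :: IO ()
main = print (resultAsWord64 6)   -- 37
```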
Waki answered 22/1, 2012 at 19:45 Comment(3)
Thank you Daniel! Please consider the comment I left on ehird's answer. Since your answers are generally similar, it addresses you too. – Doctor
Updated answer. Depending on your situation and aims, I may disagree with ehird. – Waki
@DanielFischer is your statement about when you get a memory reduction still true in base-4.16? The definition of Int8 changed to the following in that version: data {-# CTYPE "HsInt8" #-} Int8 = I8# Int8#, whereas it was I8# Int# in earlier versions. There have been similar changes to all the other fixed-size signed/unsigned integral types (although the code comments still claim that the sub-64-bit types are represented in the same way as Int/Word). – Firehouse

In GHC, the fixed-size integral types all take up a full machine word, so there's no space savings to be had. Using machine-word-sized types (i.e. Int and Word) will probably be faster than the fixed-size types in most cases, but using a fixed-size integral type will be faster than doing explicit wrap-around.

You should choose the appropriate type for the range of values you're using. maxBound :: Word8 is 255, 255 + 1 :: Word8 is 0 — and if you're dealing with octets, that's exactly what you want. (For instance, ByteStrings are defined as storing Word8s.)
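For instance, a one-byte checksum over octets relies on exactly this defined wrap-around (checksum here is an illustrative helper, not a standard function):

```haskell
import Data.Word (Word8)

-- Word8 arithmetic wraps modulo 256 by definition:
top, wrapped :: Word8
top     = maxBound   -- 255
wrapped = top + 1    -- 0

-- Summing octets into a single byte, where mod-256 wrap-around
-- is precisely the semantics we want:
checksum :: [Word8] -> Word8
checksum = sum

main :: IO ()
main = print (wrapped, checksum [250, 10])   -- (0,4)
```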

If you just have some integers that don't need a specific number of bits, and the calculations you're doing aren't going to overflow, just use Int or Word (or even Integer). Fixed-size types are less common than the regular integral types because, most of the time, you don't need a specific size.

So, don't use them for performance; use them if you're looking for their specific semantics: fixed-size integral types with defined overflow behaviour.

Infest answered 22/1, 2012 at 19:40 Comment(3)
As always, you and Daniel to the rescue! Okay, so generally Int and Word are better than those small types, I get it. But what about the second question I was pointing out: will it still be more efficient to perform all the internal operations in Word if I know that in the end I'd eventually have to widen those values to Word64, or does it make sense in this scenario to just use Word64 everywhere from the beginning and thus spare some cycles by eliminating the fromIntegral conversions? – Doctor
If you're aiming to produce a Word64 result, I would use Word64 for the intermediate computations, too. On a 64-bit machine, it'll be a machine word, so there shouldn't be any performance problems, and fromIntegral will be a NOP at runtime. – Infest
No space savings? I thought that with bang patterns they would save space in data constructors. – Coridon

© 2022 - 2024 — McMap. All rights reserved.