Why are GHC tuples limited to size 62?
From the Haskell 98 report:

There is no upper bound on the size of a tuple, but some Haskell implementations may restrict the size of tuples, and limit the instances associated with larger tuples. However, every Haskell implementation must support tuples up to size 15, together with the instances for Eq, Ord, Bounded, Read, and Show. (...)

However, it's well known that GHC does not support tuples of size greater than 62. Here's what happens when I try to create a tuple of size 63 in GHCi:

<interactive>:1:1: error:
    A 63-tuple is too large for GHC
      (max size is 62)
      Workaround: use nested tuples or define a data type

I recognise that this is in accordance with the Haskell 98 specification, and also that a tuple of size greater than 62 is rarely, if ever, necessary, but I don't understand why GHC in particular behaves this way.


To summarise:

  • Why is there a tuple size limit at all?
  • Why is the size limit specifically at 62?

In addition:

  • Why is this the case only from GHC 6.12.2 onwards?
  • Do other prominent Haskell implementations do this? What are their reasons?
Michelson answered 25/9, 2017 at 19:31 Comment(9)
Tuples of higher cardinality are not often used in Haskell, since we prefer to work with structurally (co-)recursive data types. I can't think of a time when I've used anything larger than a 3-tuple; to do so would be a code smell IMHO. GHC's suggestion is also valid: if you really need a product type with more than 62 fields, it's worth defining a record type yourself, as sketched below.Galactometer
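A minimal sketch of both workarounds (illustrative types and names, not anything GHC provides):

-- Nest smaller tuples so that no single tuple exceeds the limit...
nested :: ((Int, Int, Int), (Int, Int, Int))
nested = ((1, 2, 3), (4, 5, 6))

-- ...or, usually better, define a record type with named fields;
-- data declarations are not subject to the 62-field tuple limit.
data Point3 = Point3
  { px :: Int
  , py :: Int
  , pz :: Int
  } deriving (Eq, Ord, Show)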
Why is there a size limit and why is it 62? Because Manuel Chakravarty says that larger tuples cause segmentation faults. Way back in GHC 6, this limit was not present. It would be fun to figure out what has happened since...Gonfalonier
@Gonfalonier Indeed, I'm aware that this 62-field limit is from GHC 6.12.2 onwards. I may add this to the question. Edit: addedMichelson
@Michelson I think Gonfalonier's comment contains strictly more content than "this limit is from GHC 6.12.2 onwards" -- it is also a direct answer to the questions you posed. "Why is there a tuple size limit at all?" To avoid segfaults. "Why is the limit specifically at 62?" Because that's where the segfaults start. (In fact I think he should have posted this as an answer rather than as a comment!)Tegantegmen
@Tegantegmen That's what I'm asking: why do these faults occur at all? What changed in GHC 6.12.2?Michelson
I suspect what changed in GHC 6.12.2 is that someone noticed that a tuple of more than 62 elements got you a segfault, so the limit was added to prevent this. If we postulate 4 bytes per element (i.e. a 32-bit machine) and an 8-byte header, then we could fit the total size of a tuple into one byte (i.e. up to 256 bytes: 8 + 62×4 = 256 exactly); perhaps that is the reason for this limit. I don't know about the internal representation of Haskell values in memory, so this is all guesswork. However, it would explain segfaults: if a tuple were any bigger, the header would be corrupted or the GC would get lost.Lakeshialakey
What I find interesting here is that records can easily have more than 62 type variables / fields. So that means that there is some core way in which tuples actually aren't the record types we are always told they are. That makes me wonder (1) why and (2) how can I take advantage of the difference?Gonfalonier
Whatever the reason is, I think "Workaround: use nested tuples or define a data type" is a beautiful tip.Orcus
A post-facto justification: there is quite a bit of boilerplate code that creates class instances for all tuple sizes (Data.Binary, for example), resulting in O(n²) code size, where n is the maximum tuple size.Anywheres
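To make the quadratic growth concrete, here is a toy sketch (an invented class, not Data.Binary's actual code): the instance for the n-tuple mentions all n components, so instances for every size from 2 up to the maximum add up to O(n²) generated code.

class Describe a where
  describe :: a -> String

instance (Describe a, Describe b) => Describe (a, b) where
  describe (a, b) = describe a ++ describe b

instance (Describe a, Describe b, Describe c) => Describe (a, b, c) where
  describe (a, b, c) = describe a ++ describe b ++ describe c

-- ...one instance per tuple size: the size-n instance is O(n) long,
-- so n instances together are O(n^2).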

I think the speculation re: the timing of this change in the comments is wrong. First, as far as I can tell, the limit has existed since LONG before 6.12.1. As can be seen in Trac #98 from November 2002, in version 5.02.2, the limit was 37 (instead of 62), and trying to use a larger tuple generated a mysterious message:

Switches_tupel.lhs:1:
Failed to find interface decl for
`PrelTup.(,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)'
    from module `PrelTup'

Simon Peyton-Jones fixed the bug by having the compiler check the size earlier in the compilation pipeline and generate a nicer error message (visible in Git commit b44c6881). By the time this commit was made, the limit had already been increased from 37 to 62 (Git commit 9af77fa4, which integrated Template Haskell work into the HEAD), so GHC 5.04 was released with a 62-tuple limit and a better error message.

I believe the original Trac #98 bug points to the reason for a limit. In ghc/compiler/prelude/TysWiredIn.hs, a set of tuple type and data constructors is preallocated:

boxedTupleArr, unboxedTupleArr :: Array Int (TyCon,DataCon)
boxedTupleArr   = listArray (0,mAX_TUPLE_SIZE) [mk_tuple Boxed   i 
                    | i <- [0..mAX_TUPLE_SIZE]]
unboxedTupleArr = listArray (0,mAX_TUPLE_SIZE) [mk_tuple Unboxed i 
                    | i <- [0..mAX_TUPLE_SIZE]]

where mAX_TUPLE_SIZE is the 62-tuple limit in question. However, the functions that actually use these preallocated arrays are happy to generate bigger constructors on demand ("Build one specially"):

tupleTyCon :: Boxity -> Arity -> TyCon
tupleTyCon sort i | i > mAX_TUPLE_SIZE 
                = fst (mk_tuple sort i)  -- Build one specially
tupleTyCon Boxed   i = fst (boxedTupleArr   ! i)
tupleTyCon Unboxed i = fst (unboxedTupleArr ! i)

and this is what the compiler used to do before Simon added the error message for 5.04 -- it built one specially.
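Boiled down to a self-contained toy (my own sketch, not GHC source), the pattern looks like this: preallocate a table up to a fixed bound, and build entries past the bound on demand.

import Data.Array

mAX_SIZE :: Int
mAX_SIZE = 62

-- Stand-in for mk_tuple: build the entry for a given size.
mk :: Int -> String
mk i = "entry of size " ++ show i

-- Preallocated table, mirroring boxedTupleArr.
table :: Array Int String
table = listArray (0, mAX_SIZE) [mk i | i <- [0 .. mAX_SIZE]]

-- Mirrors tupleTyCon: look up within bounds, build specially beyond.
entry :: Int -> String
entry i | i > mAX_SIZE = mk i   -- build one specially
        | otherwise    = table ! i

main :: IO ()
main = mapM_ (putStrLn . entry) [3, 62, 100]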

Unfortunately, building a constructor specially like this caused an error (not a segfault, just an error) later in the compilation process, when the compiler couldn't find an interface definition for the too-large tuple in the list given in ghc/libraries/ghc-prim/GHC/Tuple.hs. As per the (slightly outdated) comments in TysWiredIn.hs under the heading "The tuple types", the declarations in Tuple.hs are used to construct the info tables and entry code for tuple constructors, even though these could theoretically be generated programmatically on the fly for arbitrarily large tuples.
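For reference, the declarations in Tuple.hs are essentially one wired-in data declaration per size, along these lines (abridged; this special tuple syntax is only accepted inside that wired-in module):

data (,) a b = (,) a b
data (,,) a b c = (,,) a b c
data (,,,) a b c d = (,,,) a b c d
-- ...and so on, up to the 62-tuple.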

So, what does this mean for modern-day GHC? Well, for the same technical reasons described above, even though the compiler is prepared to generate arbitrarily large tuples, there's a limit imposed by the fact that they require matching declarations in .../GHC/Tuple.hs.

I ran a few experiments, compiling GHC from source with the tuple length check disabled. The resulting compiler successfully compiled and ran the following program with a 100-tuple:

a = (False,...,False)  -- imagine 100 Falses
main = let (x,_,...,_) = a
       in print x

and it printed "False". It worked fine when I modified it to grab the last element of the same tuple:

a = (False,...,False)  -- imagine 100 Falses
main = let (_,...,_,x) = a
       in print x

However, the program:

a = (False,...,False)  -- imagine 100 Falses
main = let (x,_,...,_,y) = a
       in print (x,y)

failed with a linker error:

[1 of 1] Compiling Main             ( Tuple.hs, Tuple.o )
Linking Tuple ...
Tuple.o(.data+0x0): error: undefined reference to 'ghczmprim_GHCziTuple_Z100T_con_info'
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)

I suspect that for the first two programs, the compiler optimized away the reference to the missing constructor, but the final program needed it (the undefined symbol is the z-encoded name of the 100-tuple constructor's info table from ghc-prim's GHC.Tuple). After I added a declaration for a 100-tuple in Tuple.hs and rebuilt the compiler, all three programs compiled and ran fine.
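For anyone wanting to reproduce the experiment without typing a hundred commas, a small generator along these lines produces test programs like those above (my own sketch; the answer's actual test files are elided):

import Data.List (intercalate)

-- Emit a Main module containing an n-tuple of Falses and a pattern
-- binding that extracts element k (1-based) as x.
mkProgram :: Int -> Int -> String
mkProgram n k = unlines
  [ "a = (" ++ intercalate "," (replicate n "False") ++ ")"
  , "main = let (" ++ intercalate "," pats ++ ") = a"
  , "       in print x"
  ]
  where
    pats = [ if i == k then "x" else "_" | i <- [1 .. n] ]

main :: IO ()
main = writeFile "Tuple.hs" (mkProgram 100 1)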

In short, compiling the manually constructed list of tuples in Tuple.hs generates the data structures required to support tuples up to size 62, and no one has been sufficiently motivated to re-implement this data-structure generation so that it is independent of the Tuple.hs crutch. If someone did, GHC would probably support tuples of arbitrary size.

As an aside, the note in Tuple.hs about Manuel's segfault (referenced in one of the comments to this question) dates from July 2001, when it was checked into libraries/base/Data/Tuple.hs, so whatever it was about, it had nothing to do with GHC 6.12.1. This issue is probably the reason the max was set to 62 by Simon, but the limit no longer seems to apply with modern GHC.

Agribusiness answered 26/9, 2017 at 0:39 Comment(3)
Interesting. Lifting the limit on tuple size sounds like something that shouldn't be too difficult to do. I might try doing this, just as a way of playing around with GHC internals.Gonfalonier
Bravo. I hope your work gets included in a book on the history of GHC someday. :DCauchy
I'm not sure this is quite so simple. The problem is that you need one place to declare the necessary info tables. If you regenerate them in every module that needs them, all sorts of things will break (e.g. profiling). I suppose you might be able to lean on the linker to eliminate the duplicate tables, but this isn't currently something that we do.Whaling
