Why can't you pass data constructors around like regular functions in OCaml?

Asked 27/3, 2021 at 16:59 Answered 27/3, 2021 at 18:23

For instance, consider the following code:

type foo = Foo of int

let apply z f = f z

(* This is not allowed *)
let create_foo = Foo

(* This is allowed *)
let create_foo i = Foo i

(* This is not allowed *)
apply 1 Foo

(* This is allowed *)
apply 1 create_foo

Are data constructors special function that must be fully applied? When used as a function, Foo and create_foo are identical in what they do. What's the reasoning for disallowing the usage of Foo as a regular function that you can pass around and partially apply?

Haskell seems to allow this behaviour.

Stinko answered 27/3, 2021 at 16:59 Comment(0)

From the horse's mouth, Xavier Leroy, in this mailing list message from 2001:

The old Caml V3.1 implementation treated constructors as functions like SML. In Caml Light, I chose to drop this equivalence for several reasons:

Simplicity of the compiler. Internally, constructors are not functions, and a special case is needed to transform Succ into (fun x -> Succ x) when needed. This isn't hard, but remember that Caml Light was really a minimal, stripped-down version of Caml.

Constructors in Caml Light and OCaml really have an arity, e.g. C of int * int is really a constructor with two integer arguments, not a constructor taking one argument that is a pair. Hence, there would be two ways to map the constructor C to a function: fun (x,y) -> C(x,y) or fun x y -> C(x,y) The former is more natural if you come from an SML background (where constructors have 0 or 1 argument), but the latter fits better the Caml Light / OCaml execution model, which favors curried functions. By not treating constructors like functions, we avoid having to choose...

Code clarity. While using a constructor as a function is sometimes convenient, I would argue it is often hard to read. Writing "fun x -> Succ x" is more verbose, but easier to read, I think.

Miscue answered 27/3, 2021 at 17:15 Comment(1)

Thanks a lot, that makes sense. Adding a little historical context to this answer (since I had to search this up). "[Caml's] successor, Caml Light, was implemented in C by Xavier Leroy and Damien Doligez, and the original was nicknamed "Heavy Caml" because of its higher memory and CPU requirements. Caml Special Light was a further complete rewrite that added a powerful module system to the core language. It was augmented with an object layer to become Objective Caml, eventually renamed OCaml." – Stinko 27/3, 2021 at 17:22

I won’t speak in the name of the language designers. However, constructors have stronger properties than general functions. They

are injective,
have a fixed arity,
do not execute arbitrary code,
and allocate a memory block (except in some optimized cases).

So it makes sense to distinguish them syntactically.

Point 2 means that you can apply them totally, without building nor calling any closure. In a way you are inlining the constructor, seen as a function, but point 3 and 4 say that the code of the function is straightforward (it simply allocates and initializes a value block). In fact, there is no need for a function in memory that represents the constructor.

Thus, by forcing constructor applications to be total, you can have a faster compilation scheme, that also better evidences allocations when reading your OCaml code. This is appreciated by some, as allocations are among the prime things monitored by performance-concerned OCaml programmers.

The syntax Constr (a,b,c) resembles that of regular tuples (a,b,c) on purpose: it does mostly the same thing, only with a different type (and with the tag of the constructor, different from the implied tag of tuples). In this example the constructor does not take one tuple argument, but rather it takes three arguments a, b and c where parentheses and commas are part of the syntax of the constructor. So the syntax token Constr cannot be treated as (fun abc -> Cons abc). In other words, uncurried syntax does not lead naturally to a syntax for partial application (including no-application), which is on purpose.

The tuple syntax in patterns also evidences better injectivity, I think. In pattern-matching, you are actually destructing a tuple, you are not magically finding the unique values for the arguments of an injective function such that the application of said function yields the matched value. But that is more minor.

That said, conflating constructors and functions is perfectly reasonable too. Coq does it, even though its syntax draws on Caml. OCaml constructors not behaving as proper functions is a recurring subject of complaint, for example, you may skim the discussion of this pull request.

Skinnydip answered 27/3, 2021 at 18:23 Comment(0)

Recommended topics

Hot tags