One thing I do not fully understand about Haskell is declaring functions and their types: is it something you have to do or is it just something you should do for clarity? Or are there certain scenarios where you need to do it, just not all?
You don’t need to declare the type of any function that uses only standard Haskell type system features. Haskell 98 is specified with global type inference, meaning that the types of all top-level bindings are guaranteed to be inferable.
However, it’s good practice to include type annotations for top-level definitions, for a few reasons:
Verifying that the inferred type matches your expectations
Helping the compiler producing better diagnostic messages when there are type mismatches
Most importantly, documenting your intent and making the code more readable for humans!
As for definitions in where
clauses, it’s a matter of style. The conventional style is to omit them, partly because in some cases, their types could not be written explicitly before the ScopedTypeVariables
extension. I consider the omission of scoped type variables a bit of a bug in the 1998 and 2010 standards, and GHC is the de facto standard compiler today, but it’s still a nonstandard extension. Regardless, it’s good practice to include annotations where possible for nontrivial code, and helpful for you as a programmer.
In practice, it’s common to use some language extensions that complicate type inference or make it “undecidable”, meaning that, at least for arbitrary programs, it’s impossible to always infer a type, or at least a unique “best” type. But for the sake of usability, extensions are usually designed very carefully to only require annotations at the point where you actually use them.
For example, GHC (and standard Haskell) will only infer polymorphic types with top-level forall
s, which are normally left entirely implicit. (They can be written explicitly using ExplicitForAll
.) If you need to pass a polymorphic function as an argument to another function like (forall t. …) -> …
using RankNTypes
, this requires an annotation to override the compiler’s assumption that you meant something like forall t. (… -> …)
, or that you mistakenly applied the function on different types.
If an extension requires annotations, the rules for when and where you must include them are typically documented in places like the GHC User’s Guide, and formally specified in the papers specifying the feature.
Short answer: Functions are defined in "bindings" and have their types declared in "type signatures". Type signatures for bindings are always syntactically optional, as the language doesn't require their use in any particular case. (There are some places type signatures are required for things other than bindings, like in class definitions or in declarations of data types, but I don't think there's any case where a binding requires an accompanying type signature according to the syntax of the language, though I might be forgetting some weird situation.) The reason they aren't required is that the compiler can usually, though not always, figure out the types of functions itself as part of its type-checking operation.
However, some programs may not compile unless a type signature is added to a binding, and some programs may not compile unless a type signature is removed, so sometimes you need to use them, and sometimes you can't use them (at least not without a language extension and some changes to the syntax of other, nearby type signatures to use the extension).
It is considered best practice to include type signatures for every top-level binding, and the GHC -Wall
flag will warn you if any top-level bindings lack an associated type signature. The rationale for this is that top-level signatures (1) provide documentation for the "interface" of your code, (2) aren't so numerous that they overburden the programmer, and (3) usually provide sufficient guidance to the compiler that you get better error messages than if you omit type signatures entirely.
If you look at almost any real-world Haskell source code (e.g., browse the source of any decent library on Hackage), you'll see this convention being used -- all top-level bindings have type signatures, and type signatures are used sparingly in other contexts (in expressions or where
or let
clauses). I'd encourage any beginner to use this same convention in the code they write as they're learning Haskell. It's a good habit and will avoid many frustrating error messages.
Long answer:
In Haskell, a binding assigns a name to a chunk of code, like the following binding for the function hypo
:
hypo a b = sqrt (a*a + b*b)
When compiling a binding (or collection of related bindings), the compiler performs a type-checking operation on the expressions and subexpressions that are involved.
It is this type-checking operation that allows the compiler to determine that the variable a
in the above expression must be of some type t
that has a Num t
constraint (in order to support the *
operation), that the result of a*a
will be the same type t
, and that this implies that b*b
and so b
are also of this same type t
(since only two values of the same type can be added together with +
), and that a*a + b*b
is therefore of type t
, and so the result of the sqrt
must also be of this same type t
which must incidentally have a Floating t
constraint to support the sqrt
operation. The information collected and type relationships deduced during this type checking allow the compiler to infer a general type signature for the hypo
function automatically, namely:
hypo :: (Floating t) => t -> t -> t
(The Num t
constraint doesn't appear because it's implied by Floating t
).
Because the compiler can learn the type signatures of (most) bound names, like hypo
, automatically as a side-effect of the type-checking operation, there's no fundamental need for the programmer to explicitly supply this information, and that's the motivation for the language making type signatures optional. The only requirements the language places on type signatures is that if they are supplied, they must appear in the same declaration list as the associated binding (e.g., both must appear in the same module, or in the same where
clause or whatever, and you can't have a type signature without a binding), there must be at most one type signature for a binding (no duplicate type signatures, even if they are identical, unlike in C, say), and the type supplied in the type signature must not be in conflict with the results of type checking.
The language allows the type signature and binding to appear anywhere in the same declaration list, in any order, and with no requirement they be next to each other, so the following is valid Haskell code:
double :: (Num a) => a -> a
half x = x / 2
double x = x + x
half :: (Fractional a) => a -> a
Such silliness is not recommended, however, and the convention is to place the type signature immediately before the corresponding binding, though one exception is to have a type signature shared across multiple bindings of the same type, whose definitions follow:
ex1, ex2, ex3 :: Tree Int
ex1 = Leaf 1
ex2 = Node (Leaf 2) (Leaf 3)
ex3 = Node (Node (Leaf 4) (Leaf 5)) (Leaf 5)
In some situations, the compiler cannot automatically infer the correct type for a binding, and a type signature may be required. The following binding requires a type signature and won't compile without it. (The technical problem is that toList
is written using polymorphic recursion.)
data Binary a = Leaf a | Pair (Binary (a,a)) deriving (Show)
-- following type signature is required...
toList :: Binary a -> [a]
toList (Leaf x) = [x]
toList (Pair b) = concatMap (\(x,y) -> [x,y]) (toList b)
In other situations, the compiler can automatically infer the correct type for a binding, but the type can't be expressed in a type signature (at least, not without some GHC extensions to the standard language). This happens most often in where
clauses. (The technical problem is that type variables aren't scoped, and go
's type involves the type variable a
from the type signature of myLookup
.)
myLookup :: Eq a => a -> [(a,b)] -> Maybe b
myLookup k = go
where -- go :: [(a,b)] -> Maybe b
go ((k',v):rest) | k == k' = Just v
| otherwise = go rest
go [] = Nothing
There's no type signature in standard Haskell for go
that would work here. However, if you enable an extension, you can write one if you also modify the type signature for myLookup
itself to scope the type variables.
myLookup :: forall a b. Eq a => a -> [(a,b)] -> Maybe b
myLookup k = go
where go :: [(a,b)] -> Maybe b
go ((k',v):rest) | k == k' = Just v
| otherwise = go rest
go [] = Nothing
It's considered best practice to put type signatures on all top-level bindings and use them sparingly elsewhere. The -Wall
compiler flag turns on the -Wmissing-signatures
warning which warns about any missing top-level signatures.
The main motivation, I think, is that top-level bindings are the ones that are most likely to be used in multiple places throughout the code at some distance from where they are defined, and the type signature usually provides concise documentation for what a function does and how it's intended to be used. Consider the following type signatures from a Sudoku solver I wrote many years ago. Is there much doubt what these functions do?
possibleSymbols :: Index -> Board -> [Symbol]
possibleBoards :: Index -> Board -> [Board]
setSymbol :: Index -> Board -> Symbol -> Board
While the type signatures auto-generated by the compiler also serve as decent documentation and can be inspected in GHCi, it's convenient to have the type signatures in the source code, as a form of compiler-checked comment documenting the binding's purpose.
Any Haskell programmer who's spent a moment trying to use an unfamiliar library, read someone else's code, or read their own past code knows how helpful top-level signatures are as documentation. (Admittedly, a frequently levelled criticism of Haskell is that sometimes the type signatures are the only documentation for a library.)
A secondary motivation is that in developing and refactoring code, type signatures make it easier to "control" the types and localize errors. Without any signatures, the compiler can infer some really crazy types for code, and the error messages that get generated can be baffling, often identifying parts of the code that have nothing to do with the underlying error.
For example, consider this program:
data Tree a = Leaf a | Node (Tree a) (Tree a)
leaves (Leaf x) = x
leaves (Node l r) = leaves l ++ leaves r
hasLeaf x t = elem x (leaves t)
main = do
-- some tests
print $ hasLeaf 1 (Leaf 1)
print $ hasLeaf 1 (Node (Leaf 2) (Leaf 3))
The functions leaves
and hasLeaf
compile fine, but main
barfs out the following cascade of errors (abbreviated for this posting):
Leaves.hs:12:11-28: error: • Ambiguous type variable ‘a0’ arising from a use of ‘hasLeaf’ prevents the constraint ‘(Eq a0)’ from being solved. Probable fix: use a type annotation to specify what ‘a0’ should be. Leaves.hs:12:19: error: • Ambiguous type variable ‘a0’ arising from the literal ‘1’ prevents the constraint ‘(Num a0)’ from being solved. Probable fix: use a type annotation to specify what ‘a0’ should be. Leaves.hs:12:27: error: • No instance for (Num [a0]) arising from the literal ‘1’ Leaves.hs:13:11-44: error: • Ambiguous type variable ‘a1’ arising from a use of ‘hasLeaf’ prevents the constraint ‘(Eq a1)’ from being solved. Probable fix: use a type annotation to specify what ‘a1’ should be. Leaves.hs:13:19: error: • Ambiguous type variable ‘a1’ arising from the literal ‘1’ prevents the constraint ‘(Num a1)’ from being solved. Probable fix: use a type annotation to specify what ‘a1’ should be. Leaves.hs:13:33: error: • No instance for (Num [a1]) arising from the literal ‘2’
With programmer-supplied top-level type signatures:
leaves :: Tree a -> [a]
leaves (Leaf x) = x
leaves (Node l r) = leaves l ++ leaves r
hasLeaf :: (Eq a) => a -> Tree a -> Bool
hasLeaf x t = elem x (leaves t)
a single error is immediately localized to the offending line:
leaves (Leaf x) = x
^
Leaves.hs:4:19: error:
• Occurs check: cannot construct the infinite type: a ~ [a]
Beginners might not understand the "occurs check" but are at least looking at the right place to make the simple fix:
leaves (Leaf x) = [x]
So, why not add type signatures everywhere, not just at top-level? Well, if you literally tried to add type signatures everywhere they were syntactically valid, you'd be writing code like:
{-# LANGUAGE ScopedTypeVariables #-}
hypo :: forall t. (Floating t) => t -> t -> t
hypo (a :: t) (b :: t) = sqrt (((a :: t) * (a :: t) :: t) + ((b :: t) * (b :: t) :: t) :: t) :: t
so you want to draw the line somewhere. The main argument against adding them for all bindings in let
and where
clauses is that those bindings are often short bindings easily understood at a glance, and they're localized to the code that you're trying to understand "all at once" anyway. The signatures are also potentially less useful as documentation because bindings in these clauses are more likely to refer to and use other nearby bindings of arguments or intermediate results, so they aren't "self-contained" like a top-level binding. The signature only documents a small portion of what the binding is doing. For example, in:
qsort :: (Ord a) => [a] -> [a]
qsort (x:xs) = qsort l ++ [x] ++ qsort r
where -- l, r :: [a]
l = filter (<=x) xs
r = filter (>x) xs
qsort [] = []
having type signatures l, r :: [a]
in the where
clause wouldn't add very much. There's also the additional complication that you'd need the ScopedTypeVariables
extension to write it, as above, so that's maybe another reason to omit it.
As I say, I think any Haskell beginner should be encouraged to adopt a similar convention of writing top-level type signatures, ideally writing the top-level signature before starting to write the accompanying bindings. It's one of the easiest ways to leverage the type system to guide the design process and write good Haskell code.
© 2022 - 2024 — McMap. All rights reserved.