Haskell: Why ++ is not allowed in pattern matching?

K

4

11

Suppose we want to write our own sum function in Haskell:

sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs

Why can't we do something like:

sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (xs++[x]) = x + sum' xs

In other words why can't we use ++ in pattern matching ?

Kaitlynkaitlynn answered 26/2, 2018 at 16:12 Comment(2)

In case you haven't seen it, you might like prolog – Recollected 26/2, 2018 at 21:21

Curry is a neat little language where this kind of thing is possible. It's a relational programming language, which means its basic components are relations like xs == ys ++ [y] instead of functions like case xs of y:ys -> undefined; [] -> undefined – Horologe 27/2, 2018 at 19:19

R

17

You can only pattern match on constructors, not on general functions.

Mathematically, a constructor is an injective function: each combination of arguments gives one unique value, in this case a list. Because that value is unique, the language can deconstruct it again into the original arguments. I.e., when you pattern match on :, you essentially use the function

uncons :: [a] -> Maybe (a, [a])

which checks if the list is of a form you could have constructed with : (i.e., if it is non-empty), and if yes, gives you back the head and tail.

++ is not injective though, for example

Prelude> [0,1] ++ [2]
[0,1,2]
Prelude> [0] ++ [1,2]
[0,1,2]

Neither of these representations is the right one, so how should the list be deconstructed again?

What you can do however is define a new, “virtual” constructor that acts like : in that it always seperates exactly one element from the rest of the list (if possible), but does so on the right:

{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

pattern (:>) :: [a] -> a -> [a]
pattern (xs:>ω) <- (unsnoc -> Just (xs,ω))
 where xs:>ω = xs ++ [ω]

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc [x] = Just x
unsnoc (_:xs) = unsnoc xs

Then

sum' :: Num a => [a] -> a
sum' (xs:>x) = x + sum xs
sum' [] = 0

Note that this is very inefficient though, because the :> pattern-synonym actually needs to dig through the entire list, so sum' has quadratic rather than linear complexity.

A container that allows pattern matching on both the left and right end efficiently is Data.Sequence, with its :<| and :|> pattern synonyms.

Riggle answered 26/2, 2018 at 16:21 Comment(3)

Op's pattern (xs++[x]) is unambiguous though (at least I understand intuitively what it means), it's also the same idea as "N Plus K patterns" (and of course + is not injective). – Idolla 26/2, 2018 at 16:29

Yeah, it's the same idea as N Plus K Patterns, which have been purged from the language due to problems which largely come down to non-injectivity. Special cases like sum' (xs++[x]) = x + sum' xs or fact (x+1) = (x+1) * fact x can be made to work, but it just doesn't generalise properly. – Riggle 26/2, 2018 at 16:33

Right, but I'd be really interested in understanding precisely what distinguishes a pattern like op's which is in fact injective, from one that's ambiguous/invalid. I think we can use induction to prove it on a case-by-case basis but I don't really know how to reason about the question in general – Idolla 26/2, 2018 at 18:17

F

20

This is a deserving question, and it has so far received sensible answers (mutter only constructors allowed, mutter injectivity, mutter ambiguity), but there's still time to change all that.

We can say what the rules are, but most of the explanations for why the rules are what they are start by over-generalising the question, addressing why we can't pattern match against any old function (mutter Prolog). This is to ignore the fact that ++ isn't any old function: it's a (spatially) linear plugging-stuff-together function, induced by the zipper-structure of lists. Pattern matching is about taking stuff apart, and indeed, notating the process in terms of the plugger-togetherers and pattern variables standing for the components. Its motivation is clarity. So I'd like

lookup :: Eq k => k -> [(k, v)] -> Maybe v
lookup k (_ ++ [(k, v)] ++ _) = Just v
lookup _ _                    = Nothing

and not only because it would remind me of the fun I had thirty years ago when I implemented a functional language whose pattern matching offered exactly that.

The objection that it's ambiguous is a legitimate one, but not a dealbreaker. Plugger-togetherers like ++ offer only finitely many decompositions of finite input (and if you're working on infinite data, that's your own lookout), so what's involved is at worst search, rather than magic (inventing arbitrary inputs that arbitrary functions might have thrown away). Search calls for some means of prioritisation, but so do our ordered matching rules. Search can also result in failure, but so, again, can matching.

We have a sensible way to manage computations offering alternatives (failure and choice) via the Alternative abstraction, but we are not used to thinking of pattern matching as a form of such computation, which is why we exploit Alternative structure only in the expression language. The noble, if quixotic, exception is match-failure in do-notation, which calls the relevant fail rather than necessarily crashing out. Pattern matching is an attempt to compute an environment suitable for the evaluation of a 'right-hand side' expression; failure to compute such an environment is already handled, so why not choice?

(Edit: I should, of course, add that you only really need search if you have more than one stretchy thing in a pattern, so the proposed xs++[x] pattern shouldn't trigger any choices. Of course, it takes time to find the end of a list.)

Imagine there was some sort of funny bracket for writing Alternative computations, e.g., with (|) meaning empty, (|a1|a2|) meaning (|a1|) <|> (|a2|), and a regular old (|f s1 .. sn|) meaning pure f <*> s1 .. <*> sn. One might very well also imagine (|case a of {p1 -> a1; .. pn->an}|) performing a sensible translation of search-patterns (e.g. involving ++) in terms of Alternative combinators. We could write

lookup :: (Eq k, Alternative a) => k -> [(k, v)] -> a k
lookup k xs = (|case xs of _ ++ [(k, v)] ++ _ -> pure v|)

We may obtain a reasonable language of search-patterns for any datatype generated by fixpoints of differentiable functors: symbolic differentiation is exactly what turns tuples of structures into choices of possible substructures. Good old ++ is just the sublists-of-lists example (which is confusing, because a list-with-a-hole-for-a-sublist looks a lot like a list, but the same is not true for other datatypes).

Hilariously, with a spot of LinearTypes, we might even keep hold of holey data by their holes as well as their root, then plug away destructively in constant time. It's scandalous behaviour only if you don't notice you're doing it.

Flagelliform answered 27/2, 2018 at 17:3 Comment(3)

how about lookup k [..., (k,v), ...] = Just v? We could also have [x, ...], [x, ...xs...], [_, ...xs...], etc. And as a constructor too, like, sieve [p, ...xs...] = [x, ...sieve [x | x <- xs, rem x p > 0]...]. Or (++) a b = [...a..., ...b...]. One advantage is, a reader doesn't have to guess the meaning of :, :: (ML), ++, etc. - the intention is visually apparent. Hopefully. BTW I've recently stumbled on en.wikipedia.org/wiki/Refal which seems related (a pattern-matching term-rewriting language from 1960s USSR). – Jaquelin 5/3, 2018 at 8:59

The thing I did in the late 80s did, in fact, steal its notation from SNOBOL: it was a pattern matching square-brackety LISP, but variables starting with $ stood for list segments, so [$_ (k.v) $_]. But as I said, above, it's not just about lists, so I think we need to look beyond. List-centric notation can lead to list-centric thinking. – Flagelliform 5/3, 2018 at 9:55

as a curiosity, this notation (plus some list comprehensions) allows for a very succinct - 13 LOC - reformulation if the McCarthy's original paper's eval with all auxiliaries. – Jaquelin 5/3, 2018 at 12:54

R

17

You can only pattern match on constructors, not on general functions.

Mathematically, a constructor is an injective function: each combination of arguments gives one unique value, in this case a list. Because that value is unique, the language can deconstruct it again into the original arguments. I.e., when you pattern match on :, you essentially use the function

uncons :: [a] -> Maybe (a, [a])

which checks if the list is of a form you could have constructed with : (i.e., if it is non-empty), and if yes, gives you back the head and tail.

++ is not injective though, for example

Prelude> [0,1] ++ [2]
[0,1,2]
Prelude> [0] ++ [1,2]
[0,1,2]

Neither of these representations is the right one, so how should the list be deconstructed again?

What you can do however is define a new, “virtual” constructor that acts like : in that it always seperates exactly one element from the rest of the list (if possible), but does so on the right:

{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

pattern (:>) :: [a] -> a -> [a]
pattern (xs:>ω) <- (unsnoc -> Just (xs,ω))
 where xs:>ω = xs ++ [ω]

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc [x] = Just x
unsnoc (_:xs) = unsnoc xs

Then

sum' :: Num a => [a] -> a
sum' (xs:>x) = x + sum xs
sum' [] = 0

Note that this is very inefficient though, because the :> pattern-synonym actually needs to dig through the entire list, so sum' has quadratic rather than linear complexity.

A container that allows pattern matching on both the left and right end efficiently is Data.Sequence, with its :<| and :|> pattern synonyms.

Riggle answered 26/2, 2018 at 16:21 Comment(3)

Op's pattern (xs++[x]) is unambiguous though (at least I understand intuitively what it means), it's also the same idea as "N Plus K patterns" (and of course + is not injective). – Idolla 26/2, 2018 at 16:29

Yeah, it's the same idea as N Plus K Patterns, which have been purged from the language due to problems which largely come down to non-injectivity. Special cases like sum' (xs++[x]) = x + sum' xs or fact (x+1) = (x+1) * fact x can be made to work, but it just doesn't generalise properly. – Riggle 26/2, 2018 at 16:33

Right, but I'd be really interested in understanding precisely what distinguishes a pattern like op's which is in fact injective, from one that's ambiguous/invalid. I think we can use induction to prove it on a case-by-case basis but I don't really know how to reason about the question in general – Idolla 26/2, 2018 at 18:17

C

9

You can only pattern-match on data constructors, and ++ is a function, not a data constructor.

Data constructors are persistent; a value like 'c':[] cannot be simplified further, because it is a fundamental value of type [Char]. An expression like "c" ++ "d", however, can replaced with its equivalent "cd" at any time, and thus couldn't reliably be counted on to be present for pattern matching.

(You might argue that "cd" could always replaced by "c" ++ "d", but in general there isn't a one-to-one mapping between a list and a decomposition via ++. Is "cde" equivalent to "c" ++ "de" or "cd" ++ "e" for pattern matching purposes?)

Carlocarload answered 26/2, 2018 at 16:16 Comment(0)

P

6

++ isn't a constructor, it's just a plain function. You can only match on constructors.

You can use ViewPatterns or PatternSynonyms to augment your ability to pattern match (thanks @luqui).

Paraph answered 26/2, 2018 at 16:16 Comment(2)

There are ViewPatterns and PatternSynonyms – Recollected 26/2, 2018 at 21:23

@Recollected Thanks. Not sure which I was thinking of. It's been awhile since I've written Haskell. I'll edit those links in in a bit. – Paraph 26/2, 2018 at 22:21

Recommended topics

Hot tags