Haskell: surprising behavior of "groupBy"
Asked Answered
T

4

15

I'm trying to figure out the behavior of the library function groupBy (from Data.List), which purports to group elements of a list by an "equality test" function passed in as the first argument. The type signature suggests that the equality test just needs to have type

(a -> a -> Bool)

However, when I use (<) as the "equality test" in GHCi 6.6, the results are not what I expect:

ghci> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]

Instead I'd expect runs of strictly increasing numbers, like this:

[[1,2,3],[2,4],[1,5,9]]

What am I missing?

Timber answered 22/8, 2009 at 16:27 Comment(0)
C
34

Have a look at the ghc implementation of groupBy:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

Now compare these two outputs:

Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]

In short, what happens is this: groupBy assumes that the given function (the first argument) tests for equality, and thus assumes that the comparison function is reflexive, transitive and symmetric (see equivalence relation). The problem here is that the less-than relation is not reflexive, nor symmetric.


Edit: The following implementation only assumes transitivity:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _   []                        = []
groupBy' _   [x]                       = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x'  = (x:y):ys
                           | otherwise = [x]:r
  where r@(y:ys) = groupBy' cmp xs
Cortex answered 22/8, 2009 at 16:34 Comment(2)
Thank you. I hadn't realized that the documentation requires that an equality test be an equivalence relation.Timber
It doesn't say that it has to be an equivalence relation. In fact, there are useful things you can do with it using non-equivalence-relations. e.g. https://mcmap.net/q/821714/-functional-paragraphs/…Lancaster
S
9

The fact that "<" isn't an equality test.

You might expect some behavior because you'd implement it differently, but that isn't what it promises.

An example of why what it outputs is a reasonable answer is if it sweeps through it, doing

[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]

Now has 3 groups of equal elements. So it checks if any of them are in fact the same:

Since it knows all elements in each group is equal, it can just look at the first element in each, 1, 2 and 1.

1 > 2? Yes! So it merges the first two groups.

1 > 1? No! So it leaves the last group be.

And now it's compared all elements for equality.

...only, you didn't pass it the kind of function it expected.

In short, when it wants an equality test, give it an equality test.

Steno answered 22/8, 2009 at 16:36 Comment(0)
L
9

The problem is that the reference implementation of groupBy in the Haskell Report compares elements against the first element, so the groups are not strictly increasing (they just have to be all bigger than the first element). What you want instead is a version of groupBy that tests on adjacent elements, like the implementation here.

Lancaster answered 22/8, 2009 at 18:50 Comment(0)
A
0

I'd just like to point out that the groupBy function also requires your list to be sorted before being applied.

For example:

equalityOp :: (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y

testData = [(1, 2), (1, 4), (2, 3)]

correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]

otherTestData = [(1, 2), (2, 3), (1, 4)]

incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]

This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq []     = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
    where similarResults   = filter (eq x) xs
          differentResults = filter (not . eq x) xs
Allsun answered 7/9, 2018 at 20:7 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.