What constitutes codata in the context of programming?
Asked Answered
S

1

11

This is a corecursive algorithm, because with each iteration it calls itself on data that is greater then what it had before:

iterate f x =  x : iterate f (f x)

It is similar to tail recursion accumulator style, but its accumulator is implicit instead of being passed as an argument. And it would be infinite if it weren't for lazyness. So is codata just the result of a value constructor in WHNF, kind of like (a, thunk)? Or is codata rather a mathematical term from category theory, which hasn't a useful representation in the programming domain?

Follow-up question: Is value recursion just a synonym for corecursion?

Superheat answered 10/5, 2020 at 19:42 Comment(5)
Codata is a term from theory en.wikipedia.org/wiki/Coinduction#Codata, specifically used wrt 'Total Functional Programming en.wikipedia.org/wiki/Total_functional_programming. See the 2004 paper from Turner linked from that page.Kingston
Not really an answer, but an intuition that I've sometimes felt helpful: data is those in-memory structures which are produced by a finite number of constructor applications, and codata is those in-memory structures which are consumed by a finite number of pattern matches.Gadolinite
@DanielWagner This is actually quite helpful.The distinction is made according to the application context, namely whether data is introduced or eliminated. So codata isn't related to corecursion at all.Superheat
@bob Not necessarily. One can make the distinction that recursive functions can be called on data, and co-recursive functions can be called on codata. Haskell doesn't differentiate between recursion and corecursion, but some languages (Idris?) do.Isthmus
Here is a more specific example to @DanielWagner comment, which might be helpful: The introduction rule of a List is constructor application and the elimination rule is recursion (or a fold). The introduction rule of a Stream is corecursion and the elimination rule is pattern matching. List encodes data whereas Stream encodes codata, but Haskell doesn't make this distinction at the type level.Superheat
S
10

I think answering your questions requires a lot of explanation, so here's a big long answer with specific answers to your questions at the end.

Data and codata have formal mathematical definitions in terms of category theory, so it's not just a matter of how they are used in a program (i.e., not just the "application context" you mentioned in the comments). It may seem this way in Haskell because the language's features (specifically, non-termination and laziness) end up blurring the distinction, so in Haskell, all data is also codata and vice versa, but it doesn't have to be this way, and there are languages that make the distinction clearer.

Both data and codata do have useful representations in the programming domain, and those representations give rise to natural relationships to recursion and corecursion.

It's quite hard to explain these formal definitions and representations without quickly getting technical, but roughly speaking, a data type for, say, a list of integers, is a type L together with a constructor function:

makeL :: Either () (Int, L) -> L

that is somehow "universal" in that it can fully represent any such construction. (Here, you want to interpret the LHS type Either () (Int, L) to mean that a list L is either the empty list Left () or a pair Right (h, t) consisting of the head element h :: Int and a tail list t :: L.)

To start with a counterexample, L = Bool is not the data type we're looking for, because even though you could write:

foo :: Either () (Int, Bool) -> Bool
foo (Left ()) = False
foo (Right (h, t)) = True

to "construct" a Bool, this can't fully represent any such construction. For example, the two constructions:

foo (Right (1, foo (Left ()))) = True
foo (Right (2, foo (Left ()))) = True

give the same Bool value, even though they used different integers, so this Bool value is insufficient to fully represent the construction.

In contrast, the type [Int] is an appropriate data type because the (almost trivial) constructor function:

makeL :: Either () (Int, [Int]) -> [Int]
makeL (Left ()) = []
makeL (Right (h, t)) = h : t

fully represents any possible construction, creating a unique value for each one. So, it's somehow the "natural" construction for the type signature Either () (Int, L) -> L.

Similarly, a codata type for a list of integers would be a type L together with a destructor function:

eatL :: L -> Either () (Int, L)

that is somehow "universal" in the sense that it can represent any possible destruction.

Again, starting with a counterexample, a pair (Int, Int) is not the codata type we're looking for. For example, with the destructor:

eatL :: (Int, Int) -> Either () (Int, (Int, Int))
eatL (a, b) = Right (a, (b, a))

we can represent the destruction:

let p0 = (1, 2)
    Right (1, p1) = eatL p0
    Right (2, p2) = eatL p1
    Right (1, p3) = eatL p2
    Right (2, p4) = eatL p3
...continue indefinitely or stop whenever you want...

but we can't represent the destruction:

let p0 = (?, ?)
    Right (1, p1) = eatL p0
    Right (2, p2) = eatL p1
    Right (3, p3) = eatL p2
    Left () = eatL p3

On the other hand, in Haskell, the list type [Int] is an appropriate codata type for a list of integers, because the destructor:

eatL :: [Int] -> Either () (Int, [Int])
eatL (x:xs) = Right (x, xs)
eatL [] = Left ()

can represent any possible destruction (including both finite or infinite destructions, thanks to Haskell's lazy lists).

(As evidence that this isn't all hand-waving and in case you want to relate it back to the formal math, in technical category theory terms, the above is equivalent to saying that the list-like endofunctor:

F(A) = 1 + Int*A   -- RHS equivalent to "Either () (Int,A)"

gives rise to a category whose objects are constructor functions (AKA F-algebras) 1 + Int*A -> A. A data type associated with F is an initial F-algebra in this category. F also gives rise to another category whose objects are destructor functions (AKA F-coalgebras) A -> 1 + Int*A. A codata type associated with F is a final F-coalgebra in this category.)

In intuitive terms, as suggested by @DanielWagner, a data type is a way of representing any construction of a list-like object, while a codata type is a way of representing any destruction of a list-like object. In languages where data and codata are different, there's a fundamental asymmetry -- a terminating program can only construct a finite list, but it can destruct (the first part of) an infinite list, so data must be finite, but codata can be finite or infinite.

This leads to another complication. In Haskell, we can use makeL to construct an infinite list like so:

myInfiniteList = let t = makeL (Right (1, t)) in t

Note that this would not be possible if Haskell didn't allow lazy evaluation of non-terminating programs. Because we can do this, by the formal definition of "data", a Haskell list-of-integer data type must also include infinite lists! That is, Haskell "data" can be infinite.

This probably conflicts with what you might read elsewhere (and even with the intuition that @DanielWagner provided), where "data" is used to refer to finite data structures only. Well, because Haskell is a little weird and because infinite data isn't allowed in other languages where data and codata are distinct, when people talk about "data" and "codata" (even in Haskell) and are interested in drawing a distinction, they may use "data" to refer to finite structures only.

The way recursion and corecursion fit in to this is that the universality properties naturally give us "recursion" to consume data and "corecursion" to produce codata. If L is a list-of-integer data type with constructor function:

makeL :: Either () (Int, L) -> L

then one way of consuming a list L to produce a Result is to define a (non-recursive) function:

makeResult :: Either () (Int, Result) -> Result

Here, makeResult (Left ()) gives the intended result for an empty list, while makeResult (Right (h, t_result)) gives the intended result for a list whose head element is h :: Int and whose tail would give the result t_result :: Result.

By universality (i.e., the fact that makeL is an initial F-algebra), there exists a unique function process :: L -> Result that "implements" makeResult. In practice, it will be implemented recursively:

process :: [Int] -> Result
process [] = makeResult (Left ())
process (h:t) = makeResult (Right (h, process t))

Conversely, if L is a list-of-integer codata type with destructor function:

eatL :: L -> Either () (Int, L)

then one way of producing a list L from a Seed is to define a (non-recursive) function:

unfoldSeed :: Seed -> Either () (Int, Seed)

Here, unfoldSeed should produce a Right (x, nextSeed) for each desired integer, and produce Left () to terminate the list.

By universality (i.e., the fact that eatL is a final F-coalebra), there exists a unique function generate :: Seed -> L that "implements" unfoldSeed. In practice, it will be implemented corecursively:

generate :: Seed -> [Int]
generate s = case unfoldSeed s of
  Left () -> []
  Right (x, s') -> x : generate s'

So, with all that said, here are the answers to your original questions:

  • Technically, iterate f is corecursive because it's the unique codata-producing function Int -> [Int] that implements:

    unfoldSeed :: Seed -> Either () (Int, Seed)
    unfoldSeed x = Right (x, f x)
    

    by means of generate as defined above.

  • In Haskell, corecursion that produces codata of type [a] relies on laziness. However, strict codata representations are possible. For example, the following codata representation works fine in Strict Haskell and can be safely fully evaluated.

    data CoList = End | CoList Int (() -> CoList)
    

    The following corecursive function produces a CoList value (and I made it finite just for fun -- it's easy to produce infinite codata values, too):

    countDown :: Int -> CoList
    countDown n | n > 0 = CoList n (\() -> countDown (n-1))
                | otherwise = End
    
  • So, no, codata isn't just the result of values in WHNF with form (a, thunk) or similar and corecursion is not synonymous with value recursion. However, WHNF and thunks provide one possible implementation and are the implementation-level reason that a "standard" Haskell list data type is also a codata type.

Seise answered 12/5, 2020 at 20:10 Comment(2)
Such knowlegde is invaluable. Thank you for sharing and taking the time. I hope others can also develop an intuition.Superheat
I finally got the time to re-read your answer. I tried to construct a delimitation between data/codata (and recursion/corecursion respectively) with a language that does not have this very delimitation. Since my mathematical understanding isn't sufficient for CT, I was lost. Much obliged.Superheat

© 2022 - 2024 — McMap. All rights reserved.