Scala: difference between a typeclass and an ADT?
Asked Answered
B

3

18

What are the differences between typeclasses and Abstract Data Types?

I realize this is a basic thing for Haskell programmers, but I come from a Scala background, and would be interested in examples in Scala. The best I can find right now is that typeclasses are "open" and ADT's are "closed". It would also be helpful to compare and contrast typeclasses with structural types.

Brede answered 29/9, 2013 at 18:48 Comment(2)
Do you mean algebraic data types?Dishevel
Sorry, I guess in the OO world, ADT tends to mean Abstract Data Type but in the FP world it means Algebraic Data Type, so I got a bit confused. Thanks to everybody for clearing this up.Brede
K
63

ADTs (which in this context are not Abstract Data Types, which is even another concept, but Algebraic Data Types) and type classes are completely different concepts which solve different problems.

ADT, as follows from the acronym, is a data type. ADTs are needed to structure your data. The closest match in Scala, I think, is a combination of case classes and sealed traits. This is the primary mean of constructing complex data structures in Haskell. I think the most famous example of ADT is Maybe type:

data Maybe a = Nothing | Just a

This type has a direct equivalent in standard Scala library, called Option:

sealed trait Option[+T]
case class Some[T](value: T) extends Option[T]
case object None extends Option[Nothing]

This is not exactly how Option is defined in the standard library, but you get the point.

Basically ADT is a combination (in some sense) of several named tuples (0-ary, as Nothing/None; 1-ary, as Just a/Some(value); higher arities are possible too).

Consider the following data type:

-- Haskell
data Tree a = Leaf | Branch a (Tree a) (Tree a)
// Scala
sealed trait Tree[+T]
case object Leaf extends Tree[Nothing]
case class Branch[T](value: T, left: Tree[T], right: Tree[T]) extends Tree[T]

This is simple binary tree. Both of these definitions read basically as follows: "A binary tree is either a Leaf or a Branch; if it is a branch, then it contains some value and two other trees". What this means is that if you have a variable of type Tree then it can contain either a Leaf or a Branch, and you can check which one is there and extract contained data, if needed. The primary mean for such checks and extraction is pattern matching:

-- Haskell
showTree :: (Show a) => Tree a -> String
showTree tree = case tree of
  Leaf                    -> "a leaf"
  Branch value left right -> "a branch with value " ++ show value ++ 
                             ", left subtree (" ++ showTree left ++ ")" ++
                             ", right subtree (" ++ showTree right ++ ")"
// Scala
def showTree[T](tree: Tree[T]) = tree match {
  case Leaf => "a leaf"
  case Branch(value, left, right) => s"a branch with value $value, " +
                                     s"left subtree (${showTree(left)}), " +
                                     s"right subtree (${showTree(right)})"
}

This concept is very simple but also very powerful.

As you have noticed, ADTs are closed, i.e. you cannot add more named tuples after the type has been defined. In Haskell this is enforced syntactically, and in Scala this is achieved via sealed keyword, which disallows subclasses in other files.

These types are called algebraic for a reason. Named tuples can be considered as products (in mathematical sense) and 'combinations' of these tuples as a summation (also in mathematical sense), and such consideration has deep theoretical meaning. For example, aforementioned binary tree type can be written like this:

Tree a = 1 + a * (Tree a) * (Tree a)

But I think this is out of scope for this question. I can search for some links if you want to know more.


Type classes, on the other hand, are a way to define polymorphic behavior. Roughly type classes are contracts which certain type provides. For example, you know that your value x satisfies a contract which defines some action. Then you can call that method, and actual implementation of that contract is then chosen automatically.

Usually type classes are compared with Java interfaces, for example:

-- Haskell
class Show a where
    show :: a -> String
// Java
public interface Show {
    String show();
}
// Scala
trait Show {
  def show: String
}

Using this comparison, instances of type classes match with implementation of interfaces:

-- Haskell
data AB = A | B

instance Show AB where
  show A = "A"
  show B = "B"
// Scala
sealed trait AB extends Show
case object A extends AB {
  val show = "A"
}
case object B extends AB {
  val show = "B"
}

There are very important differences between interfaces and type classes. First, you can write custom type class and make any type an instance of it:

class MyShow a where
  myShow :: a -> String

instance MyShow Int where 
  myShow x = ...

But you cannot do such thing with interfaces, that is, you cannot make an existing class implement your interface. This feature, as you also have noticed, means that type classes are open.

This ability to add type class instance for existing types is a way to solve expression problem. Java language does not have means for solving it, but Haskell, Scala or Clojure have.

Another difference between type classes and interfaces is that interfaces are polymorphic only on first argument, namely, on implicit this. Type classes are not restricted in this sense. You can define type classes which dispatch even on return value:

class Read a where
  read :: String -> a

It is impossible to do this with interfaces.

Type classes can be emulated in Scala using implicit parameters. This pattern is so useful that in recent Scala versions there is even a special syntax which simplifies its usage. Here is how it is done:

trait Showable[T] {
  def show(value: T): String
}

object ImplicitsDecimal {
  implicit object IntShowable extends Showable[Int] {
    def show(value: Int) = Integer.toString(value)
  }
}

object ImplicitsHexadecimal {
  implicit object IntShowable extends Showable[Int] {
    def show(value: Int) = Integer.toString(value, 16)
  }
}

def showValue[T: Showable](value: T) = implicitly[Showable[T]].show(value)
// Or, equivalently:
// def showValue[T](value: T)(implicit showable: Showable[T]) = showable.show(value)

// Usage
{
  import ImplicitsDecimal._
  println(showValue(10))  // Prints "10"
}
{
  import ImplicitsHexadecimal._
  println(showValue(10))  // Prints "a"
}

Showable[T] trait corresponds to type class, and implicit objects definitions correspond to its instances.

As you can see, type classes are a kind of interface, but more powerful. You can even select different implementations of type classes, while the code which uses them remains the same. This power, however, comes at the cost of boilerplate and extra entities.

Note that it is possible to write Haskell equivalent of above Scala program, but it will require writing multiple modules or newtype wrappers, so I'm not presenting it here.

BTW, Clojure, a Lisp dialect working on JVM, has protocols, which combine interfaces and type classes. Protocols are dispatched on single first argument, but you can implement a protocol for any existing type.

Kuntz answered 29/9, 2013 at 20:36 Comment(2)
The Algebra of Algebraic Data Types, Part 1 is a good place to start if you are interested in this number/datatype correspondence.Margret
Great answer, thanks for taking the time to write examples in Scala as well as Haskell.Brede
I
8

Your question actually touches on three distinct concepts: typeclasses, abstract data types and algebraic data types. Confusingly enough, both "abstract" and "algebraic" data types can be abbreviated as "ADT"; in a Haskell context, ADT almost always means "algebraic".

So let's define all three terms.

An algebraic data type (ADT), is a type that can be made by combining simpler types. The core idea here is a "constructor", which is a symbol that defines a value. Think of this like a value in a Java-style enum, except it can also take arguments. The simplest algebraic data type has just one constructor with no arguments:

data Foo = Bar

there is only one¹ value of this type: Bar. By itself, this is not very interesting; we need some way to build up bigger types.

The first way is to give our constructor arguments. For example, we can have our Bars take an int and a string:

data Foo = Bar Int String

Now Foo has many different possible values: Bar 0 "baz", Bar 100 "abc" and so on. A more realistic example might be a record for an employee, looking something like this:

data Employee = Employee String String Int

The other way to build up more complicated types is by having multiple constructors to choose from. For example, we can have both a Bar and a Baz:

data Foo = Bar
         | Baz

Now values of type Foo can be either Bar or Baz. This is in fact exactly how booleans work; Bool is defined as follows:

data Bool = True
          | False

It works exactly how you'd expect. Really interesting types can use both methods to combine themselves. As a rather contrived example, imagine shapes:

data Shape = Rectangle Point Point
           | Circle Point Int

A shape can either be a rectangle, defined by its two corners, or a circle which is a center and a radius. (We'll just define Point as (Int, Int).) Fair enough. But here, we run into a snag: it turns out that other shapes also exist! If some heretic who believes in triangles wants to use our type in their model, can they add a Triangle constructor after the fact? Unfortunately not: in Haskell, algebraic data types are closed, which means you cannot add new alternatives after the fact.

One important thing you can do with an algebraic data type is pattern match on it. This basically means being able to branch on the alternatives of an ADT. As a very simple example, instead of using an if expression, you could pattern match on Bool:

case myBool of
  True  → ... -- true case
  False → ... -- false case

If your constructors have arguments, you can also access those values by pattern matching. Using Shape from above, we can write a simple area function:

area shape = case shape of
  Rectange (x₁, y₁) (x₂, y₂) → (x₂ - x₁) * (y₂ - y₁)
  Circle _ r                 → π * r ^ 2

The _ just means we don't care about the value of a point's center.

This is just a basic overview of algebraic data types: it turns out there's quite a bit more fun to be had. You might want to take a look at the relevant chapter in Learn You a Haskell (LYAH for short) for more reading.

Now, what about abstract data types? This refers to a different concept. An abstract data type is one where the implementation is not exposed: you don't know what the values of the type actually look like. The only thing you can do with it is apply functions exported from its module. You can't pattern match on it or construct new values yourself. A good example in practice is Map (from Data.Map). The map is actually a particular kind of binary search tree, but nothing in the module lets you work with the tree structure directly. This is important because the tree needs to maintain certain additional invariants which you could easily mess up. So you only ever use Map as an opaque blob.

Algebraic and abstract types are somewhat orthogonal concepts; it's rather unfortunate that their names make it so easy to mistake one for the other.

The final piece of the puzzle is the typeclass. A typeclass, unlike algebraic and abstract data types, is not a type itself. Rather, think of a typeclass as a set of types. In particular, a typeclass is the set of all types that implement certain functions.

The simplest example is Show, which is the class of all types that have a string representation; that is, all types a for which we have a function show ∷ a → String. If a type has a show function, we say it is "in Show"; otherwise, it isn't. Most types you know like Int, Bool and String are all in Show; on the other hand, functions (any type with a ) are not in Show. This is why GHCi cannot print a function.

A typeclass is defined by which functions a type needs to implement to be part of it. For example, Show could be defined² just by the show function:

class Show a where
  show ∷ a → String

Now to add a new type like Foo to Show, we have to write an instance for it. This is the actual implementation of the show function:

instance Show Foo where
  show foo = case foo of
    Bar → "Bar"
    Baz → "Baz"

After this, Foo is in Show. We can write an instance for Foo anywhere. In particular, we can write new instances after the class has been defined, even in other modules. This is what it means for typeclasses to be open; unlike algebraic data types, we can add new things to typeclasses after the fact.

There is more to typeclasses too; you can read about them in the same LYAH chapter.

¹ Technically, there is another value called ⊥ (bottom) as well, but we'll ignore it for now. You can learn about ⊥ later.

² In reality, Show actually has another possible function that takes a list of as to a String. This is basically a hack to make strings look pretty since a string is just a list of Chars rather than its own type.

Introspection answered 29/9, 2013 at 21:0 Comment(0)
B
6

The difference between a type class and an ADT is:

  • Use type classes when you want to dispatch a method based off of something's type
  • Use ADTs when you want to dispatch a method based off of something's value

For example, consider the print function:

print :: (Show a) => a -> IO ()

Types are static and cannot change throughout the lifetime of a program, therefore when you use a type class the method you use is chosen statically at compile time based on the inferred type at the call site. So in this example I know that I am using the Char instance for Show without even running the program:

main = print 'C'

ADTs let you change a function's behavior dynamically. For example, I could define:

print2 :: Either Char String -> IO ()
print2 (Left  c  ) = putStrLn [c]
print2 (Right str) = putStrLn str

Now, if I call print2 in some context:

print2 e

... I can't know which branch the print2 takes unless I know the runtime value of e. If the e is a Left then I take the Left branch and if e is a Right then I take the Right branch. Sometimes I can statically reason about which constructor e will be, but sometimes I cannot, such as in the following example:

main = do
    e <- readLn  -- Did I get a 'Left' or 'Right'?
    print2 e     -- Who knows until I run the program
Bagley answered 29/9, 2013 at 20:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.