There are four simple ways to encode sum types:
- Directly, if your programming language supports them
- "Church encoding"
- "Final style"
- The OO pattern
We'll introduce them and discuss their pros and cons, focusing on open (extensible) sum-types.
Direct
All the up and coming programming languages support sum types, by feature if not by name:
Language | Feature |
---|---|
F# | Discriminated Unions |
Elm | Variants |
Rust | Enumerations |
Swift | Enumerations |
Kotlin | Sealed Classes |
Haskell | Algebraic Data Types |
We'll use Haskell to demonstrate them:
data Shape
= Rect { width :: Float, height :: Float }
| Circle { radius :: Float }
area :: Shape -> Float
Rect w h) = w * h
area (Circle r) = pi * r * r
area (
> area (Rect 3 5)
15
Church encoding
How would we encode our type in legacy languages which don't support sum types?
Famous minimalist Alonzo Church has devised a method:
{-# LANGUAGE RankNTypes #-}
data ShapeHandlers r = ShapeHandlers
handleRect :: Float -> Float -> r
{ handleCircle :: Float -> r
,
}
type Shape = (forall a. ShapeHandlers a -> a)
rect :: Float -> Float -> Shape
= handleRect handlers w h
rect w h handlers
circle :: Float -> Shape
= handleCircle handlers r
circle r handlers
area :: Shape -> Float
=
area shape
shapeShapeHandlers
= \w h -> w * h
{ handleRect = \r -> pi * r * r
, handleCircle
}
> area (rect 3 5)
15
While originally intended for use in his minimal programming language "Lambda Calculus", this encoding is suitable for most popular languages. Java/C# supports it via abstract generic methods. In C++ or Go we'll have to resort to casts or side-effects to encode this (the Visitor pattern).
Final style: Extending church-encodings using type classes
We can encode the record from the church encoding using a type-class:
{-# LANGUAGE RankNTypes #-}
class ShapeHandlers r where
rect :: Float -> Float -> r
circle :: Float -> r
type Shape = (forall a. ShapeHandlers a => a)
newtype Area = Area { area :: Float }
instance ShapeHandlers Area where
= Area (w * h)
rect w h = Area (pi * r * r)
circle r
> area (rect 3 5)
15
The main benefit of using this encoding is that type class
constraints are trivially composable, which translates to encoding
extensible sum-types! For example:
(forall a. (ShapeHandlers a, MoreHandlers a) => a)
With a small modification (avoiding universal quantification) this
becomes Carette et al's "Final Style",
which is also commonly known as "mtl
style".
A similar encoding in OO languages, using interfaces instead of type classes is Oliviera et al's "Object Algebras".
The OO pattern
This is a common way to represent sum types in object oriented languages:
data Rect = Rect { width :: Float, height :: Float }
data Circle = Circle { radius :: Float }
class Area a where
area :: a -> Float
instance Area Rect where
Rect w h) = w * h
area (
instance Area Circle where
Circle r) = pi * r * r
area (
> area (Rect 3 5)
15
This encoding is naturally open! We can add shapes as we please, and in posh languages which support type-classes or traits we can also add more operations on them without modifying existing code.
Putting these approaches to the test
Let's explore how these approaches fare against some simple challenges.
Supporting new shapes
Suppose we wanted to write code that can support more types of shapes, without modifying our shape data definition (aka the Expression problem).
The direct sum-type can't be extended, nor can its church encoding. But the final and OO styles can.
Final style:
class CompositeHandler r where
composite :: r -> r -> r
instance CompositeHandler Area where
Area x) (Area y) = Area (x + y)
composite (
> area (composite (rect 3 5) (circle 1))
18.141592
OO style:
data Composite a b = Composite { first :: a, second :: b }
instance (Area a, Area b) => Area (Composite a b) where
Composite x y) = area x + area y
area (
> area (Composite (Rect 3 5) (Circle 1))
18.141592
Collections
If we wanted to encode a list of shapes:
- A final style list,
[(forall a. (ShapeHandlers a, CompositeHandler a) => a)]
, uses a universal quantifier and closes the list of supported variants - An OO style list will have to use an existensial quantifier and close the list of supported operations
Operations on more than one value
The area
operation discussed earlier converts a given
value to a result. But what if we wanted an operation that processes two
shapes, like generating a diff of them?
This is where all styles discussed above fall short as far as I'm aware.
The fifth approach: Composition of atoms
All the approaches discussed above failed when put to the test. The following approach fares better -
We build upon the OO approach's basic shapes and combine them into a
concrete Shape
sum type:
{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
data Shape
= SRect Rect
| SCircle Circle
deriving (Generic, Area)
We use Generic
and DefaultSignatures
based
derivations to derive our class instances (the derivation of
Area
is left as an exercise for the reader).
This approach allows us to implement our basic types, operations, and instances in a modular way, while only closing the type at the top-level. It does allow us to implement operations on more than one value (such as diffs), and we can encode a list in either of the styles supported by Final or OO styles.
Discussion:
(image credit: MissMarvel50sWorld)