This tutorial will get you up to speed with GHC generics quickly. It should be noted that generics is not something academic and useless, quite the contrary, it's a very pragmatic way to reduce the amount of boilerplate (and associated with it errors) in your functional code with minimal mental overhead. In fact, by the time you get to the end of the tutorial, I hope you'll agree that generics are easy to use and easy “to get” as well. The tutorial is for intermediate-level Haskellers.
Let's start from the beginning. What are generics? Generics, like in other languages, are a way to use the same code to manipulate different data types. We should note that in this regard the technique is very close to the much more common notion of polymorphism, which in Haskell comes in two flavors:
-
Parametric polymorphism, when we have type variables in functions/data types. This allows the same function, such as
const
to work with any types of arguments, as long as the more general types from a function's signature can be made equal to concrete types we want to work with by instantiating/fixing the type variables. -
Ad-hoc polymorphism, which allows us to perform a computation abstracted over instances of one or more type classes. We request that a type has some properties of interest and then describe the computation in terms of these properties. The code is then applicable to any data type that has the properties.
How are generics different? Generics allow us to describe transformation that works in terms of not concrete types, but general combinators that describe shape and meta information about a data type. This way, we can declare how to perform a computation on almost any data type!
The idea about generics
Haskell's features that open the possibility for such a thing as generics
are type classes and ad-hoc polymorphism. The ability to describe a data
type in terms of a set of combinators is our property, encapsulated by the
Generic
type class. A data type that is an instance of the type class can
be processed by functions that work with a representation of data type, not
the data type itself. These functions, however, are by definition
polymorphic and also usually hidden behind a type class interface, so the
whole thing usually takes the form of automatic deriving of a type class
instance.
The process can be described like this:
-
A given data type gets
Generic
instance automatically, as it can be generated by the compiler with the help of theDeriveGeneric
language extension. -
There is a type class instance defined for instances of
Generic
(orGeneric1
). Since our data type is now an instance of those type classes, that generics-based instance starts to work for us and we magically get all the useful methods of that type class.
If you know about Data
and Typeable
type classes, then you probably know
that it's possible to do something similar using information which methods
of those type classes provide. Data
and Typeable
are beyond the scope of
this tutorial, but you read about them in
this blog post by Chris Done if
you're interested.
Step 2 is not as simple as it sounds though, at least not until we examine the details of this business more closely.
Combinators that represent the shape of a data type
What could they look like? Well, if I showed you the data types as they are right now, you would probably run away in fear, cursing the tutorial and Haskell's generics. I have a better idea. Let's start with the simplest thing possible and then iterate asking ourselves how to work around some more interesting use cases that we might need to support. This will force us to make the data types more hairy but also more powerful step by step, till we arrive at the definitions that are actually used.
For better or worse, algebraic data types lock us into a view on the world that consists of sums and products (another Haskell's feature that makes the existence of generics framework possible; generics do not work with GADTs for example). So we need to be able to represent the following:
-
Data types without constructors at all: uninhabited types like
Void
. This can be described asdata V1
(note: it has no constructors). -
Constructors without arguments. Let this be something like this:
data U1 = U1
. -
Sums:
data (f :+: g) = L1 f | R1 g
. If we have a sum data type with two alternatives we can represent other sum data types with any number of alternatives using nesting. -
Products:
data (f :*: g) = f :*: g
.
Not bad for a start. Let's try use this representation to derive a Functor
instance. Deriving a such an instance means that we should provide the
fmap
function which looks like this:
fmap :: (a -> b) -> Rep f -> Rep f
Here, Rep f
maps to the type of generic representation of f :: * -> *
.
There is a problem though. You see, the functor's inner type that goes from
a
to b
in this example has no place to live in the representation! Our
current approach will work only for type classes that work with things of
kind *
, such as Show
, but if we want to use this system to derive a
Functor
instance, we need to allow it to work with kind * -> *
.
We want to be able to derive Functor
(and similar) instances generically,
so we have to fix things up. The solution is to add one more type parameter
p
to all our combinators:
data V1 p
data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = (f p) :*: (g p)
This way fmap
would be:
fmap :: (a -> b) -> Rep f a -> Rep f b
-- p = a p = b
But what happens to the type classes that work with *
kinds? Here are our
choices:
-
Have a separate set of combinator types for each case (
*
and* -> *
kinds). -
Use the most general form (with
p
), but for*
kinds just treat the extrap
parameter as a dummy type index that has no meaning.
The authors of the generic extension went with the second option, and I can't blame them. We'll see that there are a lot of wrappers already and we really should try to keep their number from exploding any further (this may make more sense on the second read).
Let's try to map from a data type to its representation and see if we're still missing something:
data Maybe a = Nothing | Just a
-- Interestingly, we could build a representation that works on ‘Maybe a’,
-- that is, a thing of kind *, if we wanted to derive something like ‘Show’.
-- At the same time if we wanted to derive ‘Functor’, we would work with
-- ‘Maybe’ of kind * -> *. This means that there are actually two different
-- possible representations depending on our aim. This is addressed with two
-- different generics type classes, as we will see later.
-- For kind *, things like ‘Show’:
-- type: (U1 :+: ?) p
Oh, how to represent Just a
? We need to way to allow it have an argument!
Let's add the following:
data Rec0 c p = Rec0 { unRec0 :: c }
Rec
part in the type's name hints that this thing may be possibly
recursive.
But I'm lying to you, due to some historical stuff, it's defined a bit differently:
type Rec0 = K1 R
newtype K1 i c p = K1 { unK1 :: c } -- c is the value, ‘a’ in ‘Maybe a’
-- ^ ^
-- | |
-- | +-------- dummy p
-- |
-- type-level tag, R or P
You see this R
type-level tag? There used to be another one, P
and this
type Par0 = K1 P
type synonym, but forget it, it's deprecated. Bottom
line: Rec0
is used for data constuctor arguments (fields), that are
not p
parameter.
With Rec0
, we can finally build representation of Maybe a
:
-- This is the type of our representation: (U1 :+: Rec0 a) p
-- Examples of values for ‘Maybe Int’:
-- Nothing => L1 U1
-- Just 5 => R1 (K1 5) -- remember where K1 comes from?
-- ^
-- |
-- +--- L1 and R1 are from our representation of sum types
Let's derive a different representation that works with * -> *
kinds:
-- Type of our representation: (U1 :+: ?) p
We need a way to tell if we have an argument of p
type (like a
in
Functor f => f a
) or some other type, if we're to write a generic
Functor
function. For this reason the generics extension uses Par1 p
:
newtype Par1 p = Par1 { unPar1 :: p } -- “par” stands for “parameter”
Par1
is used to mark occurrences of p
. Our representation thus becomes:
-- The type of our representation: (U1 :+: Par1) p
-- Examples of values for ‘Maybe Int’:
-- Nothing => L1 U1 -- the same
-- Just 5 => R1 (Par1 5)
Almost there. The final example is that for lists. Given the standard definition of linked list:
data List a = Nil | Cons a (List a)
How do we build its representation (for kind * -> *
)? The tricky part is,
of course, List a
, which is a recursive occurrence of entire functorish
part with parameter inside it. If we mark occurrences of parameter by
wrapping them with Par1
, then why not mark this thing with something too?
Rec1
shall it be.
-- The type of representation: (U1 :+: (Par1 :*: Rec1 List)) p
-- Examples of values for ‘List Int’:
-- Nil => L1 U1
-- Cons 5 Nil => R1 (Par1 5 :*: Rec1 Nil)
-- Cons 5 (Cons 4 Nil) => R1 (Par1 5 :*: Rec1 (Cons 4 Nil))
If we had just arguments of a data constructor that are not related to
parameter p
, plain Rec0
(K1
) would be used for both first and second
arguments of Cons
.
And that's it, we have everything we need.
The Generic
and Generic1
type classes
Type classes that map types to their representations are called Generic
(for type classes that work with *
kinds) and Generic1
(for type classes
that work with * -> *
kinds). They live in the module called
GHC.Generics
together with the types used to build the data type
representations we've just discussed.
Let's see what these type classes have:
class Generic a where
type Rep a :: * -> *
from :: a -> Rep a p
to :: Rep a p -> x
class Generic1 f where
type Rep1 f :: * -> *
from1 :: f p -> Rep1 f p
to1 :: Rep1 f p -> f p
from
and from1
map values of data types to their generic
representations. Rep
and Rep1
are associated type functions (the feature
is a part of the TypeFamilies
GHC extension) that take the type of data we
want to manipulate and yield the type of its representation. Of course, if
we want to derive Functor
instances, we need a way to go back from
representation to actual value of target data type. This is done via to
and to1
. The good thing about this, of course, that GHC can derive
Generic
and Generic1
for us automatically with DeriveGeneric
language
extension enabled.
Now let's open GHCi and try to infer Rep
of some type:
λ> :t (undefined :: Rep (List a) p)
(undefined :: Rep (List a) p)
:: D1
('MetaData "List" "GenericsTutorial" "main" 'False)
(C1 ('MetaCons "Nil" 'PrefixI 'False) U1
:+: C1
('MetaCons "Cons" 'PrefixI 'False)
(S1
('MetaSel
'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy)
(Rec0 a)
:*: S1
('MetaSel
'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy)
(Rec0 (List a))))
p
Ah OK, there is just a little bit more to this representation thing…
Metadata wrappers
A representation also has associated metadata. It may look messy and difficult to read, but I'll explain the logic behind it in a moment, it should help to clarify why it's done like this.
First of all, metadata should not get in our way if we don't want to deal with it. Thus all metadata is attached using the same simple wrapper:
newtype M1 i c f p = M1 { unM1 :: f p } -- ‘f p’ is what lives inside, U1 for example
-- ^ ^
-- | |
-- | +---- compiler-generated data type that allows us to get meta information
-- |
-- type-level tag, see below
The i
type-level tag can be one of the three:
D
for data type metadatatype D1 = M1 D
C
for constructor metadatatype C1 = M1 C
S
for record selector metadatatype S1 = M1 S
It should be understandable why metadata is attached this way. If we want to, we can ignore metadata easily, since it's just a wrapper:
f (M1 x) = f x
If we want to look at a particular type of metadata we can specify the i
type-level tag, or we can leave it unspecified to deal with all metadata at
once.
The c
type is auto-generated by the compiler and encodes metadata on type
level. Why not store the metadata on value level, in M1
constructor? Well,
if it were there, we would have to provide it if we wanted to generate some
values, and providing metadata for already existing data type is certainly
something that only the compiler can do properly. With the current approach,
a given data type determines the type of its representation, including its
metadata, so we don't have to bother.
The last thing to mention about metadata is which wrappers are generated for you. They are very predictable:
-
The entire representation is wrapped in
D1
, which provides datatype-level information (datatype name, module name, and whether it's anewtype
). -
Every constructor is wrapped in
C1
, which provides information about constructor (such as constructor name, fixity, and whether it's a record). -
Every argument of constructor is wrapped with
S1
(even if it's not actually a record selector), which tells us the selector name.
Look
at the Haddock,
to find out names of functions that help extract the metadata. Using them is
straightforward: feed the wrapped data to functions like datatypeName
and
get the information.
Example: Functor
deriving
Of course, after talking so much about Functor
instances and adding the
clumsy p
parameter to support them, we absolutely must to derive Functor
instance for generics! Anyway, Functor
instance for generics is already
defined in GHC.Generics
, so instead of re-implementing it, let's just go
through the code:
-- If we have a parameter ‘p’, we just map over it, as expected:
instance Functor Par1 where
fmap f (Par1 p) = Par1 (f p)
-- The same with ‘Rec1’ (just use ‘fmap’ because the inner part is a ‘Functor’):
instance Functor f => Functor (Rec1 f) where
fmap f (Rec1 a) = Rec 1 (fmap f a)
-- A constructor without fields only can be returned untouched:
instance Functor U1 where
fmap _ U1 = U1
-- A field that is not ‘p’ parameter should not change:
instance Functor (K1 i c) where
fmap _ (K1 a) = a
-- Metadata has no effect, just unwrap it and continue with the inner value,
-- if the inner value is an instance of ‘Functor’:
instance Functor f => Functor (M1 i c f) where
fmap f (M1 a) = M1 (fmap f a)
-- When we have a sum, we should try to map what we get, provided that it
-- contains something that has ‘Functor’ instance:
instance (Functor l, Functor r) => Functor (l :+: r) where
fmap f (L1 a) = L1 (fmap f a)
fmap f (R1 a) = R1 (fmap f a)
-- The same for products:
instance (Functor a, Functor b) => Functor (a :*: b) where
fmap f (a :*: b) = fmap f a :*: fmap f b
Example: counting constructor fields
I think now we're dangerous enough to implement a simple (and pretty
useless) type class that will count constructor fields of a given value.
(Yeah, I could go with the venerable Encode
example here, but I like to be
“original”, so you get a dumber example instead :-D)
The type class looks like this:
class CountFields a where
-- | Return number of constuctor fields for a value.
countFields :: a -> Natural
We will start by implementing countFields
method that works on
representations:
instance CountFields (V1 p) where
countFields _ = 0
instance CountFields (U1 p) where
countFields _ = 0
instance CountFields (K1 i c p) where
countFields _ = 1
instance CountFields (f p) => CountFields (M1 i c f p) where
countFields (M1 x) = countFields x
instance (CountFields (a p), CountFields (b p)) => CountFields ((a :+: b) p) where
countFields (L1 x) = countFields x
countFields (R1 x) = countFields x
instance (CountFields (a p), CountFields (b p)) => CountFields ((a :*: b) p) where
countFields (a :*: b) = countFields a + countFields b
Let's write a single function called, say, defaultCountFields
that does
the whole thing:
defaultCountFields :: (Generic a, CountFields (Rep a)) => a -> Natural
defaultCountFields = countFields . from
But here is the catch, the code above does not compile. CountFields
has
the kind CountFields :: * -> Constraint
, but we give it Rep a
of kind * -> *
. It's not going to take our abuse like this!
The solution, typical with this setting, is to have a helper class that
works with things of * -> *
kind (this also removes the p
parameters
from signatures):
class CountFields1 f where
countFields1 :: f p -> Natural
defaultCountFields :: (Generic a, CountFields1 (Rep a)) => a -> Natural
defaultCountFields = countFields1 . from
instance CountFields1 V1 where
countFields1 _ = 0
instance CountFields1 U1 where
countFields1 _ = 0
instance CountFields1 (K1 i c) where
countFields1 _ = 1
instance CountFields1 f => CountFields1 (M1 i c f) where
countFields1 (M1 x) = countFields1 x
instance (CountFields1 a, CountFields1 b) => CountFields1 (a :+: b) where
countFields1 (L1 x) = countFields1 x
countFields1 (R1 x) = countFields1 x
instance (CountFields1 a, CountFields1 b) => CountFields1 (a :*: b) where
countFields1 (a :*: b) = countFields1 a + countFields1 b
You could notice that some data types like Par0
, Rec1
didn't get their
definitions. This is OK because we work with Generic
, not Generic1
here,
but omitting definitions for some data types that generic representations
are built from is normal. As GHC.Generics
docs say:
-
If no
:+:
instance is given, the function may still work for empty datatypes or datatypes that have a single constructor, but will fail on datatypes with more than one constructor. -
If no
:*:
instance is given, the function may still work for datatypes where each constructor has just zero or one field, in particular for enumeration types. -
If no
K1
instance is given, the function may still work for enumeration types, where no constructor has any fields. -
If no
V1
instance is given, the function may still work for any datatype that is not empty. -
If no
U1
instance is given, the function may still work for any datatype where each constructor has at least one field.
An M1
instance is always required (but it can just ignore the
meta-information).
Putting the magic into the type classes
Having dealt with generic implementation of the functionality of interest, let's put it all together and use a special GHC extension to allow the user derive type classes without knowing anything about generics.
For a generic implementation to work without user's definition we need to
provide it as the default definition. As you have already seen, a generic
implementation often involves a Generic
constraint and it would be ugly to
add it to every type class (on the left hand side of =>
) just to make
deriving easier. The default
keyword, enabled by the DefaultSignatures
language extension, allows us to give a different type signature for default
implementation of a type class:
class Functor f where
fmap :: (a -> b) -> f a -> f b
default fmap :: (Generic1 f, Functor (Rep1 f)) => (a -> b) -> f a -> f b
fmap = to1 . fmap . from1
class CountFields a where
countFields :: a -> Natural
default countFields :: (Generic a, CountFields1 (Rep a)) => a -> Natural
countFields = defaultCountFields
This way we can have our cake and eat it too: deriving is easy and otherwise no ugly details are visible!
Conclusion
Generics are a powerful means of automating error-prone and boring defining
of type class instances by hand. However, this feature is helpful beyond
just that use-case, as with a bit of creativity it allows us to reason about
data types generically (for example, we could calculate the number of
constructors at compile time), and generate values in a type safe way
(useful, for example, in combination with Quick Check's value generation).
Finally, there are quite a few very interesting packages that complement or
build on top of GHC generics. Once you feel comfortable with vanilla
generics, libraries
like generic-sop
may
be of interest.
See also
A set of recommended resources on the topic:
- A generic deriving Mechanism for Haskell (paper)
- Documentation for
GHC.Generics
on Hackage - The page on Haskell Wiki
- The
generics-sop
package on Hackage - Typeable and Data in Haskell —
the blog by Chris Done contains some straightforward examples of generic
programming using the
Typeable
andData
type class