GHC Generics Explained

This tutorial will get you up to speed with GHC generics quickly. It should be noted that generics is not something academic and useless, quite the contrary, it's a very pragmatic way to reduce the amount of boilerplate (and associated with it errors) in your functional code with minimal mental overhead. In fact, by the time you get to the end of the tutorial, I hope you'll agree that generics are easy to use and easy “to get” as well. The tutorial is for intermediate-level Haskellers.

Let's start from the beginning. What are generics? Generics, like in other languages, are a way to use the same code to manipulate different data types. We should note that in this regard the technique is very close to the much more common notion of polymorphism, which in Haskell comes in two flavors:

Parametric polymorphism, when we have type variables in functions/data types. This allows the same function, such as const to work with any types of arguments, as long as the more general types from a function's signature can be made equal to concrete types we want to work with by instantiating/fixing the type variables.
Ad-hoc polymorphism, which allows us to perform a computation abstracted over instances of one or more type classes. We request that a type has some properties of interest and then describe the computation in terms of these properties. The code is then applicable to any data type that has the properties.

How are generics different? Generics allow us to describe transformation that works in terms of not concrete types, but general combinators that describe shape and meta information about a data type. This way, we can declare how to perform a computation on almost any data type!

The idea about generics

Haskell's features that open the possibility for such a thing as generics are type classes and ad-hoc polymorphism. The ability to describe a data type in terms of a set of combinators is our property, encapsulated by the Generic type class. A data type that is an instance of the type class can be processed by functions that work with a representation of data type, not the data type itself. These functions, however, are by definition polymorphic and also usually hidden behind a type class interface, so the whole thing usually takes the form of automatic deriving of a type class instance.

The process can be described like this:

A given data type gets Generic instance automatically, as it can be generated by the compiler with the help of the DeriveGeneric language extension.
There is a type class instance defined for instances of Generic (or Generic1). Since our data type is now an instance of those type classes, that generics-based instance starts to work for us and we magically get all the useful methods of that type class.

If you know about Data and Typeable type classes, then you probably know that it's possible to do something similar using information which methods of those type classes provide. Data and Typeable are beyond the scope of this tutorial, but you read about them in this blog post by Chris Done if you're interested.

Step 2 is not as simple as it sounds though, at least not until we examine the details of this business more closely.

Combinators that represent the shape of a data type

What could they look like? Well, if I showed you the data types as they are right now, you would probably run away in fear, cursing the tutorial and Haskell's generics. I have a better idea. Let's start with the simplest thing possible and then iterate asking ourselves how to work around some more interesting use cases that we might need to support. This will force us to make the data types more hairy but also more powerful step by step, till we arrive at the definitions that are actually used.

For better or worse, algebraic data types lock us into a view on the world that consists of sums and products (another Haskell's feature that makes the existence of generics framework possible; generics do not work with GADTs for example). So we need to be able to represent the following:

Data types without constructors at all: uninhabited types like Void. This can be described as data V1 (note: it has no constructors).
Constructors without arguments. Let this be something like this: data U1 = U1.
Sums: data (f :+: g) = L1 f | R1 g. If we have a sum data type with two alternatives we can represent other sum data types with any number of alternatives using nesting.
Products: data (f :*: g) = f :*: g.

Not bad for a start. Let's try use this representation to derive a Functor instance. Deriving a such an instance means that we should provide the fmap function which looks like this:

fmap :: (a -> b) -> Rep f -> Rep f

Here, Rep f maps to the type of generic representation of f :: * -> *. There is a problem though. You see, the functor's inner type that goes from a to b in this example has no place to live in the representation! Our current approach will work only for type classes that work with things of kind *, such as Show, but if we want to use this system to derive a Functor instance, we need to allow it to work with kind * -> *.

We want to be able to derive Functor (and similar) instances generically, so we have to fix things up. The solution is to add one more type parameter p to all our combinators:

data V1 p
data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = (f p) :*: (g p)

This way fmap would be:

fmap :: (a -> b) -> Rep f a -> Rep f b
--                  p  =  a    p  =  b

But what happens to the type classes that work with * kinds? Here are our choices:

Have a separate set of combinator types for each case (* and * -> * kinds).
Use the most general form (with p), but for * kinds just treat the extra p parameter as a dummy type index that has no meaning.

The authors of the generic extension went with the second option, and I can't blame them. We'll see that there are a lot of wrappers already and we really should try to keep their number from exploding any further (this may make more sense on the second read).

Let's try to map from a data type to its representation and see if we're still missing something:

data Maybe a = Nothing | Just a

-- Interestingly, we could build a representation that works on ‘Maybe a’,
-- that is, a thing of kind *, if we wanted to derive something like ‘Show’.
-- At the same time if we wanted to derive ‘Functor’, we would work with
-- ‘Maybe’ of kind * -> *. This means that there are actually two different
-- possible representations depending on our aim. This is addressed with two
-- different generics type classes, as we will see later.

-- For kind *, things like ‘Show’:

-- type: (U1 :+: ?) p

Oh, how to represent Just a? We need to way to allow it have an argument! Let's add the following:

data Rec0 c p = Rec0 { unRec0 :: c }

Rec part in the type's name hints that this thing may be possibly recursive.

But I'm lying to you, due to some historical stuff, it's defined a bit differently:

type Rec0 = K1 R

newtype K1 i c p = K1 { unK1 :: c } -- c is the value, ‘a’ in ‘Maybe a’
--         ^   ^
--         |   |
--         |   +-------- dummy p
--         |
-- type-level tag, R or P

You see this R type-level tag? There used to be another one, P and this type Par0 = K1 P type synonym, but forget it, it's deprecated. Bottom line: Rec0 is used for data constuctor arguments (fields), that are not p parameter.

With Rec0, we can finally build representation of Maybe a:

-- This is the type of our representation: (U1 :+: Rec0 a) p

-- Examples of values for ‘Maybe Int’:

-- Nothing => L1 U1
-- Just 5  => R1 (K1 5) -- remember where K1 comes from?
--            ^
--            |
--            +--- L1 and R1 are from our representation of sum types

Let's derive a different representation that works with * -> * kinds:

-- Type of our representation: (U1 :+: ?) p

We need a way to tell if we have an argument of p type (like a in Functor f => f a) or some other type, if we're to write a generic Functor function. For this reason the generics extension uses Par1 p:

newtype Par1 p = Par1 { unPar1 :: p } -- “par” stands for “parameter”

Par1 is used to mark occurrences of p. Our representation thus becomes:

-- The type of our representation: (U1 :+: Par1) p

-- Examples of values for ‘Maybe Int’:

-- Nothing => L1 U1 -- the same
-- Just 5  => R1 (Par1 5)

Almost there. The final example is that for lists. Given the standard definition of linked list:

data List a = Nil | Cons a (List a)

How do we build its representation (for kind * -> *)? The tricky part is, of course, List a, which is a recursive occurrence of entire functorish part with parameter inside it. If we mark occurrences of parameter by wrapping them with Par1, then why not mark this thing with something too? Rec1 shall it be.

-- The type of representation: (U1 :+: (Par1 :*: Rec1 List)) p

-- Examples of values for ‘List Int’:

-- Nil                 => L1 U1
-- Cons 5 Nil          => R1 (Par1 5 :*: Rec1 Nil)
-- Cons 5 (Cons 4 Nil) => R1 (Par1 5 :*: Rec1 (Cons 4 Nil))

If we had just arguments of a data constructor that are not related to parameter p, plain Rec0 (K1) would be used for both first and second arguments of Cons.

And that's it, we have everything we need.

The `Generic` and `Generic1` type classes

Type classes that map types to their representations are called Generic (for type classes that work with * kinds) and Generic1 (for type classes that work with * -> * kinds). They live in the module called GHC.Generics together with the types used to build the data type representations we've just discussed.

Let's see what these type classes have:

class Generic a where
  type Rep a :: * -> *
  from :: a -> Rep a p
  to   :: Rep a p -> x

class Generic1 f where
  type Rep1 f :: * -> *
  from1 :: f p -> Rep1 f p
  to1   :: Rep1 f p -> f p

from and from1 map values of data types to their generic representations. Rep and Rep1 are associated type functions (the feature is a part of the TypeFamilies GHC extension) that take the type of data we want to manipulate and yield the type of its representation. Of course, if we want to derive Functor instances, we need a way to go back from representation to actual value of target data type. This is done via to and to1. The good thing about this, of course, that GHC can derive Generic and Generic1 for us automatically with DeriveGeneric language extension enabled.

Now let's open GHCi and try to infer Rep of some type:

λ> :t (undefined :: Rep (List a) p)
(undefined :: Rep (List a) p)
  :: D1
       ('MetaData "List" "GenericsTutorial" "main" 'False)
       (C1 ('MetaCons "Nil" 'PrefixI 'False) U1
        :+: C1
              ('MetaCons "Cons" 'PrefixI 'False)
              (S1
                 ('MetaSel
                    'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy)
                 (Rec0 a)
               :*: S1
                     ('MetaSel
                        'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy)
                     (Rec0 (List a))))
       p

Ah OK, there is just a little bit more to this representation thing…

Metadata wrappers

A representation also has associated metadata. It may look messy and difficult to read, but I'll explain the logic behind it in a moment, it should help to clarify why it's done like this.

First of all, metadata should not get in our way if we don't want to deal with it. Thus all metadata is attached using the same simple wrapper:

newtype M1 i c f p = M1 { unM1 :: f p } -- ‘f p’ is what lives inside, U1 for example
--         ^ ^
--         | |
--         | +---- compiler-generated data type that allows us to get meta information
--         |
--  type-level tag, see below

The i type-level tag can be one of the three:

D for data type metadata type D1 = M1 D
C for constructor metadata type C1 = M1 C
S for record selector metadata type S1 = M1 S

It should be understandable why metadata is attached this way. If we want to, we can ignore metadata easily, since it's just a wrapper:

f (M1 x) = f x

If we want to look at a particular type of metadata we can specify the i type-level tag, or we can leave it unspecified to deal with all metadata at once.

The c type is auto-generated by the compiler and encodes metadata on type level. Why not store the metadata on value level, in M1 constructor? Well, if it were there, we would have to provide it if we wanted to generate some values, and providing metadata for already existing data type is certainly something that only the compiler can do properly. With the current approach, a given data type determines the type of its representation, including its metadata, so we don't have to bother.

The last thing to mention about metadata is which wrappers are generated for you. They are very predictable:

The entire representation is wrapped in D1, which provides datatype-level information (datatype name, module name, and whether it's a newtype).
Every constructor is wrapped in C1, which provides information about constructor (such as constructor name, fixity, and whether it's a record).
Every argument of constructor is wrapped with S1 (even if it's not actually a record selector), which tells us the selector name.

Look at the Haddock, to find out names of functions that help extract the metadata. Using them is straightforward: feed the wrapped data to functions like datatypeName and get the information.

Example: `Functor` deriving

Of course, after talking so much about Functor instances and adding the clumsy p parameter to support them, we absolutely must to derive Functor instance for generics! Anyway, Functor instance for generics is already defined in GHC.Generics, so instead of re-implementing it, let's just go through the code:

-- If we have a parameter ‘p’, we just map over it, as expected:

instance Functor Par1 where
  fmap f (Par1 p) = Par1 (f p)

-- The same with ‘Rec1’ (just use ‘fmap’ because the inner part is a ‘Functor’):

instance Functor f => Functor (Rec1 f) where
  fmap f (Rec1 a) = Rec 1 (fmap f a)

-- A constructor without fields only can be returned untouched:

instance Functor U1 where
  fmap _ U1 = U1

-- A field that is not ‘p’ parameter should not change:

instance Functor (K1 i c) where
  fmap _ (K1 a) = a

-- Metadata has no effect, just unwrap it and continue with the inner value,
-- if the inner value is an instance of ‘Functor’:

instance Functor f => Functor (M1 i c f) where
  fmap f (M1 a) = M1 (fmap f a)

-- When we have a sum, we should try to map what we get, provided that it
-- contains something that has ‘Functor’ instance:

instance (Functor l, Functor r) => Functor (l :+: r) where
  fmap f (L1 a) = L1 (fmap f a)
  fmap f (R1 a) = R1 (fmap f a)

-- The same for products:

instance (Functor a, Functor b) => Functor (a :*: b) where
  fmap f (a :*: b) = fmap f a :*: fmap f b

Example: counting constructor fields

I think now we're dangerous enough to implement a simple (and pretty useless) type class that will count constructor fields of a given value. (Yeah, I could go with the venerable Encode example here, but I like to be “original”, so you get a dumber example instead :-D)

The type class looks like this:

class CountFields a where
  -- | Return number of constuctor fields for a value.
  countFields :: a -> Natural

We will start by implementing countFields method that works on representations:

instance CountFields (V1 p) where
  countFields _ = 0

instance CountFields (U1 p) where
  countFields _ = 0

instance CountFields (K1 i c p) where
  countFields _ = 1

instance CountFields (f p) => CountFields (M1 i c f p) where
  countFields (M1 x) = countFields x

instance (CountFields (a p), CountFields (b p)) => CountFields ((a :+: b) p) where
  countFields (L1 x) = countFields x
  countFields (R1 x) = countFields x

instance (CountFields (a p), CountFields (b p)) => CountFields ((a :*: b) p) where
  countFields (a :*: b) = countFields a + countFields b

Let's write a single function called, say, defaultCountFields that does the whole thing:

defaultCountFields :: (Generic a, CountFields (Rep a)) => a -> Natural
defaultCountFields = countFields . from

But here is the catch, the code above does not compile. CountFields has the kind CountFields :: * -> Constraint, but we give it Rep a of kind * -> *. It's not going to take our abuse like this!

The solution, typical with this setting, is to have a helper class that works with things of * -> * kind (this also removes the p parameters from signatures):

class CountFields1 f where
  countFields1 :: f p -> Natural

defaultCountFields :: (Generic a, CountFields1 (Rep a)) => a -> Natural
defaultCountFields = countFields1 . from

instance CountFields1 V1 where
  countFields1 _ = 0

instance CountFields1 U1 where
  countFields1 _ = 0

instance CountFields1 (K1 i c) where
  countFields1 _ = 1

instance CountFields1 f => CountFields1 (M1 i c f) where
  countFields1 (M1 x) = countFields1 x

instance (CountFields1 a, CountFields1 b) => CountFields1 (a :+: b) where
  countFields1 (L1 x) = countFields1 x
  countFields1 (R1 x) = countFields1 x

instance (CountFields1 a, CountFields1 b) => CountFields1 (a :*: b) where
  countFields1 (a :*: b) = countFields1 a + countFields1 b

You could notice that some data types like Par0, Rec1 didn't get their definitions. This is OK because we work with Generic, not Generic1 here, but omitting definitions for some data types that generic representations are built from is normal. As GHC.Generics docs say:

If no :+: instance is given, the function may still work for empty datatypes or datatypes that have a single constructor, but will fail on datatypes with more than one constructor.
If no :*: instance is given, the function may still work for datatypes where each constructor has just zero or one field, in particular for enumeration types.
If no K1 instance is given, the function may still work for enumeration types, where no constructor has any fields.
If no V1 instance is given, the function may still work for any datatype that is not empty.
If no U1 instance is given, the function may still work for any datatype where each constructor has at least one field.

An M1 instance is always required (but it can just ignore the meta-information).

Putting the magic into the type classes

Having dealt with generic implementation of the functionality of interest, let's put it all together and use a special GHC extension to allow the user derive type classes without knowing anything about generics.

For a generic implementation to work without user's definition we need to provide it as the default definition. As you have already seen, a generic implementation often involves a Generic constraint and it would be ugly to add it to every type class (on the left hand side of =>) just to make deriving easier. The default keyword, enabled by the DefaultSignatures language extension, allows us to give a different type signature for default implementation of a type class:

class Functor f where
  fmap :: (a -> b) -> f a -> f b
  default fmap :: (Generic1 f, Functor (Rep1 f)) => (a -> b) -> f a -> f b
  fmap = to1 . fmap . from1

class CountFields a where
  countFields :: a -> Natural
  default countFields :: (Generic a, CountFields1 (Rep a)) => a -> Natural
  countFields = defaultCountFields

This way we can have our cake and eat it too: deriving is easy and otherwise no ugly details are visible!

Conclusion

Generics are a powerful means of automating error-prone and boring defining of type class instances by hand. However, this feature is helpful beyond just that use-case, as with a bit of creativity it allows us to reason about data types generically (for example, we could calculate the number of constructors at compile time), and generate values in a type safe way (useful, for example, in combination with Quick Check's value generation). Finally, there are quite a few very interesting packages that complement or build on top of GHC generics. Once you feel comfortable with vanilla generics, libraries like generic-sop may be of interest.

GHC Generics Explained

Hiring for Haskell? Here's What You Need to Know

Celebrating 10 Years of Our Haskell Training Course

Enhancing Functional Programming in Python with Coconut

The idea about generics

Combinators that represent the shape of a data type

The `Generic` and `Generic1` type classes

Metadata wrappers

Example: `Functor` deriving

Example: counting constructor fields

Putting the magic into the type classes

Conclusion

See also

Hiring for Haskell? Here's What You Need to Know

Celebrating 10 Years of Our Haskell Training Course

Enhancing Functional Programming in Python with Coconut

GHC Generics Explained

Hiring for Haskell? Here's What You Need to Know

Celebrating 10 Years of Our Haskell Training Course

Enhancing Functional Programming in Python with Coconut

The idea about generics

Combinators that represent the shape of a data type

The Generic and Generic1 type classes

Metadata wrappers

Example: Functor deriving

Example: counting constructor fields

Putting the magic into the type classes

Conclusion

See also

Hiring for Haskell? Here's What You Need to Know

Celebrating 10 Years of Our Haskell Training Course

Enhancing Functional Programming in Python with Coconut

The `Generic` and `Generic1` type classes

Example: `Functor` deriving