7 Interlude one: basic category theory for programmers
This interlude introduces category theory as a way to think about programming at an even higher level of abstraction. The chapter frames software development as the construction and composition of abstractions, then presents category theory as the study of relationships between “stuff,” where the stuff can be anything: numbers, people, types, or even categories. A category is described as a collection of objects plus morphisms, which are arrow-like relationships between objects. Morphisms must include an identity arrow for every object and must be transitive: if you can get from one object to a second, and from the second to a third, then you can get from the first to the third.
The chapter then explains functors as functions between categories that preserve morphisms. A covariant functor maps objects from one category to another while keeping the direction and structure of the morphisms intact. When such a functor maps a category to itself, it is called an endofunctor. These ideas are connected to programming language type systems by treating types as objects in a category and assignment compatibility as the morphism: for example, a value of type Giraffe can be assigned to a variable of type Animal, so there is a morphism from Giraffe to Animal. Under this view, a generic interface such as IEnumerable<T> is covariant when the mapping from T to IEnumerable<T> preserves those assignment-compatibility arrows.
The chapter also explains contravariance through interfaces that consume values rather than produce them, such as a comparer. An object that can compare any two animals can also compare any two giraffes, so an IC<Animal> can be used where an IC<Giraffe> is required. In category-theory terms, the mapping from T to IC<T> preserves the existence of morphisms but reverses their direction, making it a contravariant functor. The main takeaway is that language-design terms such as covariance and contravariance come directly from category theory, and that these abstract ideas help explain practical rules in modern type systems.
A small category with three objects, represented by the nodes, and six morphisms, represented by the arrows.
A strangely familiar small category with four objects and ten morphisms.
The function F is represented as dashed arrows mapping from one category to another
The category of four types with the assignment compatibility morphism; the identity morphisms are elided for clarity. Note that both Giraffe and Tiger have morphisms to Animal but not to each other. A Giraffe can’t be assigned to a variable of type Tiger and a Tiger can’t be assigned to a variable of type Giraffe, but both can be assigned to a variable of type Animal or Object.
Adding four constructed generic types to our category. Notice how the structure of the morphisms of the interface types looks very similar to that of the original four types.
The mapping of a function from types to types, shown as dashed arrows.
Part of another infinite category of types, this time with a contravariant interface.
Summary
- Category theory is a modern mathematical discipline that studies relationships between arbitrary objects. It’s cheekily characterized as “generalized abstract nonsense”.
- A category is a collection of objects that have reflexive, transitive morphisms; we can think of a morphism as an arrow going from one object to another.
- A function that maps objects of one category to another such that morphisms are preserved is a covariant functor. One that preserves but reverses morphisms is a contravariant functor. If the two categories are the same, it is an endofunctor.
- The assignment compatibility relation “an expression of type X may be assigned to a variable of type Y” is a common morphism on a category where types are objects.
- Covariant and contravariant interfaces are called that because generic interfaces can be thought of as covariant or contravariant functors: functions from types to types that preserve assignment compatibility morphisms.
- Posetal categories, that define a partial order on a set of objects, are common and particularly useful when objects are types in a programming language. Semilattices are posets that have the additional nice property that you can always find the most specific type that is assignment compatible with any two types.
- The relationship between category theory and analysis of programming languages is very deep; we’ve just scratched the surface. We’ll go a little deeper in subsequent interludes.
Fabulous Adventures in Data Structures and Algorithms ebook for free