# Intro to string diagrams for monads

Posted on December 12, 2021This is a high-level introduction to reasoning about monads using the tool of string diagrams, a 2-dimensional notation used for monoidal categories. The obvious disclaimer applies: I have to simplify things a little bit and some minor errors may abound. But if you haven’t seen this topic before you should really like it.

“Monoidal” categories sound a little intimidating, but off the top of my head I’m pretty confident most of the categories that naturally arise in theoretical computer science are monoidal, so if anything they are more relatable to my presumed audience than (mere) categories. For a quick-and-dirty definition, recall that ordinary categories are settings where we can meaningfully compose morphisms like \(f: A \to B\). In programming language theory, one’s objects are usually types and \(f\) is some kind of program taking input of type \(A\) and computing an output of type \(B\). More generally we tend to think of morphisms as processes, proofs (typically but not always constructive), transformations, subset inclusions, or simply “witnesses” to some kind of abstract less-than-or-equal-to relation.

Well a monoidal category just means there is some kind of “product-ish” (I’ll come back to that) operation \(\square\) defined on both types and processes, so that from two processes \(f : A \to B\) and \(g : C \to D\) we can form a kind of “parallel composition” \[f \square g : A \square C \to B \square D.\] Frequently \(\square\) is written as \(\otimes\) and referred to as a “tensor” operation because the tensor product in linear algebra is a typical example (and one you don’t need to delve into for our purposes). A much simpler example, really the canonical one, is that any ol’ Cartesian product structure will do, in which case \(f \times g\) is just the side-by-side execution of two programs taking inputs \(A\) and \(C\) and returning outputs \(B\) and \(D\), respectively. Cutely, a coproduct (sum type) operation works just as well for \(\square\), so the tensor operation doesn’t have to look like a “product” at all. Informally, all that matters is that this side-by-side structure is associative and that there is some sort of trivial object \(I\) such that any \(A \square I\) and \(I \square A\) are isomorphic to \(A\). Clearly when tensor is product, \(I\) will be just the singleton type. (What is \(I\) when tensor is coproduct?)

Oh, and importantly this structure also needs to be functorial, meaning \(f_1 \square g_1\) followed by \(f_2 \square g_2\) is the same as \((f_2 \circ f_1) \square (g_2 \circ g_1)\). This is obvious, right? When we compose programs running in parallel, this just reduces to running the individual compositions in parallel.

## A first taste of string diagrams

Here is a crash course on string diagrams. You don’t need to understand all the details today—I just want to convince you that string diagrams are useful and pretty, so that you can be motivated to study the details at some later time.

String diagrams are a notation for morphisms in monoidal categories (but they generalize beyond this too). The salient feature of the notation is that it is two-dimensional. This freedom to write in two dimensions allows us to more readily depict the \(\square\) structure, especially the fact that parallel operations don’t interact with each other (by definition) and so it should be irrelevant in which order we execute them. In terms of the traditional algebraic (for lack of a better word) syntax using \(\square\), parallelism is most clearly manifest in the axiomatically-imposed equation for parallel composition: \[\left(f_2 \square g_2\right) \circ \left(f_1 \square g_1 \right) = (f_2 \circ f_1) \square (g_2 \circ g_1).\] Visualized instead as an equality between string diagrams, parallelism is manifested as the fact that we can slide boxes along wires like beads along a string, and that this process does not change the morphism depicted by the diagram. This is best understood by seeing examples.

So far I’ve talked about types and functions, but that’s actually not quite the monoidal category I have in mind. I want to talk about the category of endofunctors. As you should know, this is the category whose objects are endofunctors on some category \(\mathcal{C}\) and whose morphisms are natural transformations between them. \(\mathcal{C}\) is what we think of as the category of types and processes, in which case the simplest endofunctors are algebraic datatypes like `list`

and `tree`

that are parameterized by a single type. Natural transformations are operations *at the level of these datatypes* such as an operation that flattens a tree into some kind of list or the operation that reverses a list. Intuitively, these are parametrically polymorphic operations, meaning they should ignore the type parameter, since flattening a tree and reversing a list (etc.) shouldn’t really depend on the contents of the structure. In fact operations that *do* target the contents, i.e. maps, should really commute with operations that target the structure. And that is one of our fundamental ideas today. (Of course, arbitrary functions on these types may be neither maps nor polymorphic.)

A string diagram is a visual depiction of a single natural transfomation, and in all but the simplest cases this transformation will be defined by composing together several simpler ones. “Composition” is a complex idea in monoidal categories, more than just a linear order in which things happen, because recall there are parallel and vertical dimensions of composition. When expressing a transformation as a string diagram, the exact nature of the composition is visualized by wiring the outputs of some boxes to the inputs of others, where each box is a simpler natural transformation (akin to a subexpression) and the wires exist in a two-dimensional space.

You read each diagram from bottom to top (this isn’t universal but it looks best to me). The wires in these diagrams are functors, or you could say a functor is the “type” of a wire. Again, notice the perspective shift—we’re not talking about types and functions but functors and transformations, since our goal is to visualize the monad laws and monads are functors. Actually, we *will* be able to talk about plain old types and functions when we need to by embedding these as *constant-valued* functors and transformations, respectively. You’ll see.

Let’s get to it then.

The following diagram represents a natural transformation \(\phi : F \Rightarrow G\). We draw \(\phi\) as a box with an input wire of type \(F\), and an output of type \(G\).*identity*natural transformation in a special way, as just a wire with no box. The following diagram depicts the identity transformation on \(F\); we pretty much identify this transformation with \(F\) itself.

*constant functor*that maps every object to \(A\) and every morphism to \(\mathrm{id}_A\). We don’t need to make a hard distinction between this and \(A\) itself, but I sometimes write \(\overline{A}\) with an overline to remind myself that this is technically a functor. To talk about an object like \(F A\), we write

Technically what we have done is composed \(F\) with the constant functor \({-} \mapsto A\). Keep in mind it doesn’t make a ton of sense to put functors to the right side of \(A\) now, since precomposing a constant function by anything has no effect.

We saw that boxes depict natural transformations between functors. What if we just want to talk about a specific morphism in the category \(\mathcal{C}\)? To indicate a morphism \(f : A \to B\) we draw a diagram like this:Once again we have played a small trick: \(f\) here is understood as a natural transformation between the two constant functors \(\overline{A}\) and \(\overline{B}\) (you may want to check that \(f\) really is such a transformation). Again, we pretty much identify this natural transformation with \(f\) itself.

Now, if \(\phi\) is a natural transformation then we have the following equation between string diagrams. This equation is a consequence of the fact that we are allowed to slide each box up or down along its wire without changing the morphism depicted by the diagram (there’s lots of details to this that we don’t need to get into today):Contrast the situation to the ordinary linear notation, where we must write either \[\phi_B \circ \mathrm{fmap}^F f\] or \[\mathrm{fmap}^G f \circ \phi_A.\] Since these are equivalent, meaning these expressions always denotate the same morphism, it’s quite unfortunate that there are two syntactically distinct ways of writing it. But as the previous diagram shows, string diagrams do not force us to make this irrelevant choice. We can write a natural transformation and a morphism side-by-side without ambiguity.

In conclusion, a linear notation forces us to choose an arbitrary order between parallel operations, while string diagrams embrace parallelism at the syntactical level. The net result is that sometimes, extremely tedious reasoning about equalities between morphisms (“diagram chasing”) becomes almost straightforward just by employing a diagrammatic notation. I’ll try to at least indicate this elegance in the rest of this post, using string diagrams as a tool to prove that monads satisfy the Kleisli laws.

As a final thought in this section, if you really want the details, I never described precisely how functors even form a monoidal category. The \(\square\) operation on functors is nothing more than their ordinary composition. This sounds like some kind of hack, but far from it I believe composition of functors is one of the most natural tensor structures you’ll find anywhere (if you find this counter-intuitive it’s probably because you are assuming the tensor structure is always commutative, but this is deeply wrong to assume in general). You may or may not be familiar with how to define \(\square\) on natural transformations: \(\phi \square \psi\) is defined as the parallel composition depicted above, except replacing \(\overline{f}\) with \(\psi\). That is, if \(\phi : F_1 \Rightarrow F_2\) and \(\psi : G_1 \Rightarrow G_2\), then \[\phi \square \psi : F_1 \circ G_1 \Rightarrow F_2 \circ G_2\] is defined \[(\phi \square \psi)_A = \phi_{G_2 A} \circ \mathrm{fmap}^{F_1} \psi_A = \mathrm{fmap}^{F_2} \psi_A \circ \phi_{G_1 A}.\] As you see, using ordinary notation there isn’t even a canonical way to write \(\phi \square \psi\) since we have to pick an order for the operations. Hopefully you are starting to see why using diagrams are, frankly, better to work with.

## String diagrams and monads

Now let’s talk about monads.

As you may be aware, monads in computer science are typically defined in one of two equivalent ways. First, they are monoids in the category of endofunctors—this is the definition in terms of `return`

and `join`

. Second, monads can be presented as so-called “Kleisli triples”—this is the `return`

and `bind`

definition. For the sake of recollection and introducing notation, here are the definitions spelled out. I won’t introduce the diagrams yet—just the raw definition.

By the way, Kleisli is pronounced something like “cly zlee.”

### Monads as monoids

A monad \(\langle T, \mu^T, \eta^T\rangle\) is a functor \(T : \mathrm{Type} \to \mathrm{Type}\) equipped with two natural transformations \[\eta : \forall (A : \mathrm{Type}), A \to T A\] and \[\mu : \forall (A : \mathrm{Type}), T (T A) \to T A\] subject to the (generalized) monoid laws:

\[\mu_{A} \circ \eta_{T A} = \mathrm{id}_{T A} \tag{left identity law}\]

\[\mu_{A} \circ \mathrm{fmap}^T \eta_{A} = \mathrm{id}_{T A} \tag{right identity law}\]

\[\mu_A \circ \mu_{T A} = \mu_A \circ \mathrm{fmap}^T \mu_A \tag{associativity law}\]

I have written the equations in a “point-free” style, meaning in terms of composing functions only, without applying them to arguments. This presentation looks a little intimidating but is clearer for purposes of category theory (as opposed to programming). As shown, I may drop superscripts when \(T\) is clear, and I’ll write type arguments (which are implicit in Haskell) as subscripts. \(\eta\) is often called the *unit* in category theory and called `return`

in functional programming, while \(\mu\) can be called `join`

in either context (or multiplication by comparison to ordinary monoids, which seems to be why the letter \(\mu\) is used). I have chosen a notation that is more common for category theory. To say that these are natural transformations means they commute with appropriately-typed \(\mathrm{fmap}\)s, meaning the following equations, for all morphisms \(f : A \to B\) and all terms \(a : A\) and \(t : T (T A)\):

\[\mathrm{fmap}^T f\ (\eta_A a) = \eta_B\ (f a)\]

\[\mathrm{fmap}^T f\ (\mu_A t) = \mu_B\ \left(\mathrm{fmap}^{T \circ T} f\ t\right) = \mu_B\ \left(\mathrm{fmap}^{T} \left(\mathrm{fmap}^T f\right)\ t\right)\]

\(\eta\) is a natural transformation from the identity functor to \(T\), while \(\mu\) is a transformation from \(T \circ T\) to \(T\), so as an abbreviated notation we can write something like this:

\[\eta : 1 \Rightarrow T\]

\[\mu : T \circ T \Rightarrow T.\]

Here, I have written \(1\) to stand for the identity functor mapping everything to itself. If you’re looking for an intuition for these laws, check out my (opinionated) blog post on the subject.

### String digrams for monads

First let us see the monad operations as diagrams. I am a sucker for color, so to me monadic wires are always red. The monad`return`

(in this context called the “unit” since it acts like the unit of a monoid): `join`

: One fact that I omitted above is that natural transformations *from the identity functor* are drawn as if they appeared “out of thin air.” Likewise, natural transformations to the identity functor look like boxes that “cap off a wire.” That’s why \(\eta\) just pops up out of nowhere, for free. (The identity functor is the unit object for the \(\square\) operation in the category of endofunctors. That’s why it gets special treatment.)

All of this by itself is cool, but does it have any utility, open any new doors? Yes. The string-diagrammatic approach makes it quite plain to understand the theory of the so-called Kleisli category of \(T\), the category of free \(T\)-algebras (often presented as the category of Kleisli arrows, which is equivalent). You don’t need to get into all that, though. If you are familiar with the operation `bind`

from functional programming, you should understand most of the next part.

## Monads as Kleisli triples

A Kleisli triple \(\langle T, \mathrm{bind}^T, \mathrm{return}^T\rangle\) is an *object-map* \(T : \mathrm{Type} \to \mathrm{Type}\) equipped with two operations \[\mathrm{return}^T : \forall A : \mathrm{Type}, A \to T A\] and \[\mathrm{bind}^T : \forall A B : \mathrm{Type}, (A \to T B) \to (T A \to T B)\] subject to the following three laws, for all \(f : A \to T B\), \(g : B \to T C\), \(a : A\), and \(t : T A\):

\[\mathrm{bind} \left(\mathrm{return}\right) t = t\]

\[\mathrm{bind} f \left(\mathrm{return}\ a\right) = f a\]

\[\mathrm{bind}\ g \left(\mathrm{bind}\ f\ t\right) = \mathrm{bind}\ \left(\mathrm{bind}\ g \circ f\right)\ t.\]

You might wonder what I mean by *object-map* instead of *functor*. I simply mean that \(T\) is only required to be a function of type \(\mathrm{Type} \to \mathrm{Type}\), without having an \(\mathrm{fmap}^T\) operation.

### Equivalence with monads

**Theorem**: Monads and Kleisli triples are equivalent.

**Proof**: The \(\eta/\mathrm{return}\) operation is shared by both definitons. The central question is how to get \(\mathrm{bind}\) from \(\mathrm{join}\) and \(\mathrm{fmap}\), and vice versa.

Starting from a monad-qua-monoid, define \[\mathrm{bind}^T\ (f : A \to T B) := \mu^T_B \circ \mathrm{fmap}^T\ f.\] The Kleisli laws are easy to prove using the monoid laws and rules about natural transformations. I’m about to show how to visualize these laws with string diagrams, so we’ll come back to this part.

Now starting from a Kleisli triple, define \[\mathrm{fmap}^T f := \mathrm{bind}^T \left(\mathrm{return}^T \circ f\right).\] You can use the laws of \(\mathrm{bind}\) to show that \(\mathrm{fmap}\) makes \(T\) into a functor. Then define \[\mu^T_A := \mathrm{bind}^T \left(\mathrm{id}: T A \to T A\right) : T (T A) \to T A.\] Then we must prove that \(\eta\) and \(\mu\) are both natural transformations with respect to the \(\mathrm{fmap}^T\) we just defined, which is actually kind of tedious. Finally one proves the monoid laws, which are straightforward.

Oh, and to call these definitions “equivalent” you need to verify that starting with either definition and making a roundtrip gets us back where we started. This means if you start with a monoid and define a Kleisli triple, the monoid you can construct from the Kleisli triple is the same one you started with, and likewise in the other direction.

To get the most out of this post, you really should take the time to prove both directions of this equivalence, but particularly the monoid-to-Kleisli direction. An observation of practical importance is that the Kleisli definition does not require first defining an \(\mathrm{fmap}\) operation on \(T\), nor verifying that the monoid operations are natural transformations. Instead these come for free, which is very nice when you have to prove that something is a monad.

For quick reference, here is what I would say are the pros and cons of each definition.

Pros of the monoid definition:

- conceptually clear category-theoretically (
`join`

is a natural transformation,`bind`

is not) - immediately suggests a kind of “container” semantics (see here)
- meaningful string diagrams

Pros of the Kleisli definition:

- conceptually clear to functional programmers (compare the relative complexity of the two definitions)
- immediately suggests a kind of “substitution” semantics (more on this in a future post)
- easier to formalize typeclass instances in Coq (fewer axioms to prove)

## Visualizing `bind`

and the Klesli laws

Assume some monad \(T\) is fixed. Arrows of the shape \(f : A \to T B\) (for any \(A\) and \(B\)) are called “Kleisli” arrows of \(T\). We’ll get into the abstract theory of Kleisli arrows in a minute, but for now let’s just assume we care about them. If you are familiar with functional programming, recall that we tend to think of monads as encoding some kind of “special effect,” and a Kleisli arrow is a computation that has this kind of special effect.

Now for any Kleisli arrow \(f\), \(\mathrm{bind}^T f\), recall defined as \(\mu_T \circ \mathrm{fmap}^T f\), is just the following diagram:^{1}

In the left hand diagram the dotted boxes depict the two arguments to \(\mathrm{bind}\), and the dotted box on the right side depicts the argument to the outer \(\mathrm{bind}\). Note that all the dotted boxes have the basic same shape: a black input wire, and red output wire on the left and a black output wire on the right. The right-side dotted box, depicting \(\mathrm{bind}^T g \circ f\), has a special name: the *Kleisli composition* of the two Kleisli arrows \(g\) and \(f\). I represent Kleisli composition with an asterisk instead of a circle: \[g \ast f := \mathrm{bind}^T g \circ f.\] We’ll come back to Kleisli composition in a minute. As for the above proof, if you think about it carefully, you’ll see there is actually an initial step where we pull the first \(\mu\) vertically about the \(g\) before we can apply associativity, much like the previous proof. This is explicit in a linear notation but implicit with diagrams.

## Kleisli arrows and Kleisli composition

In the third Kleisli law the so-called Kleisli composition makes an appearance. This is an operation that accepts two Kleisli arrows \(g : B \to T C\) and \(f : A \to T B\) and computes one of type \(A \to T C\). This composition, \(g \ast f\), is given by the diagramAs I will ask you to prove, this definition of composition has a left and right identity, both given by \(\eta^T\) (notice that `return`

is itself a Kleisli arrow). That is, Kleisli-composing or Kleisli-precomposing some \(f\) with \(\eta^T\) is equal to \(f\). Moreover, this notion of composition is associative. That sounds a lot like a category, no? Indeed, morphisms of this shape form a category, the Kleisli category \(\mathcal{C}_T\) of \(T\). The objects of this category are the same as those of \(\mathcal{C}\), but a morphisms from \(A\) to \(B\) in \(\mathcal{C}_T\) is *defined* as any morphism from \(A\) to \(T B\) in \(\mathcal{C}\). (Yes, this means the type of \(f\) will change depending on which category we’re thinking about.) This sounds a little abstract, but you pretty much already saw the proof that \(\mathcal{C}_T\) forms a category:

**Homework**: Prove the following laws twice, once using the laws of \(\mathrm{bind}\) and the second using string diagrams: \[\eta^T_B \ast f = f\] \[f \ast \eta^T_A = f\] \[h \ast (g \ast f) = (h \ast g) \ast f\] Hint: The second law above is precisely the second Kleisli law. The first and third laws are almost exactly the other two axioms of \(\mathrm{bind}\), except that all the morphisms here are pre-composed with \(f\). This exercise is trivial, just wrapped in layers of definitions.

Finally, if you read them very closely, you’ll see that first and third law of \(\mathrm{bind}\) state that it acts as the \(\mathrm{fmap}\) operation of a functor from \(\mathcal{C}_T \to \mathcal{C}\). This functor maps an object \(A\) of \(\mathcal{C}_T\) (which is just an object of \(\mathcal{C}\)) to \(T A\), and Kleisli arrow \(f : A \to T B\) to \(\mathrm{bind}^T f : T A \to T B\). Notice this functor acts like \(T\) on objects, except that its \(\mathrm{fmap}\)-operation operates on Kleisli arrows. Let’s use \(T_{K}\) to denote this functor. Then the first and third^{2} Kleisli laws are nothing but \[\mathrm{fmap}^{T_K} \left(\mathrm{id}^{\mathcal{C}_T}_A\right) = \mathrm{id}^\mathcal{C}_{T A}\] and \[\mathrm{fmap}^{T_K} g \circ \mathrm{fmap}^{T_K} f = \mathrm{fmap}^{T_K} (g \ast f)\]

Also, you will find that there is a functor (let’s call it \(K\)) from \(\mathcal{C} \to \mathcal{C}_T\) mapping each \(f : A \to B\) to \(\eta_B \circ f : A \to T B\). In functional programming (but not always in general), it almost always happens that \[\eta_B \circ f_1 = \eta_B \circ f_2 \iff f_1 = f_2,\] in which case we can see that \(\mathcal{C}_T\) is a *bigger* category than \(\mathcal{C}\) in this sense: composing a function \(f : A \to B\) with `return`

embeds \(\mathcal{C}\) into \(\mathcal{C}_T\) (“embeds” means distinct things get mapped to distinct things). Consequently, we can say there are *more* Kleisli arrows from \(A\) to \(B\) than ordinary arrows of the same type (i.e. there are more \(A \to T B\) than \(A \to B\)) because ordinary arrows can been seen as a subset of Kleisli arrows.

Moreover, try using string diagrams to prove \[\mathrm{bind}^T (\eta_B \circ f) = \mathrm{fmap}^T f.\] This implies that the functor \(T\) “factors though” \(T_K\), meaning we can promote any \(f\) to a Kleisli arrow by composing with `return`

and then apply \(T_K\) (i.e. \(\mathrm{bind}^T\)), or we can just \(\mathrm{fmap}^T\)-it, and these are the same. This means \(T : \mathcal{C} \to \mathcal{C}\) actually decomposes as \[T = (T_K \circ K) : \mathcal{C} \to \mathcal{C}_T \to \mathcal{C}.\]

In functional programming terms, the previous two paragraphs say that functions without effects are just a special subset of functions that do have effects, since one possible effect to have is the trivial one given by composition with \(\eta\). Additionally, mapping an effect-less function over \(T\) is really just a special case of binding an effectful function over \(T\). However there are binds over \(T\) that are not maps, and as always there are operations on objects like \(T A\) that are not binds.

## From Kleisli triples to monoids

We used string diagrams to reason about the Kleisli operations. Can we go in the other direction? Meaning, is there a diagrammatic account of how to start with a Kleisli triple and describe the monoid structure of \(T\)? Unfortunately no. This is because \(\mathrm{bind}\) is not a natural transformation and doesn’t really show up in a diagrammatic notation. It would be circular to try to *start* from the diagram for \(\mathrm{bind}^T\) and define the monoid operation, since \(\mu\) already appears in the diagram defining \(\mathrm{bind}^T\). Indeed, starting from a Kleisli triple we’re not even immediately justified in using string diagrams at all, as it’s not clear from the start that \(T\) is even a functor.

However, if we start from the monoid and define \(\mathrm{bind}\), we can show that the monoid operation defined from \(\mathrm{bind}\) is the same as \(\mu^T\), which is nice. Try it.

# Conclusion

String diagrams are awesome. They can be used to reason about equalities between morphisms in an arbitrary monoidal category, which is any category with a notion of parallel composition. The advantage of using a syntax that embraces parallelism from the outset is that it allows us to focus on what really matters, namely the arrangement of what’s “plugged into” what, and not become completely lost in petty manipulations of syntax. In other words, string diagrams are a more canonical way of writing things—the syntax itself quotients out a lot of the “structural” reasoning.

The particular monoidal category we have in mind is \(\mathrm{End}_\mathcal{C}\), the category of functors and natural transformations. With string diagrams we can see precisely what is meant by “monads are monoids in the category of endofunctors.” This might seem like just a pleasant observation, but it’s actually useful. Using string diagrams makes it easy to see why monads are also Kleisli triples, and to see pictorially the relationship between the Kleisli laws and the monoid ones. Finally, if you follow(ed) the recommended exercises about Kleisli composition, you should get a certain amount of gratifying insight into the nature of the Kleisli category.

As I intend to write about in the future, there are abstractions beyond monads that also have a pleasing diagrammatic account. As an exercise, try dualizing everything in the blog post to get string diagrams for comonads. Of course, this just means every diagram gets flipped vertically, but try to do this from first principles. See if you can write down the co-Kleisli laws by reading them off from the diagrams.

Confirm for yourself that the contents of the dotted box, as a sub-diagram, have the correct type to even be the argument to \(\mathrm{bind}^T\). You determine the input (output) type of a diagram by tensoring the types of the bottom/input (top/output) wires from left to right. Remember that tensor isn’t generally commutative and that our tensor is just composition, and remember than application of a functor to a type is a special case of composition of functors.↩︎

If you’re upset that the second law doesn’t seem to make an appearance here, it’s “because” there is actually a weaker kind of structure called a right monad action of \(T\), out of which arises a functor satisfying only the above two laws. For this general structure the second Kleisli law doesn’t even typecheck. We have just observed that all monads have right actions on themselves, but a monad is more than just this, so it’s not so surprising that one of the axioms isn’t used here.↩︎