# Yes, monads are container-like

Posted on November 13, 2021*Thesis: Monads are containers*

*Antithesis: Monads are not containers*

*Synthesis: Monads are container-like*

It is fashionable in the Haskell world to have at least one blog post explaining monads to n00bs. Prototypically this involves describing monads as some sort of container, like a box or a burrito. This is so common that one sometimes speaks of the monad tutorial fallacy, and the language of “containers” has been known to elicit negative reactions from those who find the analogy misleading (see this Twitter thread for instance). I don’t find the analogy misleading, and this blog post is my attempt to explain the metaphor while remaining grounded in formal theory. It is aimed at an audience that understands at least the definition of a natural transformation.

By “containers” I mean the usual things: sets (however defined), lists, multisets (a/k/a “bags”), tree-like structures, etc. Such data structures form monads, and the claim I’m making here is that *all* monads have something in common with these structures. For our purposes containers don’t have to be finite, nor do we assume we can enumerate the “contained” values. Indeed, infinitary set-like types such as those given by the Coq type constructor `fun A : Type => (A -> Prop)`

form monads, even though an arbitrary predicate say of type `nat -> Prop`

isn’t necessarily decidable or even semi-decidable.^{1} So the container-ness of monads can’t be about enumerating elements. No, what makes monads container-like is that they have a **singleton** and **union** operation that interact in a common-sense way—in fact this is precisely the “monoid-in-the-category-of-endofunctors” definition. Since I want to spell out the connection to “real theory” (or is that an oxymoron?), let’s build up to that famous definition by recalling the basic ingredients.

## Monoids in \(\mathrm{End}_\mathcal{C}\)

Let \(\mathcal{C}\) be a category. We will think of \(\mathcal{C}\) as something like \(\mathrm{Set}\), the category of sets and functions. You can also think of \(\mathrm{Hask}\), the so-called category of Haskell types and functions, which is technically not a category for reasons we can ignore. To keep the functional programming intuition the objects of \(\mathcal{C}\) will be called types, while the morphisms \(\mathrm{Hom}_\mathcal{C}(A, B)\) will be identified with functions of type \(A \to B\). We write \(A \in \mathcal{C}\) when \(A\) is a type. We also write \(a : A\) to indicate that \(a\) is a value of type \(A\).^{2} Code is written `like this`

but departs from Haskell conventions by writing type variables and functors with upper case letters.

An endofunctor \(F : \mathcal{C} \to \mathcal{C}\) is, first of all, a type constructor. Meaning if \(A \in \mathcal{C}\), then \(F A \in \mathcal{C}\) is a new type somehow computed from \(A\). A common source of such type constructors are the algebraic data types defined in Haskell with the keyword `data`

, but Haskellers also know of examples like `(R ->)`

, the Reader monad, where `R`

is any type and `(R ->) A`

means `R -> A`

. But a functor is more than its type constructor, as we must also specify the morphism-lifting operation that Haskellers call `fmap`

. Category-theoretically this is the operation mapping any morphism \(A \to B\) to one of type \(F A \to F B\). From the Haskell perspective, `fmap`

is a polymorphic operation of type `forall A B, (A -> B) -> F A -> F B`

, but I’ll mostly avoid this Haskell-y notation in favor of writing \(\mathrm{fmap}^F\) with type arguments as subscripts. When \(F\) is thought of as a container-like, \(\mathrm{fmap}^F_{A, B} f\) is the operation that maps over the contained values of some container \(t : F A\) and replaces each contained value \(a : A\) with \(f a : B\). (Terms like \(t\) are really the containers, but one sometimes calls \(F\) itself a container.)

The set of endofunctors of \(\mathcal{C}\), written \(\mathrm{End}_\mathcal{C}\), forms a monoid—perhaps a very big one, but that’s okay. The identity of the monoid is the identity functor mapping \(A \in \mathcal{C}\) and \(f : A \to B\) to themselves. The monoid operation is composition of functors. One easily verifies the associativity and identity laws. Now furthermore \(\mathrm{End}_\mathcal{C}\) is a category, one whose objects are endofunctors and whose morphisms are natural transformations (I’m treating this as background knowledge, consult your preferred resource on category theory). But that’s not all. The monoid structure and the categorical structure also play nicely with each other in this sense: If \(\eta_1 : F_1 \Rightarrow G_1\) and \(\eta_2 : F_2 \Rightarrow G_2\) are natural transformations then there is a natural transformation^{3} \[\eta_2 \square \eta_1 : F_2 \circ F_1 \Rightarrow G_2 \circ G_1\] defined \[(\eta_2 \square \eta_1) (t : F_2 (F_1 A)) = \mathrm{fmap}^{G_2} \eta_1 \left(\eta_{2}\ t\right)\] or equivalently (prove this) \[(\eta_2 \square \eta_1) (t : F_2 (F_1 A)) = \eta_2 \left(\mathrm{fmap}^{F_2} \eta_1\ t\right).\] Setting aside some details, this makes \(\mathrm{End}_\mathcal{C}\) into a strict monoidal category, a category with a monoid structure defined not just on objects but on morphisms between them too, in a certain sense.

In general all categories have a notion of sequential composition—that is precisely the definition of a category. *Monoidal* categories are special because they also have a notion of *parallel* composition, here given by \(\square\). One reason to care about this is that the idea of “monoid” can be generalized to make sense in any monoidal category, so in particular it makes sense to talk about monoids in \(\mathrm{End}_\mathcal{C}\). Generalized monoids don’t have to look anything like monoids in the ordinary sense, although one finds that generalized monoids in the category of sets^{4} are precisely ordinary monoids. That’s why it’s fair to say this notion “generalizes” the ordinary one. Experience shows it is very interesting to consider generalized monoids in non-\(\mathrm{Set}\) categories. These can be complex objects, and it is of considerable utility to recognize that such structures can be explained in terms of the axioms of ordinary monoids, a remarkably simple kind of structure, suitably generalized to make sense outside of \(\mathrm{Set}\).

Quite famously, the monads of functional programming are exactly generalized monoids in \(\mathrm{End}_\mathcal{C}\) (whereas monoids in \(\mathcal{C}\) itself are basically ordinary monoids, since type-theoretic categories are already \(\mathrm{Set}\)-like). I’ll spare you the details, but a generalized monoid in \(\mathrm{End}_\mathcal{C}\) by definition is the following:

- A functor \(F : C \to C\)
- Equipped with an “identity element,” a natural transformation \(\eta : 1 \Rightarrow F\) where \(1\) is the identity functor.
- Equipped with a “monoid operation,” a natural transformation \(\mu : F \circ F \Rightarrow F\).
- Satisfying generalizations of the monoid laws that I’ll write down in a second.

If you like Haskell-ish notation, a monad \(F\) is a functor equipped with polymorphic operations

`return : forall A, A -> F A`

`join : forall A, F (F A) -> F A`

Try writing out what it means for these to be natural transformations. I won’t dwell on this part because naturality is nice but a separate matter from the monoid laws.

At first these operations seem cryptic. Unfortunately the generalized monoid laws don’t immediately look enlightening either:

- Left identity law: \[\mu_{A} \circ \eta_{F A} = \mathrm{id}_{F A}\]
- Right identity law: \[\mu_{A} \circ \mathrm{fmap}^F \eta_{A} = \mathrm{id}_{F A}\]
- Associativity law: \[\mu_A \circ \mu_{F A} = \mu_A \circ \mathrm{fmap}^F \mu_A\]

Using Haskell notation is slightly more illuminating, maybe:

`join (return t) = t`

for all`t : F A`

`join (fmap return t) = t`

for all`t : F A`

`join (join t) = join (fmap join) t`

for all`t : F (F (F A))`

How to make sense of these equations? One often finds that tremendous light is shed simply by choosing an appropriate notation, meaning one whose syntax correctly suggests an analogy with something we already understand. Let’s try that.

## Swapping notations

Let’s get away from the \(\eta\)/\(\mu\) notations and use something a little more bold. For \(a : A\) let us write \(\eta_A a : F A\) as \[\{a \} : F A,\] and given some \(t : F (F A)\) let us write \(\mu_A t\) as \[\sqcup t : F A\] (\(\sqcup\) is a stylized version of \(\cup\)). The notation is meant to suggest that \(\{a \}\) should be thought of as analogous to a singleton set, one containing just the element \(a : A\). Of course, polymorphism means \(A\) isn’t special, and indeed we’ll later form singleton “containers” whose sole “element” is another “container.” The notation also suggests that the join operation is a kind of union, but take note: This is a *unary* union, one that takes a container-of-containers and flattens it into just a container (of values). This stands in contrast to a binary union, meaning an operation that joins a pair of containers (arguably the unary version is more fundamental). Thus we should see the monad operations as providing a way of forming “singleton” containers and a way of flattening a sequence of nested containers (indeed `join`

is sometimes called `flatten`

).

To justfy this provacative notation we must now turn to the equational theory. With this notation the monad laws take this form:

- For all \(t : F A\), \[\sqcup \{ t \} = t\]
- For all \(t : F A\), \[\sqcup \left( \mathrm{fmap}^F (a \mapsto \{a\})\ t \right) = t\]
- For all \(t : F (F (F A))\), \[\sqcup \left(\sqcup t\right) = \sqcup \left( \mathrm{fmap}^F (\sqcup)\ t\right)\]

Let’s go point-by-point. The first law \[\sqcup \{ t \} = t\] works as follows. Assume \(t : F A\) is some container of values of type \(A\). Now suppose we form the singleton collection \(\{t\}\) that consists of just \(t\). This is a container-of-containers, technically, even though the outermost container is sort of trivial. Now imagine we apply the union operation to obtain a flat container, one without nesting. The first law of monads is that you don’t talk about monads. I mean, the first law of monads is that taking the “union” of a “singleton” \(\{t\}\) returns \(t\) itself. This is clearly true about sets, for instance. To see this, suppose \(X\) is a set of sets of natural numbers (say). Now \(\cup X\) is defined \[x \in \cup \{X\} \iff \exists Y \in \{X\} \textrm{ and } y \in Y.\] Now \(Y \in \{X\} \iff Y = X\), so one readily sees \(x \in \cup \{X\} \iff x \in X\) hence \(\cup \{X\} = X\). Thus using a set-like notation suggests a principle we know is true of sets, but by definition is true of all monads—evidently there is something set-like about all monads.

To use an example of lists, if `x : list A`

is a list, then applying the concatenation operation `concat`

to `[x]`

just returns `x`

,^{5} suggesting that concatenation is an analogue of set-theoretic union for lists. Later I’ll sketch out how to extend this intuition to cover the Reader monad, where the analogy to containers is less obvious but still applicable in my opinion.

Onto the second law: \[\sqcup \left( \mathrm{fmap}^F (a \mapsto \{a\})\ t \right) = t\] Start again with some container \(t : F A\). Now map over \(t\) with the operation that replaces each contained value \(a : A\) with the singleton \(\{a\}\). Once again we find this gives us a container-of-containers. Previously we formed \(\{t\},\) a container-of-containers with a singleton at the *top* level, or a trivial container holding (more general) containers. This time we form a container-of-containers by replacing the contained values with singletons, yielding a (general) container whose contained containers are trivial. Once again, collapsing these nested containers must return the original \(t\). In set theory this works as follows. Let \(X\) be a set. Form the set \(Y\) by defining \[Y = \{ \{x\} | x \in X\}.\] Now \(x \in \cup Y \iff \exists y \in Y \textrm{ and } x \in y\). One finds that \(x \in \cup Y \iff x \in X\), so \(\cup Y = X\). Once again we have found that the language and notation of “containers” has not misled us, providing more justification for the monads-as-containers metaphor.

It’s not too hard to see the preceding two laws are analogous to the “left” and “right” identity laws of ordinary monoids. Starting with a container \(t\), forming singleton containers at the top-level (\(\{t\}\)) or at the bottom level (\(\mathrm{fmap}^F (a \mapsto \{a\})\ t\)) followed by the flattening operation returns the original container. So the monoid operation is flattening, the formation of singletons is like the monoid identity, and “left” and “right” mean at top-level and bottom-level of a container-of-containers, respectively, i.e. the left and right side of a composition of functors like \(F \circ G\). Pretty nifty, no?

Now can you figure out what the third law is saying in terms of containers? Start with some \(t : F (F (F A))\), a container-of-containers-of-containers, triply nested. The final monad law states that collapsing this hierarchy is associative, meaning:

- We can elect to flatten the outer nesting, flattening \(t\) into \(\mu_{F A} t : F (F A)\), a flattened container of unflattened containers. Then we can repeat the process, flattening this down to \(\mu_A \left(\mu_{F A} t\right)\), a totally flat container.
**OR** - We can elect to map over the outer layer first, leaving it untouched, operating on each contained value of type \(F (F A)\) by flattening it with \(\mu_A\). This yields \(\mathrm{fmap}^F \mu_A\ t\), an unflattened container containing flattened containers. This is still a container-of-containers, so we then flatten this down to a container, giving \(\mu_A \left(\mathrm{fmap}^F \mu_A\ t\right)\).

The third monad law is that these processes are equivalent—we can flatten a triply-nested sequence of containers in whatever order we please. And by extension, *any* amount of nesting can be totally flattened in a unique way.

The foregoing is nothing more or less than the definition of monads (well, one of two common definitions, the other given in terms of `bind`

which I’ll save for another time). So it’s not a question of “Can we think of monads as generalized containers?” I’d say clearly we can. Both the language of containers and even the bold decision to employ a set-theoretic notation for the monad operations leads to an intuitive understanding of the monad laws. Admittedly, however, just because we can doesn’t mean we always should. Below we’ll see an example where we may or may not want to use this analogy. First, a disclaimer and a digression about how analogies serve to enrich our understanding of abstract mathematical concepts.

## The role of analogies

The container analogy has to be taken with a grain of salt. It’s true that some monads really shouldn’t be described as “containers,” even if we just think about the ones used in functional programming. So I’m not saying that you should *restrict* your understanding of monads to fit into this analogy—that’s not how we should use analogies in mathematics. Just the opposite, I’m saying you should *expand* your notion of “container” to encompass monads, or at least see the strong commonality between these two ideas, even if we decide sometimes to drop the word “container.” The key is to realize that the monad laws are exactly the equations of well-behaved singleton- and union-forming operations. Even if we decide that some monads really aren’t containers, it’s more than fair to describe them as generalizations of containers. It’s just that we need to understand the force of the word “generalize,” which here means keeping the singleton/union laws while dropping any requirements that we can actually reach in and grab the “elements.” At the very least, I think we can agree that monadic values can be described as wrapped in a “context,” where the laws of contexts directly generalize those of containers.

I’m not saying… you should restrict your understanding of monads to fit into this analogy… I’m saying you should expand your notion of “container” to encompass monads.

Recall the claim that it is profitable to recognize a certain phenomenon, that of a generalized monoid, which arises in many seemingly unrelated settings. In the ordinary case of \(\mathrm{Set}\) we just get ordinary monoids, and we can use this fact to lift our intuition for monoids and apply it in scenarios we previously wouldn’t have imagined, such as in the category \(\mathrm{End}_\mathcal{C}\). As an example of this utility, in my own research I have gotten a lot of mileage out of the idea of a “monad acting on an fuctor from the right,” or simply a right monad action. This is a slightly more general structure than monads, capturing precisely the set of functors \(F\) with a monad-like `bind`

operation. It has been very helpful to recognize that these are just the \(\mathrm{End}_\mathcal{C}\) analogue of the idea of (right) monoidal actions, a/k/a \(M\textrm{-sets}\).^{6}

Just as we can lift our intuitions for monoids to think about generalized category-theoretic monoids, we can think analogously about how concrete container types relate to monads.^{7} Here, the general phenomenon is that of a well-behaved singleton/union operational pair. In ordinary cases we think of very concrete kinds of containers: lists, finite sets, binary trees, etc. But by isolating just the singleton/union laws we can carry some of our intuitions further, finding that “notions of computation” (a phrase coined by Moggi to describe certain monads) also satisfy the same laws, thereby generalizing containers in some sense. So while monads aren’t always immediately similar to concrete containers, they do share the same sort of \(\eta/\mu\) pair subject to the same equational laws. Appreciating this analogy is part of how one reaches monad Zen.

## Case study: Reader

But I think I should at least spell out how this applies to a real-world example of a non-container. I’ll use the Reader monad. Let some type \(R\) be fixed. Monadic values have types of the form \(R \to A\), and we think of such computations as those that compute a value of type \(A\) assuming we can read (but not write) an environment of type \(R\). I’ll write \(e : R\) for values of type \(R\) (“\(e\)” for “environment”).

Given some \(a : A\), the “singleton” \(\{a\}\) for the Reader monad is the constant function \(e \mapsto a\) (“\(\mapsto\)” is pronounced “maps to”). The join operation accepts some \(x : R \to (R \to A)\) and returns \((e \mapsto x e e) : R \to A\). That is, \(x\) is an environment-dependent operation that computes another environment-dependent operation, and the join of \(x\) is the operation that executes \(x\) and then executes its output *in the same environment.*

Now is \(e \mapsto a\) really a singleton container? No, clearly not in a literal sense. If we try hard, we *can* view computations in Reader as somewhat container-like. We do this the same way we see a value of type \(\mathbb{N} \to A\) as an infinite stream of \(A\) values, or the way that a pair of \(A\)s can be seen as a value of type \(\mathrm{bool} \to A\), for example. That is, we read type \(R \to A\) as “R-indexed multisets of \(A\)-values” (multisets because the same \(A\) value can occur for distinct indices \(e_1, e_2 : R\)). From this perspective all Reader values are multisets, and Reader is literally a kind of container type. But this is something of a red herring because if we’re being too literal this can contradict the monads-as-containers narrative I’ve been developing. “Sure,” you say, “you could argue that values of type \(R \to A\) represent a container of \(R\)-many \(A\) values if you really squint at it. But you told me \(\eta\) creates singletons, and the ‘singleton’ \(e \mapsto a\) has cardinality \(R\). What gives?”

Remember, the “singleton”-like natural of \(\eta\) comes from the equations it satisfies with \(\mu\), not so much as a standalone idea. The intuitive idea behind Reader, captured in the definition of \(\eta\)/\(\mu\), is that we are considering computations that run in some (yet unknown, unmodifiable) environment. So while \(e \mapsto a\) literally looks like a bag of values all equal to \(a\), its “singleton” nature comes from the fact that there is only one possible output of the computation, *regardless* of the environment. Contrast this with arbitrary values of type \(R \to A\), which are not “singletons” insofar as there is some question of what the final value will be, depending on the environment. It’s the total determinism of \(e \mapsto a\), contrasted with the non-determinism of arbitrary Reader computations, that makes the Reader \(\eta\) a “singleton.”

The monad laws are easily verified although not particularly interesting. Recall \(\mathrm{fmap}^{Reader}_{A,B} f\ x = (f \circ x) = (e \mapsto f (x e))\).

- Given \(x : R \to A\), the singleton is \(e \mapsto x\), while the join of this is \(e \mapsto \left((e \mapsto x) e e\right) = e \mapsto x e\), which is equal to \(x\).
- Given \(x : R \to A\), the set obtained by mapping \(\{\textrm{-}\}\) over \(x\) is \(e \mapsto (\{x e\}) = e \mapsto x\), and again the join is equal to \(x\).
- Given \(x : R \to (R \to (R \to A))\), collapsing \(x\) with the two choices of strategy will both yield \(e \mapsto \left(e \mapsto x e e\right) e e\) or just \(e \mapsto x e e e\).

So this example illustrates that monads don’t immediately look like containers. This doesn’t contradict the analogy any more than the fact that monads don’t immediately look like monoids contradicts the fact that they are monoids in \(\textrm{End}_\mathcal{C}\). In the case of Reader, we saw that the computations can be understood as \(R\)-sized multisets, which admittedly doesn’t immediately mesh with the container intuition without taking some care not to interpret “singleton” too literally. The point is that your mileage may vary applying the analogy to your favorite monad, but importantly *the analogy is completely formal*—the equational theory of \(\{ - \}\) and \(\sqcup\) is shared by all monads, meaning there is a fully rigorous but quite abstract sense in which all monads are container-like.

## Final thoughts: Parallelism in \(\mathrm{End}_\mathcal{C}\)

I claimed that monoidal categories have a notion of parallel composition and that this is necessary to define generalized monoids. You may wonder where the “parallelism” of \(\mathrm{End}_\mathcal{C}\) comes into play when we defined monads. Parallelism in \(\mathrm{End}_\mathcal{C}\) is precisely our ability to compose functors and (more to the point) *target individual layers of nested functors* with natural transformations. For instance, for some composition of functors \(G \circ F\), we can apply a natural transformation to \(F\) without distburing \(G\) (by \(\textrm{fmap}\)-ing over \(G\)), and likewise we can apply a natural transformation to \(G\) without disturbing \(F\). This stands in contrast to ordinary (sequential/vertical) composition that allows us to chain several natural transformations, e.g. \(\eta : F_1 \Rightarrow F_2\) and \(\epsilon : F_2 \Rightarrow F_3\) compose to give \((\epsilon \circ \eta) : F_1 \Rightarrow F_3\).

If you don’t know Coq, a value of type

`nat -> Prop`

is a predicate, i.e. a function that accepts a natural number and returns a proposition. There is no suggestion that we can compute whether the proposition is true, so we cannot enumerate the “contents” of a predicate if we think of it as a set.↩︎In full generality category theory doesn’t let us talk about “elements” of objects like this. To be overly formal we could describe terms as morphisms from the unit type. Let’s not be overly formal.↩︎

I’m not aware of a standard notation for this operation. Observe there are two possible definitions depending on which transformation to apply first, but this sequencing is irrelevant and the definitions are equivalent by naturality. This is why \(\square\) is seen as a kind of “horizontal” or “parallel” composition.↩︎

Sets form a monoidal category w.r.t. the Cartesian product, modulo some business about natural isomorphisms. But for purposes of defining “monoid,” the Cartesian product isn’t so special. With \(\mathrm{End}_\mathcal{C}\) we use plain old composition of functors, which is nothing like a Cartesian product and hence monoids here look nothing like ordinary monoids.↩︎

Recall

`concat ([x, y, z, ...]) = x ++ y ++ z + ... + []`

so`concat [x] = x ++ [] = x`

.↩︎The left/right distinction is much more significant in \(\mathrm{End}_\mathcal{C}\) than in \(\textrm{Set}\), owing to the fact that the Cartesian product is essentially commutative while composition of functors isn’t.↩︎

The monads-are-containers analogy isn’t as formal as the monads-are-monoids one because we don’t have a precise definition of containers. If we really wanted to make this precise, I would argue that finitary containers are by definition the

*traversable monads*(something I define in Tealeaves), in which case monads are containers with the traversability condition dropped. “Traversability” is an abstract idea but it captures, among other things, the ability to enumerate the elements of the container. By sacrificing this condition we can talk about container-like types for which we cannot enumerate the elements.↩︎