Interpolation and operads

I’ve been thinking a bit about interpolation recently. In animation programming, we often use interpolation as a way to ensure continuity when transitioning from one animation to another. The most basic type of interpolation is linear interpolation, given by
$$
a +_s b = sa + (1-s)b\,,
$$
for vectors \(a, b\) and weight \(s\). Another common type is spherical interpolation, where we interpolate between two rotations (represented as unit quaternions) along a shared great circle in the \(3\)-sphere.
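To make this concrete, here is a minimal Python sketch of both operators (the function names are mine; quaternions are modelled as unit-length numpy arrays, and edge cases are only lightly handled):

```python
# A minimal sketch of the two operators, following the convention used
# above: a +_s b = s*a + (1-s)*b, so s = 1 returns a and s = 0 returns b.
import numpy as np

def lerp(a, b, s):
    """Linear interpolation of vectors: a +_s b = s*a + (1-s)*b."""
    return s * a + (1 - s) * b

def slerp(a, b, s):
    """Spherical interpolation of unit quaternions (shape-(4,) arrays)
    along a great circle of the 3-sphere.  The nearly-antipodal case,
    where the great circle is not unique, is not handled."""
    cos_theta = np.clip(np.dot(a, b), -1.0, 1.0)
    theta = np.arccos(cos_theta)       # angle between a and b
    if np.isclose(theta, 0.0):         # nearly parallel: lerp is fine
        return lerp(a, b, s)
    return (np.sin(s * theta) * a + np.sin((1 - s) * theta) * b) / np.sin(theta)
```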

This got me thinking – can we axiomatize the common properties of different interpolation operators? Let \(+_t\) be a family of binary operators on some set \(A\), indexed by \([0, 1]\). What properties might we ask such a family of operators to satisfy in order to qualify as a form of interpolation? We can ask for a kind of commutativity:

$$
a +_t b = b +_{1-t} a\,.
$$
If we work hard enough, we can also ask for a kind of associativity:
$$
(a +_s b) +_t c = a +_{st} (b +_{\frac{t-st}{1-st}}c)\,,
$$
if \(s\) and \(t\) are not both \(1\), and
$$
(a +_1 b) +_1 c = a +_1 ( b +_t c )
$$
for any \(t\).
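Taking ordinary linear interpolation on real numbers as a model, we can sanity-check the associativity rule numerically. A quick sketch (names mine; the randomly drawn weights are almost surely not both \(1\)):

```python
# Numerical sanity check of the 'associativity' rule, taking ordinary
# linear interpolation on real numbers as our model of +_s.
import random

def interp(a, b, s):
    return s * a + (1 - s) * b   # a +_s b

for _ in range(1000):
    a, b, c = random.random(), random.random(), random.random()
    s, t = random.random(), random.random()   # s*t < 1 almost surely
    lhs = interp(interp(a, b, s), c, t)
    rhs = interp(a, interp(b, c, (t - s * t) / (1 - s * t)), s * t)
    assert abs(lhs - rhs) < 1e-9
```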

We might also want a kind of idempotence:
$$
a +_t a = a\,,
$$
and special behaviour at the endpoint \(0\) (the corresponding rule \(a +_1 b = a\) for the endpoint \(1\) then follows from commutativity):
$$
a +_0 b = b\,.
$$

These raise a few questions.

  • Is there a name for these algebraic structures?
  • Does this generalize to an algebraic theory that works in categories other than the category of sets? (For example, monoids can be defined in any monoidal category, commutative monoids in any symmetric monoidal category, and groups in any Cartesian category.)
  • In what sense is it correct to use the words ‘associativity’, ‘commutativity’ and ‘idempotence’ to refer to these rules as we have done?
  • If we have a rule for associativity, why isn’t there a rule for an identity element?
  • Is there a neater presentation (particularly with regards to the associativity rule)?

Let’s start with associativity. A monoid is a set \(M\) together with an associative binary operator \(M \times M \to M\) that has an identity element. But there’s an alternative definition of monoid that I quite like. For this definition, instead of defining a single binary operation, we instead ask that \(M\) should be equipped with an \(n\)-ary operator \(\_*\cdots*\_ \colon M^n \to M\) for each \(n = 0, 1, \cdots\). Furthermore, we ask that these operations should be compatible in the following sense, which can be thought of as a generalized associativity rule.
$$
( a_{1,1} * \cdots * a_{1,k_1} ) * \cdots * ( a_{n,1} * \cdots * a_{n,k_n} ) = a_{1,1} * \cdots * a_{1,k_1} * \cdots * a_{n,1} * \cdots * a_{n,k_n}
$$
This is the same as a monoid, because if we have the binary operation \(\_*\_\), then we can construct the \(n\)-ary operation (for \(n\ge2\)) by repeated application; associativity then tells us precisely that the \(n\)-ary operation so constructed is well-defined: it does not depend on the order in which we chain together the binary operation.
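In code, the construction might look like the following sketch (the helper name is mine), using a left fold, which associativity makes as good as any other bracketing:

```python
# The n-ary operations of a monoid, built from the binary one and the
# unit.  Generalized associativity says that any way of bracketing the
# repeated binary operation gives the same answer, so a left fold is as
# good as any other choice.
from functools import reduce

def n_ary_product(xs, op, unit):
    """x_1 * ... * x_n; the empty product (n = 0) is the unit."""
    return reduce(op, xs, unit)

# Example: the monoid of strings under concatenation.
assert n_ary_product(["foo", "bar", "baz"], lambda x, y: x + y, "") == "foobarbaz"
assert n_ary_product([], lambda x, y: x + y, "") == ""
```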

The existence and defining property of the identity element is the \(0\)-ary part of this definition: the \(0\)-ary operation \(1 \to M\) picks out an element \(e\) (thought of as an empty product), and then the above rule tells us that for any \(x\), we must have
$$
e * x = x\,,
$$
taking \(n = 2\), \(k_1=0\), \(k_2 = 1\) and \(a_{2,1}=x\) (and, similarly, that \(x * e = x\)).

Let’s now turn to our interpolation theory. If we start with our binary operations \(+_s\) and apply them repeatedly, what are the \(n\)-ary operations that we’re expecting to get out? A little thought reveals that \(n\)-ary operations should correspond to sequences \(s_1,\cdots,s_n\) of non-negative real numbers that sum to \(1\). Given such a sequence, we write the corresponding operation as $$ s_1a_1 + \cdots + s_na_n\,, $$ where our original \(a +_s b\) would now be written as \(sa + (1-s)b\). This is still an abstract \(n\)-ary operator – we don’t have a concept of addition and multiplication.

The corresponding generalized associativity rule is as follows. $$ s_1 ( t_{1,1}a_{1,1} + \cdots + t_{1,k_1}a_{1,k_1} ) + \cdots + s_n ( t_{n,1}a_{n,1} + \cdots + t_{n,k_n}a_{n,k_n} ) = s_1t_{1,1}\,a_{1,1} + \cdots + s_1t_{1,k_1}\,a_{1,k_1} + \cdots + s_nt_{n,1}\,a_{n,1} + \cdots + s_nt_{n,k_n}\,a_{n,k_n}\,.
$$
This also answers our question about why there is no identity rule in our theory. There is no \(0\)-length list of non-negative real numbers that sums to \(1\): the empty sum is \(0\), not \(1\).

Enter operads

The last section revealed a nice little structure on the collection of lists of non-negative real numbers that sum to \(1\). Given natural numbers \(n, k_1,\cdots,k_n\) and sequences of non-negative real numbers
$$
s_1,\cdots,s_n\,,\quad
t_{1,1},\cdots,t_{1,k_1}\,,\quad
\cdots\,,\quad
t_{n,1},\cdots,t_{n,k_n}\,,
$$
each summing to \(1\), we can form a new list of length \(k_1 + \cdots + k_n\), also summing to \(1\), given by
$$
s_1t_{1,1}, \cdots, s_1t_{1,k_1}, \cdots, s_nt_{n,1}, \cdots, s_nt_{n,k_n}\,.
$$
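In code, this composition might look like the following sketch, with Python lists standing in for the sequences (the function name is mine):

```python
# Operadic composition of weight sequences: an outer sequence s of
# length n and inner sequences ts[0], ..., ts[n-1], each summing to 1,
# yield a flattened sequence of length k_1 + ... + k_n, again summing to 1.
def compose(ts, s):
    assert len(ts) == len(s)
    return [s_i * t for s_i, t_i in zip(s, ts) for t in t_i]

# Two inner sequences of lengths 2 and 1 composed with (0.25, 0.75):
assert compose([[0.5, 0.5], [1.0]], [0.25, 0.75]) == [0.125, 0.125, 0.75]
```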
Moreover, this ‘composition’ operation satisfies a natural kind of associativity: if we write \(C(n)\) for the set of all sequences \(s_1,\cdots,s_n\) that sum to \(1\), then the following diagram commutes.
$$
\require{AMScd} \begin{CD} \prod_{i=1}^n\prod_{j=1}^{k_i}C(l_{i,j}) \times \prod_{i=1}^n C(k_i) \times C(n) @>>> \prod_{i=1}^n\prod_{j=1}^{k_i}C(l_{i,j})\times C\left(\sum k_i\right)\\ @VVV @VVV \\ \prod_{i=1}^n C\left(\sum_{j=1}^{k_i}l_{i,j}\right) \times C(n) @>>> C\left(\sum_{i=1}^n\sum_{j=1}^{k_i}l_{i,j}\right) \end{CD} $$
Lastly, there is a ‘unit’ for the composition operation: the \(1\)-element sequence \(1\). Indeed, if we write our composition operator as \((\_;\cdots;\_)\circ\_\), then we have
\begin{align}
(s_1,\cdots,s_n)\circ 1 &= s_1,\cdots,s_n\,;\\
(1;\cdots;1)\circ(s_1,\cdots,s_n) &= s_1,\cdots,s_n\,,
\end{align}
for all sequences \(s_1,\cdots,s_n\in C(n)\).

If you find the diagrams and equations above confusing, then I recommend sitting down and trying to write down the natural definitions that come to mind. You should end up with something similar.

Is there a name for this kind of structure? Yes! It’s called an operad. As the name suggests, it helps very much to think of the elements of \(C(n)\) as being like \(n\)-ary operations: functions that take \(n\) inputs and have one output.

An element of an operad can be thought of as an abstract \(n\)-ary operation

The composition rule then corresponds to taking the outputs of \(n\) \(k_i\)-ary operations and plugging them all into a single \(n\)-ary operation in order to generate a \(k_1 + \cdots + k_n\)-ary operation. For example, we might compose a ternary operator, a unary operator and a binary operator with another ternary operator to get a \(6\)-ary operator.

Our associativity rule then tells us that if we have three layers of operations, then composing the bottom two layers first gives the same result as composing the top two layers first, so that it is well defined to draw a tree with three or more levels without specifying the order of composition.

Lastly, the identity rule gives us a single unary operation that is an identity for composition. This means that we can use it to ‘skip’ a level. Together with the associativity rule, this allows us to do away with the ‘level’ concept altogether when reasoning using string diagrams.

It’s important to remember that the elements of an operad are not in general themselves \(n\)-ary operators (as our sequences of real numbers example demonstrates), but that they admit operations very similar to those that are satisfied by real \(n\)-ary operators.

However, if \(A\) is a set and we set \([A^n,A]\) to be the set of all functions \(A^n \to A\) (in other words, actual \(n\)-ary operators on \(A\)), then the sets \([A^n,A]\) do naturally form an operad (via composition of functions) whose elements really are \(n\)-ary operators. Given an abstract operad \(O\), an algebra for \(O\) on a set \(A\) is defined to be an operad morphism \(O(-) \to [A^-,A]\); i.e., a collection of functions \(O(n) \to [A^n,A]\) that commute with the composition and identity functions in the natural way. Essentially, this means that for each abstract ‘operator’ \(o\in O(n)\), we pick out an actual \(n\)-ary operator on \(A\) in such a way that the internal composition in \(O(-)\) corresponds to actual composition of functions (and likewise with the identity).

There is nothing particularly special about the category of sets, either. Any monoidal category \(\mathcal M\) admits a notion of \(n\)-ary operator: a morphism \(a \otimes \cdots \otimes a \to a\) for each object \(a\) of \(\mathcal M\). For a given \(a\), these sets of morphisms form an operad and we may define algebras for our operad \(O(-)\) on \(a\).

We’re getting somewhere. A little thought shows that if \(A\) is a set, then algebras for our operad \(C(-)\) on \(A\) correspond exactly to collections of binary operations \(\_+_s\_\) (for \(s\in[0,1]\)) that satisfy our original ‘associativity’ rule. Moreover, this definition extends very naturally on to any monoidal category.
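For instance, on \(\mathbb R^d\) the evident algebra sends each abstract operator to the corresponding convex-combination operator; a minimal sketch (names mine):

```python
# The evident algebra for C(-) on R^d: the abstract operator
# (s_1, ..., s_n) acts as the convex-combination operator
# (a_1, ..., a_n) |-> s_1*a_1 + ... + s_n*a_n.  Compatibility with
# operadic composition is exactly the generalized associativity rule.
import numpy as np

def act(s, points):
    """Apply the abstract operator s in C(n) to n points of R^d."""
    return sum(s_i * a_i for s_i, a_i in zip(s, points))

a, b = np.array([0.0, 0.0]), np.array([1.0, 2.0])
print(act([0.25, 0.75], [a, b]))   # a +_{0.25} b, i.e. [0.75 1.5]
```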

Exercise

Show that there exists an operad whose algebras are monoids. (Work in the category of sets if you’re uncomfortable with monoids in arbitrary monoidal categories, but the operad you’ll come up with will work in that setting too.)

Symmetric operads

Having tackled the associativity rule, let’s move on to commutativity. It turns out that the operads in the previous section are not sufficient to describe this rule. One reason for that is as follows. We saw in the previous section that an operad admits a notion of algebra in any monoidal category. But it is not in general possible even to define the notion of commutativity in such a category, because we have no way to swap the order of variables. If we wanted to say that a binary operator \(\_+\_\colon A \times A \to A\) on a set \(A\) was commutative in the ordinary sense, for example, we could do this by asking that \(\_+\_\) was equal to the composite \((\_+\_) \circ \textrm{sym}\), where \(\textrm{sym}\colon A \times A \to A \times A\) swaps the pair around. But the monoidal product in an arbitrary monoidal category might not come equipped with such a ‘symmetry’ morphism.

We need a new type of operad to handle commutativity: it’s called a symmetric, permutative or nonplanar operad. (Actually, this is more often called simply an operad, while the operads we met in the previous section are called nonpermutative or planar.) In this form of the operad, the wires that we use to join operations together are now allowed to cross.

(This is where the name nonplanar comes from.) At the formal level, each set \(O(n)\) is now equipped with an action of the symmetric group \(\mathfrak S_n\). These actions are required to be compatible with the composition in a way that is rather tedious to write down, as it involves combining multiple permutations together to form a larger one. It’s a good exercise to try and write down the rules yourself.

In any case, our operad \(C(-)\) has this structure: given a sequence \(s_1,\cdots,s_n\) that sums to \(1\), we can apply a permutation \(\pi\in\mathfrak S_n\) to it to get the sequence
$$
s_{\pi(1)},\cdots,s_{\pi(n)}\,.
$$

Given a symmetric monoidal category, \(\mathcal S\), the collection of \(n\)-ary morphisms \(a \otimes \cdots \otimes a \to a\) forms a symmetric operad, and this gives us a notion of an algebra for a symmetric operad on any object of a symmetric monoidal category. The action of a permutation \(\pi \in \mathfrak S_n\) on \(O(n)\) is required to correspond to pre-composition with the natural symmetry \(a\otimes \cdots \otimes a \to a\otimes \cdots \otimes a\) corresponding to \(\pi\).

What does this mean for our operad \(C(-)\)? If we take \(\mathcal S\) to be the category of sets with Cartesian product, then it says that we must always have
$$
s_1a_1 + \cdots + s_na_n = s_{\pi(1)}a_{\pi(1)} + \cdots + s_{\pi(n)}a_{\pi(n)}
$$
for any permutation \(\pi\in\mathfrak S_n\), which is precisely what the iterated version of our original ‘commutativity’ rule tells us. Therefore, algebras for \(C(-)\), now with its symmetric operad structure, on a set \(A\) correspond to families \(\_+_s\_\) of binary operations satisfying our associativity and commutativity rules.

Exercises

  1. Show that there exists a symmetric operad whose algebras are commutative monoids.
  2. Show that there exists a symmetric operad whose algebras (in a symmetric monoidal category) are (not necessarily commutative) monoids. This should be different from the planar operad of monoids that we met earlier.

Terminal operads

We have briefly touched on the definition of a morphism of operads, but let’s give it here explicitly. Given operads \(O'\) and \(O\), a morphism \(O' \to O\) is given by functions \(\phi_n \colon O'(n) \to O(n)\) such that (writing operadic composition as \((\_,\cdots,\_);\_\)) we have
$$
\phi_{k_1+\cdots+k_n}((o_1,\cdots,o_n);p) = (\phi_{k_1}(o_1),\cdots,\phi_{k_n}(o_n));\phi_n(p)
$$
for \(o_i\in O'(k_i), p\in O'(n)\), and
$$
\phi_1(e) = e\,,
$$
for \(e\) the identity element of each operad. A morphism of symmetric operads is a morphism between the underlying planar operads that also respects the actions of \(\mathfrak S_n\).

This makes operads (or symmetric operads) into a category. A terminal object of a category \(\mathcal C\) is an object \(()\) such that for any other object \(a\) of \(\mathcal C\), there is a unique morphism \(a \to ()\). For example, in the category of sets, the terminal objects are the one-element sets. There is a terminal object in the category of operads, given by a one-element set at each level. This is sometimes called \(\mathbf{Assoc}\), or the associative operad.
$$ \mathbf{Assoc}(n) = \{*\} $$ This name comes from the fact that the algebras for this operad are objects with an associative (and unital) binary operation on them; i.e., monoids. If you have done the exercises then you will have come across it already. It is not difficult to see why this operad is terminal: given any other operad \(O\), there is a unique function \(O(n) \to \{*\}\) for each \(n\), and the equation above is automatically satisfied, since both sides are elements of a one-element set, so are necessarily equal.

A consequence of this is that a monoid is automatically an algebra for any operad \(O\) that you can define. Indeed, we previously defined an algebra for an operad on an object \(a\) of a monoidal category to be an operad morphism into the operad
$$ [a^{\otimes n},a] $$
given by morphisms \(a \otimes \cdots \otimes a \to a\).

That means that a monoid on \(a\) is given by an operad morphism
$$ \mathbf{Assoc} \to [a^{\otimes \_},a]\,. $$
If \(O\) is another operad, then we have a unique operad morphism \(O \to \mathbf{Assoc}\), which we can compose with this morphism to give us the structure of an \(O\)-algebra on \(a\).
$$ O \to \mathbf{Assoc} \to [a^{\otimes \_},a]
$$
This is borne out by our operad \(C\) of sequences of reals summing to \(1\): given a monoid on a set \(M\) with binary operation \(*\), we can set \(a +_s b = a * b\) for all \(s\) and get something that satisfies the ‘associativity’ axiom that we gave for the operators \(\_+_s\_\). This shows that we were in some sense justified in using the word ‘associativity’ to describe that axiom.

The category of symmetric operads has a terminal object too. Once again, this is given by a singleton set at each level; the action of the symmetric groups is trivial. If you’ve done the exercise, you’ll know that the algebras for this symmetric operad are precisely the commutative monoids; hence, this is often called the commutative operad or \(\mathbf{Comm}\). You’ll also know that there is another symmetric operad whose algebras are not-necessarily commutative monoids (in a symmetric monoidal category). This is also called \(\mathbf{Assoc}\), since it represents the same algebraic theory, but is now given by
$$
\mathbf{Assoc}(n) = \mathfrak{S}_n\,,
$$
where the action of \(\mathfrak S_n\) is by composition of permutations. Since \(\mathbf{Comm}\) is terminal in the category of symmetric operads, we have a natural operad morphism \(\mathbf{Assoc} \to \mathbf{Comm}\), giving us a natural way to interpret any algebra for \(\mathbf{Comm}\) as an algebra for \(\mathbf{Assoc}\). This is not surprising, because:

Every commutative monoid is a monoid.

We can also link things to our operad \(C\) of sequences of reals. Given a commutative monoid with operation \(\_+\_\), we can set \(a +_s b = a + b\) for all \(s\) to get a model of both our ‘associativity’ and ‘commutativity’ axioms. This justifies our use of the word ‘commutativity’ to describe that axiom.

Cartesian Operads

Let’s get back to our operad of sequences of non-negative real numbers summing to \(1\). So far, we have managed to capture our original ‘associativity’ and ‘commutativity’ rules, but there are still two left – the ‘idempotence’ rule
$$
a +_t a = a
$$
and the rule for \(0\)
$$
a +_0 b = b\,.
$$
The operads we have met so far are not sufficient to capture these notions. To see why, let’s consider ordinary idempotence. Given a binary operation \(\_*\_\colon A \times A \to A\) on a set \(A\), we say that \(\_*\_\) is idempotent if the composite
$$ \require{AMSmath} A \xrightarrow{\Delta} A \times A \xrightarrow{\_*\_} A
$$
is equal to the identity. Notice that we needed to bring in the diagonal map \(\Delta\) in order to make this definition. This is because in the defining equation for idempotence,
$$
a + a = a\,,
$$
the variable \(a\) is duplicated on the left hand side.

The operads we have met give us algebras in arbitrary symmetric monoidal categories. But not all symmetric monoidal categories admit a sensible morphism \(a \to a \otimes a\) to take on the role of the diagonal. Nor do they necessarily admit a morphism \(a \to I\) to allow us to forget variables. This means that since our ‘idempotence’ rule duplicates the variable \(a\) on the left, and since our zero rule forgets it on the right, neither of these rules makes sense in an arbitrary symmetric monoidal category. And so it wouldn’t make sense to try to find an operad to handle either of these equations.

In order to handle these rules, we need to extend our definition of an operad further. A Cartesian operad is like a symmetric operad, except that now each set \(O(n)\) admits an action not only of permutations \([n] \to [n]\), but of any function \(f\colon[n] \to [m]\). This means that such a function \(f\) gives rise to a function
$$
O(f) \colon O(n) \to O(m)\,.
$$
This assignment should make \(O\) into a functor; i.e., we should have \(O(g\circ f) = O(g)\circ O(f)\), and \(O(\textrm{id}) = \textrm{id}\). In addition, it should be compatible with the composition operator of the operad (I will leave it as an exercise to write down exactly what this means).

Sequences of non-negative real numbers that sum to \(1\) form a Cartesian operad. Indeed, given a function \(f \colon [n] \to [m]\) and a sequence \(a_1, \cdots, a_n\) of length \(n\) of non-negative real numbers that sum to \(1\), we can define a sequence of length \(m\), still summing to \(1\), given by
$$ (C(f)(a))_i = \sum_{p\in f^{-1}(\{i\})} a_p\,.
$$
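Here is a sketch of this action in code (names mine; I use \(0\)-based indices rather than \([n] = \{1,\cdots,n\}\)). The two assertions preview the particular functions \([2] \to [1]\) and \([1] \to [2]\) that we are about to use:

```python
# The action of a function f : [n] -> [m] on a weight sequence of
# length n: output weight i collects every input weight whose index is
# sent to i.  Indices here are 0-based, unlike [n] = {1, ..., n}.
def C_action(f, a, m):
    out = [0.0] * m
    for p, weight in enumerate(a):
        out[f(p)] += weight
    return out

# The two functions used in the next paragraphs: [2] -> [1] collapsing
# both elements, and [1] -> [2] picking out the second element.
assert C_action(lambda p: 0, [0.25, 0.75], 1) == [1.0]
assert C_action(lambda p: 1, [1.0], 2) == [0.0, 1.0]
```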

Now it is also possible to define a suitable notion of algebras for Cartesian operads. Given a Cartesian category \(\mathcal C\) (i.e., a monoidal category \(\mathcal C\) whose monoidal product is a category-theoretic Cartesian product) and an object \(A\) of \(\mathcal C\), we get a Cartesian operad
$$ [A^\_,A] $$
given at level \(n\) by the set of morphisms \(A^n \to A\), where the action of functions \(f \colon [n] \to [m]\) is given by pre-composition with the projection
$$ \langle \textrm{pr}_{f(1)}, \cdots, \textrm{pr}_{f(n)} \rangle \colon A^m \to A^n\,.
$$
This allows us to define an algebra for a Cartesian operad \(O\) on the object \(A\): it is an operad morphism \(O \to [A^\_, A]\) – in other words, a collection of actual operators \(A^n \to A\) replacing the abstract operators of the operad, such that composition and function action in the operad correspond to composition of functions and pre-composition with projections as above.

What do algebras for our operad \(C\) look like if we add in the Cartesian structure? To start with, since a morphism of Cartesian operads is in particular a morphism between the underlying symmetric operads, such an algebra must already satisfy all the rules we have put in place so far — namely, the associativity and commutativity rules. But now that we have to respect the Cartesian structure as well, we see that algebras for \(C\) with its Cartesian structure must in addition satisfy our idempotence and zero laws.

Indeed, suppose we have such an algebra on a set \(A\). Taking \(t\) to be the function \([2] \to [1]\) that sends both elements of \([2]\) to the unique element of \([1]\), we have the map
$$
C(t) \colon C([2]) \to C([1])
$$
sending the sequence \((s, 1-s)\) to the sequence \((1)\). Since \((1)\) is the operad identity, the operator \(+_s\) in the algebra must therefore satisfy
$$
a +_s a = a
$$
for all \(a \in A\).

Similarly, let \(c_2\) be the function \([1] \to [2]\) that sends \(1\) to \(2\). Then the function
$$
C(c_2) \colon C([1]) \to C([2])
$$
will send the unique length-\(1\) sequence \((1)\in C([1])\) to the length-\(2\) sequence \((0, 1)\). This means that we have
$$
b = a +_0 b
$$
for any \(a, b\in A\).

So at last we arrive at a possible alternative definition for the algebraic structures we started with: they are algebras for the Cartesian operad of finite sequences of real numbers summing to \(1\).

Lawvere Theories

A nice fact about Cartesian operads is that they are equivalent to things called Lawvere theories. (In fact, because they are equivalent, you can use either ‘Cartesian operad’ or ‘Lawvere theory’ to describe both concepts.) To start with, Lawvere theories look quite different from the operads we have met so far. A Lawvere theory is defined to be a category \(\mathcal L\) whose objects are the natural numbers and which is such that for any \(m, n\), the object \(m + n\) is the category-theoretic product of \(m\) and \(n\). A morphism of Lawvere theories is defined to be a functor between the corresponding categories that is the identity on objects and preserves finite products.

I’ll stick to an informal description of how to get from a Cartesian operad to a Lawvere theory and vice versa, rather than giving a complete proof of equivalence. If we start with a Lawvere theory \(\mathcal L\), then we can define a Cartesian operad \(\bar{\mathcal L}\), where \(\bar{\mathcal L}(n)\) is the set of morphisms \(n \to 1\) in \(\mathcal L\). Given morphisms \(f_i \colon k_i \to 1\) for \(i = 1,\cdots,n\) and a morphism \(g \colon n \to 1\), the fact that \(k_1 + \cdots + k_n\) is the category-theoretic product of the \(k_i\) and that \(n\) is the category-theoretic product of \(n\) copies of \(1\) allows us to compose these morphisms together into a morphism \((f_1,\cdots,f_n);g \colon k_1 + \cdots + k_n \to 1\). Given a function \(q \colon [m] \to [n]\), the action of \(q\) is precomposition with the morphism
$$ \langle \textrm{pr}_{q(1)}, \cdots, \textrm{pr}_{q(m)} \rangle \colon n = \underbrace{1 \times \cdots \times 1}_n \to \underbrace{1 \times \cdots \times 1}_m = m\,.
$$

In the other direction, suppose \(O\) is some Cartesian operad. Define the category \(\tilde{O}\) to have the natural numbers for objects and so that the morphisms \(m\to n\) are \(n\)-tuples of elements of \(O(m)\).

This means that we have \(\tilde O(k, m + n) \cong \tilde O(k, m) \times \tilde O(k, n)\), which is ultimately the reason why \(m+n\) is indeed the category-theoretic product of \(m\) and \(n\).

It remains to explain what composition of morphisms is in \(\tilde O\). Let \(m, n, k\) be natural numbers. Then we are looking for a way to take \(n\) elements of \(O(m)\) and \(k\) elements of \(O(n)\) and put them together to get \(k\) elements of \(O(m)\).

This doesn’t quite match up with our usual operadic notion of composition, which shouldn’t be a surprise, since the operadic composition in \(\bar{\mathcal L}\) didn’t line up with ordinary composition of morphisms in \(\mathcal L\). In order to define composition in \(\tilde O\), we need to imagine that we have \(m\) inputs that are shared between all \(n\) of our \(m\)-ary operators. Feeding these \(m\) inputs into each \(m\)-ary operator gives us \(n\) outputs, which we can then feed into each of the \(k\) \(n\)-ary operators in order to get \(k\) final outputs.

In symbols, let \(o_1,\cdots,o_n\in O(m)\) be a sequence of \(m\)-ary abstract operators and let \(p_1,\cdots,p_k\in O(n)\) be a sequence of \(n\)-ary abstract operators. For each \(j = 1, \cdots, k\), ordinary operadic composition would allow us to compose \(p_j\) with all of the \(o_i\) to get an \(mn\)-ary operator
$$
(o_1,\cdots,o_n);p_j \in O(mn)\,.
$$
We have a function \([mn] \to [m]\) that sends the \(i\)-th input of the \(j\)-th block to \(i\) – that is, \(r \mapsto ((r-1) \mathop{\textrm{mod}} m) + 1\) (in effect, dealing out the \(m\) shared inputs to each of the \(n\) \(m\)-ary operators). Applying the action of this function now gives us our desired element of \(O(m)\).

In a Lawvere theory, the same global inputs are shared between all of the operations.

What happens if we apply this process to our Cartesian operad \(C\) of finite sequences of real numbers summing to \(1\)? Given \(m,n\), the set of morphisms \(m \to n\) is the set of \(n\)-tuples of length-\(m\) sequences that sum to \(1\); we will write the \(i\)-th element of the \(j\)-th sequence as \(a_{i,j}\).

Now suppose we have some such \(a_{i,j}\), for \(i=1,\cdots,m\) and \(j=1,\cdots,n\), and that we also have \(b_{j,k}\) for \(j=1,\cdots,n\) and \(k=1,\cdots,p\). How do we compose these together? Firstly, for each \(k\), we can form our usual operadic composition to yield the sequence
$$
a_{1,1}b_{1,k}, \cdots, a_{m,1}b_{1,k}, \cdots, a_{1,n}b_{n,k}, \cdots, a_{m,n}b_{n,k}\,.
$$
Now applying the ‘mod \(m\)’ function means collecting together all the terms \(a_{i,j}b_{j,k}\) for each \(j=1,\cdots,n\) and summing them together. We end up with the sequence given by
$$
(a;b)_{i,k} = \sum_{j=1}^n a_{i,j}b_{j,k}\,. $$
That looks familiar – it’s the formula for matrix multiplication!
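A small sketch of this picture, under the convention just derived – a morphism \(m \to n\) is an \(n\times m\) row-stochastic matrix, composed by matrix multiplication (names mine):

```python
# Morphisms m -> n as n-by-m row-stochastic matrices, composed by
# matrix multiplication.
import numpy as np

def is_stochastic(M):
    """Entries non-negative, each row summing to 1."""
    return bool(np.all(M >= 0)) and np.allclose(M.sum(axis=1), 1.0)

A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])   # a morphism 3 -> 2
B = np.array([[0.25, 0.75]])      # a morphism 2 -> 1
assert is_stochastic(A) and is_stochastic(B)
assert is_stochastic(B @ A)       # the composite 3 -> 1
```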

Everything becomes a bit easier once we’ve made that observation. We realize that the morphisms \(m \to n\) in our Lawvere theory can be thought of as \(n\times m\) real matrices whose rows sum to \(1\), otherwise known as \(n\times m\) stochastic matrices. Composition of morphisms is given by matrix multiplication, and we can verify that \(m+n\) is indeed the category-theoretic product of \(m\) and \(n\) (if you like, identified with \(\mathbb R^{m+n}\), \(\mathbb R^m\) and \(\mathbb R^n\)), which comes down to the observations that the two projection matrices

$$ \left( \begin{array}{cccccc} 1 & & \mbox{$\Large{0}$} & & &\\ & \ddots & & & \mbox{$\Huge{0}$} & \\ \mbox{$\Large{0}$} & & 1 & & & \end{array} \right) \hspace{24pt} \left( \begin{array}{cccccc} & & & 1 & & \mbox{$\Large{0}$} \\ & \mbox{$\Huge{0}$} & & & \ddots & \\ & & & \mbox{$\Large{0}$} & & 1 \end{array} \right)
$$

are stochastic and that an \((m+n)\times k\) stochastic matrix is uniquely the ‘pairing’ of an \(m\times k\) stochastic matrix \(A\) and an \(n\times k\) stochastic matrix \(B\):

$$
\begin{pmatrix}
\Huge A \\ \hline \Huge B
\end{pmatrix}
$$

We end up with the following nice characterization of the algebraic structures we started with: they are algebras for the Lawvere theory of stochastic matrices.

Showing that an algebra for the stochastic matrix Lawvere theory satisfies the axioms we gave at the start boils down to showing a few matrix identities. For the associativity law, we start off with the two concrete ternary operators
\begin{align}
(a, b, c) & \mapsto a +_s b\,,\\
(a, b, c) & \mapsto c\,.
\end{align}
Since an algebra for a Lawvere theory is a product-preserving functor into our ambient Cartesian category, we know that projection matrices must be sent to actual projections, so the second of these is described by the \(1\times 3\) projection matrix
$$
\begin{pmatrix} 0 & 0 & 1 \end{pmatrix}\,.
$$
The definition of the operator \(+_s\) is that it is the image of the \(1\times 2\) stochastic matrix \(\begin{pmatrix} s & 1-s \end{pmatrix}\). So the first operator corresponds to the matrix product
$$ \begin{pmatrix} s & 1-s \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}= \begin{pmatrix} s & 1-s & 0 \end{pmatrix}\,. $$
We pair these two \(1\times 3\) matrices to give us the \(2\times 3\) matrix
$$
\begin{pmatrix} s & 1-s & 0 \\ 0 & 0 & 1\end{pmatrix}
$$
The operator corresponding to the expression \((a +_s b) +_t c\) is now obtained by multiplying this matrix by the \(1\times 2\) matrix \(\begin{pmatrix} t & 1-t \end{pmatrix}\), which gives us
$$ \begin{pmatrix}t & 1-t\end{pmatrix}\begin{pmatrix} s & 1-s & 0 \\ 0 & 0 & 1 \end{pmatrix}=\begin{pmatrix} st & (1-s)t & 1-t \end{pmatrix}\,. $$
Going in the opposite direction tells us that the operator corresponding to the expression \(a +_{s'} (b +_{t'} c)\) is given by the matrix
$$
\begin{pmatrix} s' & (1-s')t' & (1-s')(1-t') \end{pmatrix}\,.
$$
If \(s\) and \(t\) are not both \(1\), we see that setting \(s' = st\) and \(t' = \frac{(1-s)t}{1-st}\) gives us equality between these two matrices. Whereas if \(s\) and \(t\) are both \(1\), then setting \(s'=1\) and letting \(t'\) be arbitrary will do.

Our remaining rules also correspond to various matrix identities. Our commutativity rule \(a +_s b = b +_{1-s} a\) corresponds to the fact that the matrix \(\begin{pmatrix} 1-s & s \end{pmatrix}\) is what you get when you multiply \(\begin{pmatrix} s & 1-s \end{pmatrix}\) with the pairing of the projection on to the second factor with the projection on to the first factor (thus swapping the variables \(a\) and \(b\)).
$$
\begin{pmatrix} s & 1-s \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1-s & s \end{pmatrix}
$$
The idempotence rule \(a +_s a = a\) corresponds to the matrix identity
$$
\begin{pmatrix} s & 1-s \end{pmatrix} \begin{pmatrix} 1\\1 \end{pmatrix} = \begin{pmatrix} 1 \end{pmatrix}\,.
$$

Lastly, the zero rule corresponds to the fact that \(\begin{pmatrix} 0 & 1 \end{pmatrix}\) is projection on to the second factor.
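If you would rather not do the algebra by hand, the identities can be checked symbolically; a sketch using sympy (my choice of tool – any computer algebra system would do):

```python
# Symbolic verification of the matrix identities above, using sympy.
import sympy as sp

s, t = sp.symbols('s t')

# Associativity: (a +_s b) +_t c has weights (st, (1-s)t, 1-t).
assoc = sp.Matrix([[t, 1 - t]]) * sp.Matrix([[s, 1 - s, 0], [0, 0, 1]])
assert sp.simplify(assoc - sp.Matrix([[s*t, (1 - s)*t, 1 - t]])) == sp.zeros(1, 3)

# Commutativity: swapping the two inputs of +_s gives +_{1-s}.
comm = sp.Matrix([[s, 1 - s]]) * sp.Matrix([[0, 1], [1, 0]])
assert sp.simplify(comm - sp.Matrix([[1 - s, s]])) == sp.zeros(1, 2)

# Idempotence: precomposing +_s with the diagonal gives the identity.
idem = sp.Matrix([[s, 1 - s]]) * sp.Matrix([[1], [1]])
assert sp.simplify(idem[0, 0]) == 1
```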

None of these identities is particularly remarkable. The point is to show that we can condense and unify our original axioms using this stochastic matrix abstraction.

Checking the literature

So far, we have not actually proved that this new axiomatization via stochastic matrices is equivalent to what we started with. We have shown that any algebra for the Lawvere theory of stochastic matrices must in particular satisfy our four original axioms. But the possibility still remains that there are some equations, not implied by those original four, which nevertheless hold for all stochastic matrix algebras.

Proving that the two theories are in fact equivalent, while not particularly difficult, is not very enlightening. Fortunately, the work has been done for us. A quick online search for Lawvere theory of stochastic matrices turns up a paper by the mathematician Tobias Fritz, A presentation of the category of stochastic matrices [1]. Gratifyingly, that paper is like a reversed version of this blog post: it starts with the Lawvere theory of stochastic matrices (considered as a monoidal category) and tries to find a collection of axioms that generate the theory. The axioms he comes up with are the structural axioms for a Cartesian category, together with our original associativity, commutativity, idempotence and zero axioms.

Final thoughts

I hope this has been an enlightening introduction to the different types of operads, and to the idea that mathematical objects (in this case, finite sequences of real numbers summing to \(1\)) can sometimes be thought of as abstract operators. There are many directions we could go in from here. For example, there is a fairly simple monadic presentation of our axioms that we have not touched on (sketched below), though it only works in the category of sets. (In fact, every operad gives rise to a monad on the category of sets whose algebras are precisely the \(\mathbf{Set}\)-based algebras for the operad.) Another area we have not explored is the topological structure of our operad. Sequences of reals of a fixed length (and, more generally, real-valued matrices) have a natural topological structure, under which the operads we have come up with are topological operads. In the Lawvere theory case, this says that the category in question is enriched over the category of topological spaces: indeed, stochastic matrices of fixed dimensions form a topological space, and matrix multiplication is continuous with respect to that topology. Useful real-world concepts of ‘interpolation’ should be continuous with respect to the interpolation constant and we’d need this topological structure in order to make that precise.
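A side note on the monadic presentation mentioned above: the monad in question is the finitely supported probability distribution monad – at least, that is the standard candidate. A minimal sketch (names mine), in which the multiplication visibly mirrors the operadic composition of weight sequences:

```python
# A sketch of the finitely supported probability distribution monad on
# Set: a distribution is a dict from outcomes to weights summing to 1.
def unit(x):
    """The Dirac distribution concentrated at x."""
    return {x: 1.0}

def bind(dist, f):
    """Push a distribution through f (which maps outcomes to
    distributions); weights multiply, mirroring operadic composition."""
    out = {}
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] = out.get(y, 0.0) + p * q
    return out

d = bind({'a': 0.5, 'b': 0.5}, lambda x: {x.upper(): 0.25, x: 0.75})
# d == {'A': 0.125, 'a': 0.375, 'B': 0.125, 'b': 0.375}
```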

Many of the most interesting examples of operads come from geometric constructions. In topological quantum field theory, for example, there is a symmetric operad of cobordisms, where an \(n\)-ary abstract operator is a diffeomorphism class of smooth manifolds with boundary whose boundary is made up of \(n\) negatively oriented circles together with one positively oriented circle. Given a suitable collection of these, we can glue them together along the boundaries in a way that almost exactly mirrors the string diagrams we have been using for illustration. It turns out that algebras for this operad can also be characterized by a simple equational theory [2], much as our stochastic matrix algebras could be characterized by our four axioms.

Conformal field theory involves a similar operad: this time, the abstract \(n\)-ary operators are Riemann spheres with \(n+1\) punctures, \(n\) negatively oriented and one positively oriented, with local coordinates vanishing at the punctures. By identifying punctures along the map \(z \mapsto 1/z\) via the local coordinates, we sometimes have a way to compose such spheres, in the operadic sense, to get a new punctured sphere. For technical reasons, this is not always possible, so what we really have is a partial operad: one where composition is not defined everywhere, but that satisfies the operad axioms where it is defined. A holomorphic algebra (i.e., an algebra in the category of complex vector spaces and holomorphic functions) for this operad is known as a vertex operator algebra [3], an important concept not only in quantum physics, but also in group theory and number theory. As in the topological case, Vertex Operator Algebras have a purely equational definition, but a notoriously complicated and unenlightening one. By contrast, the operadic definition gives us a concrete way to think about a vertex operator algebra: it gives us a way to introduce elements of a complex vector space at the negatively oriented punctures of a Riemann sphere and get a new element out at the positively oriented puncture, in a way that respects the gluing of punctured spheres that we have just introduced.

[1] T. Fritz, A presentation of the category of stochastic matrices, 2009. arXiv:0902.2554.
[2] nLab authors, Frobenius algebra, 2021. http://ncatlab.org/nlab/show/Frobenius%20algebra (Revision 40).
[3] Y.-Z. Huang and J. Lepowsky, “Vertex operator algebras and operads,” in The Gelfand Mathematical Seminars, 1990–1992, I. M. Gelfand, L. Corwin, and J. Lepowsky, Eds., Boston, MA: Birkhäuser Boston, 1993, pp. 145–161. doi:10.1007/978-1-4612-0345-2_9.

How big is a trillion, anyway?

2020 was a sad year for lovers of large numbers. On 9th March, we lost Richard K. Guy at the age of 103; barely a month later, John Horton Conway followed. In their Book of Numbers (1996), these two mathematicians between them outlined a system for naming all numbers strictly less than \(1,000\)-illion. More specifically, for any \(n\) from \(21\) to \(999\), they gave a systematic way to name the number \(n\)-illion (the numbers from a million up to \(20\)-illion – or one vigintillion – already had names).

I say \(n\)-illion, because the precise meaning differs depending on whom you ask. There are two systems in use today: the short scale or American system, for which one \(n\)-illion is equal to \(1,000^{n+1}\), and the more sensible long scale, or European system, for which one \(n\)-illion is equal to \(1,000,000^{n}\).

I don’t find either of these systems particularly satisfying, as neither seems to follow the pattern that we use for smaller numbers (up to one million). Up to a million, we seem to be using the following rules.

  • Our basic unit is \(1,000\). That is to say, any \(n\)-illion should be some power of \(1,000\); to get the name for arbitrary \(10^n\), we add ten or a hundred to the start.
  • Repeating the same name – e.g., ‘a thousand thousand’ or ‘seventeen billion billion billion billion billion’ – is ugly and inefficient, so we have to come up with new names to avoid doing so.

Just based on these axioms, then, both the short and the long scales seem to add new names exponentially more often than necessary. As any long scale adherent will tell you, there is no need to call \(10^9\) one billion when one thousand million will do. But the same devotee will, in the same breath, refer to \(10^{18}\) as one trillion, when the perfectly congenial one million billion is available. If we choose the principle that we should add new illions as sparingly as possible — after all, we have only \(999\) of them to cover all the numbers we might conceivably want to use — then the correct definition of \(n\)-illion is \(1,000^{2^n}\).

This gives us a nice way to convert \(1,000^n\) into words: it amounts to writing \(n\) in binary. Let’s see how the other systems compare.
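Here is a sketch of the conversion (the function name is mine, and I list smaller illions first, matching phrases like ‘one million billion’ above):

```python
# Naming 1000**m in the proposed system: write m in binary, and read
# each set bit 2**i as one copy of the i-th illion.  The table below
# follows the scheme n-illion = 1000**(2**n) and, matching phrases like
# 'one million billion' above, lists smaller illions first.
ILLIONS = ["thousand", "million", "billion", "trillion", "quadrillion",
           "quintillion", "sextillion", "septillion", "octillion"]

def name_power_of_thousand(m):
    """The name of 1000**m for 1 <= m < 2**9 (bigger m needs more names)."""
    parts, i = [], 0
    while m:
        if m & 1:                 # the 2**i bit of m is set
            parts.append(ILLIONS[i])
        m, i = m >> 1, i + 1
    return " ".join(parts)

print(name_power_of_thousand(33))   # 'thousand quintillion' (10**99)
```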

The Googol Test

The first test of any system of naming large numbers should be: does it give a name to a googol? Recall that one googol is a one followed by a hundred noughts; i.e., \(10^{100} = 10 \times 1,000^{33}\).

  • The short scale calls this number ten duotrigintillion (\(10 \times 1,000^{33} = 10 \times 1,000^{32 + 1}\)).
  • The long scale calls it ten thousand sedecillion (\(10 \times 1,000^{33} = 10,000 \times 1,000^{32} = 10,000 \times 1,000,000^{16}\)).
  • Our scale calls it ten thousand quintillion (\(10 \times 1,000^{33} = 10 \times 1,000^{2^0 + 2^5} = 10 \times 1,000^{2^0} \times 1,000^{2^5}\)).

All the systems passed the test. However, while the long and short scales used the rather fantastical sounding duotrigintillion and sedecillion, we are still in the realm of the relatively pedestrian sounding quintillion.

The Googolplex Test

Not convinced? Let’s move on to the next test: does the system give a name to a googolplex? A googolplex is one with a googol noughts after it; i.e., \(10^{10^{100}}\).

Unfortunately, neither the long nor the short system is able to give a name to a googolplex: the largest power of ten that the short system can name is \(10^{3,000}\), while the long system can go only as far as \(10^{5,994}\), and one googol is far larger than either \(3,000\) or \(5,994\).

Our system, however, is perfectly suited for describing numbers of the form \(10^{10^n}\). One googolplex is equal to \(10\times 1,000^{3\cdots3}\), where there are \(100\) \(3\)s in \(3\cdots 3\). \(3\cdots 3\) is the number we have to convert into binary. At this point, I’ll hand over to the computer.

$$ 10^{\cdots} $$

Doesn’t exactly trip off the tongue, does it? But at least it’s possible to write down.

Some History

Our new number system might seem strange and unfamiliar, but it is in fact closer than both the long and the short scales to one of our very earliest methods for naming large numbers, namely that described by Archimedes in his The Sand Reckoner. As with our system, Archimedes’ system displayed double-exponential growth. The largest number that he could name was
$$
\left((10^8)^{(10^8)}\right)^{(10^8)} = 10^{80000000000000000}\,.
$$
Again, this is far larger than anything that either the long or the short system can name. But we can give it a name – and in our system it isn’t even as large as a centillion.

$$ 10^{\cdots} $$

‘Spaces’

In early 2017, the Natural History Museum in London removed from their grand entrance hall the skeleton of a diplodocus, which had been a fixture of the museum since 1905. Affectionately nicknamed ‘Dippy’, the skeleton had moved around between various museum halls, but had become a favourite of museum-goers during his tenure as the museum’s centrepiece, ever since he replaced the collection of elephants and other animals in 1979.

Not everyone was happy following the museum’s announcement of the change in January 2015: the hashtag #SaveDippy trended on Twitter and an online petition calling for his reinstatement gathered tens of thousands of signatures. The museum was quick to defend itself. Dippy would be sent on a grand tour of the UK, allowing our country cousins, unable to afford the train fare to London, a chance to gaze with awe upon the majestic sauropod. Only the museum were telling us in the same breath that he wasn’t so majestic after all. We were reminded that Dippy was only a replica of the original Diplodocus carnegii holotype (displayed in the Carnegie Museum of Natural History in Pittsburgh), while his replacement, the suspended 25-metre long skeleton of a young blue whale, was not only larger than Dippy, but was the actual skeleton. Murmurs were made about conservation and about looking to the future rather than the past.

You didn’t have to wait long to find out the real reason. Head to the museum’s Facebook page today and you’ll find it advertising dozens of New Year’s Eve parties, yoga classes and silent discos taking place in the main entrance hall. On their website, there’s a shiny new section titled ‘venue hire’, which proudly proclaims that the hall – their ‘largest venue’ (!) – can seat a wedding party of 450, hold a reception for 1,200 or host ‘the perfect summer party’. Don’t think it’s all fancy shindigs for Londoners with more money than sense, though – barely a year after removing Dippy from the hall, the museum hosted a party for the Saudi embassy, shortly after the murder of Jamal Khashoggi. The Saudis paid £23,700 for the hire of the hall. In any case, by replacing Dippy with the whale skeleton – which, suspended from the ceiling, takes up zero floor space – the museum found an absolutely brilliant way to free up their ‘venue’ for more events like these, with only a fraction of the outrage that would have ensued had they cleared the hall completely.

The Natural History Museum Hall in a more innocent time. Copyright Natural History Museum.

The reshuffling of exhibits, you see, was never about the whale. It was about the ‘space’. It’s always about the ‘space’. The word appears six times on the museum’s wedding page, competing with ‘venue’ to be the most pathetic and banal way possible to describe a gorgeous Victorian Romanesque hall designed by Alfred Waterhouse. Both words are, of course, symptoms of the lazy language endemic in the world of marketing (‘Earth Hall provides a striking space’ etc. etc.), but they have, through repeated usage, come to stand for a particular well-defined and pervasive concept.

For these are far from the only ‘spaces’ to have popped up in the last decade. Bath Abbey and the church of St Laurence in Ludlow have both recently removed their fine Victorian pews (in Bath’s case, designed by George Gilbert Scott) in order to convert the building into, in the words of one spokesperson, a ’21st-century event venue’. The Facebook page for St Laurence’s, meanwhile, frequently advertises various enterprises (most recently, a mediaeval-style bazaar and a ceilidh/hog-roast) taking place in the nave of the church – except that it’s never called that, it’s now always ‘the Space’ with a capital S.

Now, I can’t entirely blame either the Natural History Museum or the pew-removers for the choices they have made. The Church of England does not receive any funding for the upkeep of churches that are, in very many cases, priceless parts of our national heritage (and very expensive to maintain). The burden is particularly great for enormous parish churches, like the ones in Bath and Ludlow, which are not eligible for the same tax relief given to our cathedrals. The Natural History Museum does not charge for entry and relies on moonlighting as a conference centre in order to stop the dinosaur skeletons falling to pieces. Nevertheless, it is worth reflecting on how we have reached the point where a 12th century church or a national museum packed with valuable and beloved artefacts should find that its most valuable asset is its brute physical dimensions.

One of the last photographs taken of the pews in St Laurence’s. Copyright John Gowers.

We could point to the fact that there are now more of us and less space available per person. However, there is evidence to suggest that this view is misplaced. House-building, for instance, has outstripped population growth even in places like London. Research published by CaChE suggests that there are a number of other factors in play. The financial circumstances post-2008 mean that banks are now much more reluctant to grant mortgages, leading to a decline in home ownership unrelated to the quantity of housing available. Meanwhile, the value of buildings not as homes but as assets remains high. Property, even now, can pay dividends to an investor just by sitting there. The Natural History Museum shares a borough with 1,399 empty homes. In a market like that, if you want to actually do something with a building, then you’d better find a way to squeeze every last penny out of it. This is why rents in London are high. This is why we can’t have a museum in one building, a church in another and a convention centre in a third. By existing, a large building is a money-making opportunity and it has to behave like one.

The Labour Party-commissioned report Land for the Many makes the point succinctly.

An imbalance in the use and ownership of land also crowds out public amenities. The expansion of private space at the expense of public space shuts down opportunities to pursue pleasure, fitness and peace of mind, creating deprivation.

Land for the Many, George Monbiot (editor), Robin Grey, Tom Kenny, Laurie Macfarlane, Anna Powell-Smith, Guy Shrubsole, Beth Stratford.

Private space is not necessarily a bad thing, but they’re not talking about people’s homes. Private ownership of property is a desperately unequal affair, with a few tycoons owning huge portfolios and the rest mopped up by those who are merely very well off. However, the ill effects of this are not confined to privately owned buildings: those places that we should be able to rely on to serve a public good – our museums, our places of worship, our libraries, our schools – are being forced by the same market pressures to become instruments of the same failing system.

Addendum

There’s one curious counterexample to the ‘space’ phenomenon that I ran into recently. Norwich Castle Museum is currently in possession of an enormous and wonderful space, namely the hollowed-out keep of the mediaeval castle. The castle has undergone many changes in its life (many ill-advised), including the removal of the original floors and walls of the keep by Sir John Soane and the later cladding of the outside of the castle in Bath Stone. The museum directors have now taken the bizarre decision to try to recreate the internal floors and walls from the Middle Ages (they say ‘reinstate’, but since they have very little information about the floor plan, that seems overly generous). Sadly, the casualty again seems to be our Victorian heritage – this time, the elegant galleries and arcades of Edward Boardman.

Perhaps this can be explained away by the fact that the conversion involves placing several floors, thus increasing the total floor space. Mercifully, this idea seems not to have occurred to their colleagues in London. Keep an eye out for the Turkish executive dining in the newly rebuilt Norwich Castle banqueting hall….

What’s so special about [0,1]?

It happens to everyone. First, you learn real analysis and suddenly you understand what it means to do rigorous mathematics. You learn the precise definition of what a continuous function is; you learn that if they are defined on a closed bounded interval then they are bounded and attain their bounds; you learn about things called ‘Cauchy sequences’ and ‘The Bolzano-Weierstrass Theorem’.

Maybe you even start to think about doing analysis in \(\mathbb R^n\), but you then quickly learn that this is but one example of something called a ‘metric space’. New avenues open. You generalize the definition of continuity. You learn about the ‘contraction mapping theorem’ and how to apply that to the study of differential equations.

Then the whole thing gets turned on its head when you learn what a topological space is. Suddenly, you have a vastly more general definition of what it means for something to be continuous. You realize that all the things you learned in real analysis were really worked examples of topological facts. Continuous functions on a closed bounded interval are bounded and attain their bounds because the continuous image of a compact space is compact. \(\mathbb R^n\)? You know what \(\mathbb R\) is and you know what the product topology is, so what more do you need to know?

Then you start to learn algebraic topology and suddenly closed bounded intervals of real numbers are all over the place again. \([0,1]\) is a crucial part of the definition of a path or a homotopy. Maybe you start proving theorems that apply not to all topological spaces, but to CW complexes: i.e., spaces built up in a fairly complicated way from unit balls in \(\mathbb R^n\). It’s not unreasonable to feel a little cheated. Wasn’t it the whole point of topology that we didn’t have to deal with \([0,1]\) any more? Why is it everywhere suddenly?

Now, it’s true that every topological space admits a weak homotopy equivalence with some CW complex. But that doesn’t answer our question. Firstly, the definition of a weak homotopy equivalence involves \([0,1]\) anyway. And, secondly, why shouldn’t we be able to come up with some alternative definition of ‘CW complex’, based on some different fundamental topological space, and prove a similar result for that? What’s so special about \([0,1]\)?

Pathing systems

Let’s come back to the first definition we tend to meet in topology that relies on \([0,1]\) in an essential way. When we learn topology, we learn two non-equivalent definitions of what it means for a space to be connected. One definition is purely topological: a space \(X\) is connected if it cannot be written as the union of two disjoint non-empty open sets.

For the second definition, we define a path from a point \(x\) to a point \(y\) in a space \(X\) to be a continuous function \(\gamma \colon [0,1] \to X\) such that \(\gamma(0) = x\) and \(\gamma(1) = y\). We then say that \(X\) is path connected if any two points in \(X\) can be joined by a path in this way.

That raises the question of what would happen if we replaced \([0,1]\) with some other space in the above definition. Would we still get a sensible definition of connectedness?

There are some silly choices of space to use: if we use \(\{0, 1\}\), then every space is connected. If we use \(\{0\}\), then only empty or singleton spaces are.

But let’s not forget that \([0,1]\) has some useful properties that make it a good choice for defining paths. In particular, if we have paths \(\gamma,\gamma' \colon [0,1] \to X\) such that \(\gamma\) is a path from \(x\) to \(y\) and \(\gamma'\) a path from \(y\) to \(z\), then we can come up with a path \(\gamma;\gamma' \colon [0,1] \to X\) that is a path from \(x\) to \(z\). We do this by defining \((\gamma;\gamma')(t) = \gamma(2t)\) if \(t<0.5\) and \((\gamma;\gamma')(t) = \gamma'(2t-1)\) if \(t\ge 0.5\).
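In code, with a path modelled as a function defined on floats in \([0,1]\), the concatenation might look like this sketch (the name is mine):

```python
# Concatenation of [0,1]-paths, with a path modelled as a Python
# function defined on floats in [0, 1].
def concatenate(gamma, gamma_prime):
    """Traverse gamma at double speed, then gamma_prime at double speed."""
    def composite(t):
        return gamma(2 * t) if t < 0.5 else gamma_prime(2 * t - 1)
    return composite
```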

It makes sense, then, to insist that our new candidate spaces have similar properties. That is, we define a pathing system to be given by a space \(J\), together with distinguished points \(j_0,j_1\in J\), and, for any space \(X\) and any pair of continuous functions \(\gamma,\gamma' \colon J \to X\) such that \(\gamma(j_1) = \gamma'(j_0)\), a continuous function \(\gamma;\gamma' \colon J \to X\) such that \((\gamma;\gamma')(j_0) = \gamma(j_0)\) and \((\gamma;\gamma')(j_1) = \gamma'(j_1)\).

I shall also impose one more condition to make sure that the choice of \(\gamma;\gamma'\) is at least fairly well-behaved. If we have a continuous function \(\phi \colon X \to Y\), where \(Y\) is some other space, then we should insist that
$$
\phi \circ (\gamma;\gamma') = (\phi \circ \gamma);(\phi \circ \gamma')
$$
as continuous functions \(J \to Y\).

From the perspective of category theory, this kind of requirement is known as naturality: it says that a certain collection of morphisms forms a natural transformation. You might like to work out the details.

Multiple concatenations

Of course, we are not limited to composing together two paths. For example, if we have four paths, then we can compose them in pairs to get two paths, and then compose those paths together to get a single path. In a similar way, we can define the \(2^n\)-fold composition of \(2^n\) paths inductively as follows. The bifold composition is the usual composition. Then the \(2^{n+1}\)-fold composition of \(2^{n+1}\) paths (with matching endpoints) is formed by splitting into two equal collections of \(2^n\) paths, forming the \(2^n\)-fold composition of each and then composing the two paths thus formed. Equivalently, we could split our collection of \(2^{n+1}\) paths into \(2^n\) pairs of paths, compose each pair to form \(2^n\) single paths, and then take the \(2^n\)-fold composition.

Now, given a pathing system \((J, j_0, j_1)\), write \(\bigwedge_{2^n} J\) for the space formed by taking \(2^n\) disjoint copies of \(J\) and then passing to the quotient space given by identifying \(j_1\) in copy \(i\) with \(j_0\) in copy \(i+1\). There are \(2^n\) obvious continuous functions \(J \to \bigwedge_{2^n} J\) whose endpoints match up, and we can take the \(2^n\)-fold composition of these paths to get a single continuous function
$$
p_n \colon J \to \bigwedge_{2^n} J
$$
such that \(p_n(j_0)\) is \(j_0\) in the first copy of \(J\) and \(p_n(j_1)\) is the \(j_1\) in the last copy of \(J\).

Now let us number the copies of \(J\) in binary from \(\underbrace{0\cdots0}_n\) to \(\underbrace{1\cdots1}_n\). I claim that if \(x\in J\) and \(m<n\), then the sequence of binary digits corresponding to the copy of \(J\) that \(p_m(x)\) is contained in is a prefix of the sequence of binary digits corresponding to the copy of \(J\) that \(p_n(x)\) is contained in. Indeed, by the definition of \(2^k\)-fold composition, \(p_n\) may be regarded as the \(2^m\)-fold concatenation of \(2^m\) paths \(J \to \bigwedge_{2^{n-m}}J\), one for each block of \(2^{n-m}\) consecutive copies of \(J\).

This means that, as \(n\) gets larger and larger, we get an infinite binary sequence corresponding to each \(x\in J\). We define a function \(\phi \colon J \to [0,1]\) by sending each \(x\in J\) to the real number whose binary expansion is given by that sequence.

\(\phi\) is easily seen to be continuous: indeed, the preimage of any set of the form \([0, q/2^n)\) is precisely the preimage under \(p_n\) of some open set in \(\bigwedge_{2^n}J\), and is therefore open in \(J\), and similarly for any set of the form \((q/2^n, 1]\). Moreover, we certainly have \(\phi(j_0) = 0\) and \(\phi(j_1) = 1\).

This means that if there is a \([0,1]\)-path from \(x\) to \(y\) in a space \(X\), then there is a \(J\)-path from \(x\) to \(y\) for any pathing system \((J, j_0, j_1)\). In other words, \([0,1]\) is special, because it gives us the most restrictive possible notion of path-connectedness while still allowing basic notions like composition of paths.

Universality of \([0,1]\)

Are there any other spaces with this special property? It’s fairly easy to show that there aren’t. First, notice that the function \(\phi\) that we constructed above makes the following diagram commute.
$$
\require{AMScd}
\begin{CD}
J @>{\phi}>> [0,1]\\
@V{p_2}VV @VV{\times 2}V \\
J \wedge J @>{\phi\wedge\phi}>> [0,1] \wedge [0,1]
\end{CD}
$$
Moreover, \(\phi\) is in fact the unique continuous function \(J \to [0,1]\) with this property. In category theory, we say that \([0,1]\) is the final coalgebra for the functor \(X \mapsto X \wedge X\). A general fact about final coalgebras is that they are unique up to isomorphism. You might like to deduce for yourself that any space \(I\) having the same property we’ve found for \([0,1]\) is in fact homeomorphic to \([0,1]\).

Other directions

There are a few other generalizations we could make to our notion of a ‘path’. We should be careful to avoid being too general: for example, we could define a path from \(a\) to \(b\) in a space \(X\) to be a connected subspace of \(X\) that contains \(a\) and \(b\) — then we would end up with the usual definition of connectedness.

One thing we might try is not to insist that every path is a continuous function out of the same space \(J\): so that the composition of two paths could have a different domain from either of its constituent parts. I’ll save that discussion for another time.

What’s the difference between an orientation and a rotation?

This question came up at work recently, and I was unable to find a really good answer online. For the purposes of this article, I will be using the terms orientation and rotation as they are used in computer graphics; these might not be the definitions you are used to.

If you search online for what the difference is between a rotation and an orientation, then you get several seemingly conflicting answers:

  • ‘An orientation is given by Euler angles, whereas a rotation is given by a quaternion.’
  • ‘An orientation is the destination that you reach at the end of a rotation; the rotation is the route to that destination.’
  • ‘Orientations only allow you to rotate from \(0\) to \(360\) degrees, whereas rotations allow you to go beyond \(360\) degrees.’

I think the last of these is the one that comes closest. In the two-dimensional case, the answer is easy: an orientation is an element of \(SO(2)\) (i.e., confusingly, a rotation matrix), whereas a rotation also comes equipped with a winding number about the origin. If we want, we can identify orientations of 2D space with elements of the circle \(S^1\) and rotations with real numbers. Then we have a continuous function \(\lambda\) from rotations to orientations given by \(\lambda(x) = e^{2\pi i x}\).

This continuous map \(\lambda\) has an important property: it is a covering map. This means that if \(z\) is some orientation, then there is some open neighbourhood \(U\) of \(z\) such that \(\lambda^{-1}(U)\) is a union of disjoint open sets, each mapped homeomorphically on to \(U\) by \(\lambda\). In particular, if \(x\) is a rotation, and we modify the corresponding orientation \(\lambda(x)\) very slightly to get an orientation \(w\), then we can find a correspondingly modified rotation \(y\) close to \(x\) such that \(\lambda(y) = w\).

This means that, for example, if we were filming a rotating object then, assuming our frame rate were fast enough, we could track the overall rotation of the object over time by capturing its orientation at each frame. Why might we want to do this?

Let’s look at an example. There’s a game called Getting Over It with Bennett Foddy, in which you control a character named Diogenes who lives in a barrel. The only way Diogenes can get around is by moving a hammer in circles around his body. The player controls the hammer by moving the mouse around, and the tip of the hammer follows the mouse point.

In the real game, he can move the hammer in a full circle. But let’s suppose instead that he was unable to move the hammer through the bottom of the circle — perhaps the barrel is in the way.
If we only keep track of the orientation of the mouse cursor around the centre of the rotation, we are at risk of introducing some graphical glitches. Suppose the player moves the mouse in a straight line right through the bottom of the barrel. Then the hammer will jump very suddenly from one side of the barrel to the other.

Since we would like to give the impression of continuous motion, this is undesirable. If, however, we keep track of the overall rotation of the mouse cursor, then the game will know that it should not jump across, because the rotation is \(390^\circ\), rather than \(30^\circ\).

Now the input to the game is given by the mouse position: loosely speaking, an orientation. Therefore, the property of covering spaces that we have mentioned above is crucial: as long as we can assume that we are sampling the mouse position quickly enough, we can translate the changing mouse orientations into changing rotations, which the game can then understand.
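Concretely, the translation might look like the following minimal sketch (Python; the function and variable names are my own, and I assume successive samples differ by less than half a turn):

    import math

    def update_rotation(prev_rotation, new_orientation):
        """Lift new_orientation (an angle in [0, 2*pi)) to the rotation
        (a real number of radians) closest to prev_rotation."""
        # Rotations mapping to new_orientation differ by multiples of 2*pi;
        # pick the one within pi of the previous rotation.
        delta = (new_orientation - prev_rotation) % (2 * math.pi)
        if delta > math.pi:
            delta -= 2 * math.pi
        return prev_rotation + delta

The assumption that each frame-to-frame change is less than \(\pi\) radians is exactly the ‘sampling quickly enough’ condition: it tells us which sheet of \(\lambda^{-1}(U)\) to choose.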

Another way to understand this is to recall that covering maps satisfy a property called path lifting: if \(p \colon X \to Y\) is a covering map, \(\gamma \colon [0,1] \to Y\) is a path in \(Y\) and \(x\) is a point in \(X\) such that \(p(x) = \gamma(0)\), then there is a path \(\hat{\gamma} \colon [0,1] \to X\) such that \(\hat{\gamma}(0) = x\) and \(p\circ\hat{\gamma} = \gamma\).

The covering map \(\lambda\) of \(S^1\) by the real line is particularly important, because the real line is simply connected, meaning that it is in fact a universal cover for \(S^1\). A universal cover is defined as a covering map whose source is simply connected, but the word universal refers to the fact that any other covering map must factor through the universal cover. There are infinitely many covering maps on to the circle (for example, the map from \(S^1\) to itself given by \(e^{i\theta} \mapsto e^{ni\theta}\) for any positive integer \(n\)), but the real line is the one that stores the most information.

An important fact about universal covers \(p \colon U \to X\), which we shall come back to later, is that the order of the cover (i.e., the cardinality of the fibre \(p^{-1}(\{x\})\) for any \(x\)) is the same as the order of the fundamental group of \(X\). \(S^1\) has fundamental group \(\mathbb Z\), so there are infinitely many rotations corresponding to any particular orientation.

A fact in topology is that any space \(X\) that is connected, locally path-connected and semilocally simply-connected admits a universal cover [If you are ever bored in an airport, get out some paper and work out the proof. If you, like most people, don’t know what ‘semilocally simply-connected’ means, don’t look it up – just start writing out the proof and work it out for yourself.]. It therefore makes sense to generalize our discussion above to \(n\) dimensions as follows.

  • An orientation of \(\mathbb R^n\) is an element of \(SO(n)\).
  • A rotation of \(\mathbb R^n\) is an element of the (technically, a) universal cover of \(SO(n)\).

This brings us back to the second answer at the top of the article, because the elements of the universal cover of a space \(X\) may be identified with homotopy classes of paths in \(X\) from a fixed source point \(x_0\). Therefore (up to homotopy) a rotation of \(\mathbb R^n\) really does encode the history of all the orientations that an object has been through, where the word ‘history’ is taken in the most basic sense imaginable: a path through the space of orientations.

That ‘(up to homotopy)’ becomes a lot more important in dimensions \(n>2\) than in dimension \(2\), however. In two dimensions, there is not much difference between two homotopic paths around the circle: they might travel at different speeds for different parts of the journey, or one might overshoot the target and double back. But the two paths are still clearly the same rotation in some sense. In three dimensions and above, more complicated homotopies between paths are possible. Watch the rotating candles in this Balinese candle dance, for example.

Balinese demonstration that \(\pi_1(SO(3))\) is the cyclic group of order \(2\).

The candles rotate about the up-down axis by two full rotations, yet the dancers’ arms are completely untwisted at the end. This reflects a fact about \(SO(3)\): a rotation of \(360^\circ\) is not homotopic to the identity, but a rotation of \(720^\circ\) is. In fact, the fundamental group of \(SO(3)\) is \(\mathbb Z/2\), which means that its universal cover is a double cover: there are precisely two rotations corresponding to each orientation of 3D space.

This might not be a problem if we wanted to extend our man-in-a-barrel example to 3D space. In that case, we didn’t really need to track infinitely many rotations around the centre: just being able to tell when we’d gone slightly more than a full rotation round was enough. But for more complicated linkages of multiple joints, this mathematical fact can lead to problems that are very difficult to solve.

Where do quaternions come into this? The simple answer is that the space of unit quaternions is the universal cover for \(SO(3)\). That is, the universal cover of \(SO(3)\) is the \(3\)-sphere \(S^3\), and if we identify \(S^3\) with the unit quaternions then the covering map commutes with quaternion multiplication and multiplication of matrices. Therefore, we can store rotations of 3D space with quaternions (four numbers), while orientations require less nice representations such as matrices (nine numbers) or Euler angles (three numbers, but without the nice properties of covering maps).
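As a quick numerical illustration of that double cover: the unit quaternions \(q\) and \(-q\) map to the same rotation matrix. The sketch below is my own; only the conversion formula itself is standard.

    import numpy as np

    def quat_to_matrix(q):
        """Rotation matrix of the unit quaternion q = (w, x, y, z)."""
        w, x, y, z = q
        return np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])

    q = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8), 0.0, 0.0])  # 45 degrees about x
    assert np.allclose(quat_to_matrix(q), quat_to_matrix(-q))  # two rotations, one orientation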

The topological closed graph theorem

There are a number of theorems in various settings that link continuity of a function \(f\colon X \to Y\) to closedness of that function’s graph – i.e., the set of all points \((x,y)\in X \times Y\) such that \(y = f(x)\).

For example, we learn in functional analysis that if \(f \colon X \to Y\) is a linear map between Banach spaces, then \(f\) is continuous if and only if its graph is a closed subspace of \(X \times Y\).

Normally, when we learn this fact, we also learn a particular way of looking at it. We know that a function \(f \colon X \to Y\) is continuous if and only if whenever \(x_n \to x\) is a convergent sequence in \(X\), we have \(f(x_n) \to f(x)\) in \(Y\).

Meanwhile, to say that the graph of \(f\) is closed means that it contains all its limit points in \(X \times Y\): i.e., the graph of \(f\) is closed if and only if whenever \(x_n \to x\) is a convergent sequence in \(X\) such that \(f(x_n) \to y\) for some \(y\), we have \(y = f(x)\).

It’s clear, then, that continuity of \(f\) is a stronger statement than closedness of its graph: to prove that \(f\) has a closed graph, we can proceed exactly as we would for proving continuity of \(f\), except that we only need to consider those convergent sequences \(x_n \to x\) such that \(f(x_n)\) converges to some limit \(y\).

Let’s turn this into a formal proof: we’re given a continuous function \(f \colon X \to Y\) between (say) normed spaces \(X\) and \(Y\), and we want to prove that the graph of \(f\) is closed. Let \(x_n \to x\) be a convergent sequence in \(X\), and suppose that \(f(x_n) \to y\) in \(Y\). Then, by continuity of \(f\), we know that \(f(x_n) \to f(x)\). Therefore, \(y = f(x)\) by uniqueness of limits.

This last part – uniqueness of limits in \(Y\) – plays an essential role, which we see when we try to generalize this implication to arbitrary topological spaces. In more general topological spaces, sequences can converge to more than one limit – and, indeed, it is possible for a continuous function to have a graph that is not closed. (For example, the identity on an indiscrete space with at least two points is continuous, but its graph – the diagonal – is not closed.)

However, if \(Y\) is a Hausdorff space, then limits of sequences in \(Y\) are unique, and we get our first topological closed graph theorem.

Exercise 1. Let \(X,Y\) be topological spaces, where \(Y\) is a Hausdorff space. Prove that if \(f \colon X \to Y\) is a continuous function, then the graph of \(f\) is closed.

More exciting is that we can use this as a characterization of Hausdorff spaces: i.e., if \(Y\) is a topological space such that the graph of any continuous function into \(Y\) is closed, then \(Y\) must be a Hausdorff space. In fact, it suffices to consider the identity function \(Y \to Y\).

Exercise 2. Convince yourself that the statement ‘\(Y\) is a Hausdorff space’ is exactly the same thing as saying that the graph of the identity function \(Y \to Y\) is closed in \(Y \times Y\).
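(If you want a hint, here is one way to unwind it. The graph of the identity is the diagonal \(\Delta = \{(y, y) : y \in Y\}\). Its complement is open precisely when every pair \((x, y)\) with \(x \neq y\) has a basic open neighbourhood \(U \times V\) disjoint from \(\Delta\) – and \(U \times V\) misses \(\Delta\) exactly when \(U \cap V = \emptyset\). So ‘the diagonal is closed’ says precisely that distinct points are separated by disjoint open sets.)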

That’s nice, but what about the other direction? In functional analysis, we know that the converse holds when \(X\) and \(Y\) are Banach spaces, but the topological version is a bit different.

Exercise 3. Let \(X,Y\) be topological spaces, where \(Y\) is compact. Let \(f\colon X \to Y\) be a function such that the graph of \(f\) is closed in \(X \times Y\). Prove that \(f\) is continuous.

(This is very different from the theorem in analysis: non-zero dimensional Banach spaces are never compact!)

It would be very nice if this theorem gave a characterization of compact topological spaces in the same way that the other direction gave a characterization of Hausdorff spaces: we often see ‘compact’ and ‘Hausdorff’ together as though they were brother concepts, even though the definitions are very different, so the prospect of giving new, more explicitly dual definitions for them is attractive.

Unfortunately, there are spaces \(Y\) that satisfy the conclusion of Exercise 3 (for arbitrary \(X\)) without being compact. As an example, give the set \(\mathbb N\) of natural numbers a topology by declaring the open sets to be the downward closed sets (i.e., those sets \(U\) such that if \(n \in U\) and \(m \le n\) then \(m \in U\)).

Exercise 4. Prove that if \(X\) is a non-empty topological space and \(f \colon X \to \mathbb N\) is a function, then the graph of \(f\) is not closed.

Thus, the conclusion of Exercise 3 (that closed graphs give continuous functions) is vacuously satisfied for this space \(\mathbb N\): the only function into \(\mathbb N\) that has a closed graph is the empty function, which is trivially continuous. Nevertheless, \(\mathbb N\) with this topology is certainly not compact.

The problem here is that some particularly nasty topological spaces throw up barriers to graphs being closed that are unrelated to whether or not the underlying functions are continuous. If we examine the example above a bit more closely, it becomes clear that the fundamental property of a graph of a function \(X \to Y\) – namely, that it intersects each set \(\{x\} \times Y\) in a single point – is not in general compatible with being a closed set.

To try and get around this, we’ll relax this property by generalizing from functions to relations: i.e., functions that can take multiple values – or no value at all. We can identify a relation between spaces \(X\) and \(Y\) with its graph, which is an arbitrary subset of \(X \times Y\).

Let’s start by trying to extend the definition of continuity to relations. Recall that a function \(f \colon X \to Y\) is continuous if and only if \(f^{-1}(C)\) is closed in \(X\) whenever \(C\) is a closed set in \(Y\) (taking complements, this is equivalent to the usual definition in terms of open sets).

Definition 5. Let \(R \subset X \times Y\) be a relation. Given a subset \(Z\) of \(Y\), we write \(R^{-1}(Z)\) for the set of all \(x\in X\) such that \((x, z)\in R\) for some \(z\in Z\). We say that \(R\) is continuous if \(R^{-1}(C)\) is closed in \(X\) for any closed subset \(C\) of \(Y\).

(Note that for relations, unlike for functions, this is not equivalent to asking that preimages of open sets be open: since \(R^{-1}\) is defined by an existential quantifier, the two conditions are no longer related by taking complements, and it is the closed-set version that we shall want below.)
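Here is a toy check of this definition on finite spaces (entirely my own example, with \(Y\) the Sierpiński space):

    # Y = {0, 1} with closed sets {}, {1}, {0, 1} (the Sierpinski space);
    # X = {'a', 'b'} with closed sets {}, {'b'}, {'a', 'b'}.

    def preimage(R, Z):
        """R^{-1}(Z): all x related by R to some z in Z."""
        return {x for (x, z) in R if z in Z}

    closed_in_Y = [set(), {1}, {0, 1}]
    closed_in_X = [set(), {'b'}, {'a', 'b'}]

    def is_continuous(R):
        return all(preimage(R, C) in closed_in_X for C in closed_in_Y)

    print(is_continuous({('a', 0), ('b', 0), ('b', 1)}))  # True
    print(is_continuous({('a', 1)}))  # False: the preimage of {1} is {'a'}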

We can readily extend our earlier result to continuous relations.

Exercise 6. Let \(X,Y\) be topological spaces, where \(Y\) is compact. Let \(R\subset X \times Y\) be a closed set. Prove that \(R\) is a continuous relation between \(X\) and \(Y\).

What is more exciting is that we can use this to give a characterization of compactness. We will use an adapted version of a lovely argument due to Martín Escardó (see the brilliantly named Intersections of Compactly Many Open Sets are Open [1]).

Theorem 7. Let \(Y\) be a topological space, and suppose that whenever \(X\) is a topological space and \(R\subset X \times Y\) is a closed subset, \(R\) is a continuous relation between \(X\) and \(Y\). Then \(Y\) is compact.

Proof. Let \((U_\alpha)_{\alpha\in A}\) be an open cover of \(Y\). Define a topological space \(X\) as follows.

  • The points of \(X\) are the open sets \(U\) of \(Y\).
  • A set \(\mathcal U\subset X\) is open if it is empty, or if:
    • it is upwards closed – i.e., if \(U \in \mathcal U\) and \(U \subset V\), then \(V \in \mathcal U\); and
    • there is some \(U \in \mathcal U\) such that \(U\) may be written as the union of finitely many of the \(U_\alpha\).

To show that \((U_\alpha)\) admits a finite subcover, it will suffice to prove that the set \(\{Y\}\) is open in \(X\): for if \(\{Y\}\) is open, then by definition some element of \(\{Y\}\) – that is, \(Y\) itself – may be written as the union of finitely many of the \(U_\alpha\). Since \((U_\alpha)\) was arbitrary, this will prove that \(Y\) is compact.

First, however, we need to show that this is indeed a valid topology on \(X\). It is easy to see that it is closed under unions. Moreover, if we have two such sets \(\mathcal U\) and \(\mathcal V\), then the intersection \(\mathcal U \cap \mathcal V\) is certainly upwards closed; moreover, we have some \(U \in \mathcal U\) and \(V \in \mathcal V\) such that \(U\) and \(V\) are the union of finitely many of the \(U_\alpha\): then \(U \cup V\) is also the union of finitely many of the \(U_\alpha\), and is contained in both \(\mathcal U\) and \(\mathcal V\) by upwards closedness.

[As an aside, notice how our new definition of compactness doesn’t mention finiteness at all: the argument above shows that this is because it has, in some sense, been ‘cancelled out’ by the fact that open sets in a topological space must be closed under finite intersections. If you’ve ever used compactness in a proof in order to show that a particular collection of open sets can be taken to be finite and therefore to have an open intersection (for example, in Exercise 3), then there’s a good chance that you’d have been better served by using this new definition.]

We will now consider the relation \(\not\ni\) between \(X\) and \(Y\) (i.e., \(U \not\ni x\) if it is not the case that \(x \in U\)). We claim that the graph of this relation is closed in \(X \times Y\).

Indeed, suppose that \(x\in V\) – so \((V, x)\) is in the complement of \(\not\ni\). We seek an open neighbourhood of \((V, x)\) that does not intersect \(\not\ni\). Since \((U_\alpha)\) is an open cover of \(Y\), there is some \(U_\beta\) such that \(x\in U_\beta\). Then let \(\mathcal V\) be the set of all open sets in \(Y\) that contain \(U_\beta \cap V\).

\(\mathcal V\) is certainly upwards closed; moreover, it contains \(U_\beta\), which is trivially a union of finitely many of the \(U_\alpha\), so \(\mathcal V\) is open in \(X\). Then the set $$\mathcal V \times (U_\beta \cap V)$$ is an open neighbourhood of \((V, x)\) in \(X \times Y\), and it does not intersect the graph of \(\not\ni\), since every set in \(\mathcal V\) contains every element of \(U_\beta \cap V\) by definition. This completes the proof that \(\not\ni\) is closed in \(X \times Y\).

By hypothesis, we can deduce that \(\not\ni\) is a continuous relation. In particular, since \(Y\) is closed as a subset of itself, the set \(\not\ni^{-1}(Y)\) is closed in \(X\). But this is precisely the set of all open subsets \(U \subset Y\) such that \(x \not \in U\) for some \(x\) – i.e., the set of all proper open subsets of \(Y\). So \(\{Y\}\) is the complement of this closed set and is therefore open, as desired. \(\Box\)

Unfortunately, it’s not necessarily true that the graph of a continuous relation into a Hausdorff space is always closed, so we don’t quite get that explicitly dual pair of definitions for ‘compact space’ and ‘Hausdorff space’. I think we’ve come pretty close though!

[1] M. Escardó, ‘Intersections of compactly many open sets are open’, 2009.