Equational theories

9 Metatheorems from Invariants

For the purposes of this chapter, a theorem is a (true) statement about particular equations, for example ‘(387 implies 43)’ is a theorem. A metatheorem is a general statement about implications; one can usually get many theorems from a single metatheorem. This chapter is all about generating many interesting metatheorems using a meta-metatheorem, called the fundamental property of invariants. If all this is making your head spin, don’t worry. Look at the sections below for examples of metatheorems you can probably agree are both concrete and interesting. Once you have done that, come back here and we will show you how to prove these and other metatheorems using invariants.

9.1 Invariants

Let \(E, E_1\), and \(E_2\) be equations. If \(E \Rightarrow E_1\) and \(E_1 \Rightarrow E_2\), then \(E \Rightarrow E_2\). Very trivial. Rephrasing this, we see that if \(E \Rightarrow E_1\) and \(E \nRightarrow E_2\), then \(E_1\nRightarrow E_2\).

Extending this idea, suppose we compute the set of all equations which are implied by \(E\); we will call this set \(\mathcal{Y}(E)\) (we use \(\mathcal{Y}\) because this is an example of a Yoneda embedding). Then \(\mathcal{Y}(E)\) is upwards closed, or closed under forward implication: no equation in \(\mathcal{Y}(E)\) can imply an equation not in \(\mathcal{Y}(E)\). If we know \(\mathcal{Y}(E)\) well, this already settles a potentially large number of implications in the negative.

While computing \(\mathcal{Y}(E)\) for an arbitrary equation \(E\) may seem daunting, for some nice equations we can find invariants, which makes the task manageable. An invariant for \(E\) is some sort of data associated with expressions \(w\) so that

\[ \mathcal{Y}(E) = \{ w = w' \mid \text{Invariant}(w) = \text{Invariant}(w')\} \]

If we can find an invariant which is computable for each term \(w\), then we can easily describe \(\mathcal{Y}(E)\). The fact that \(\mathcal{Y}(E)\) is upwards closed is rephrased as follows; this is called the fundamental property of invariants. Remember that an invariant is a function taking expressions and outputting some data.

Meta-metatheorem 9.1 Fundamental property of invariants

Let \(I\) be an invariant of \(E\). If \(w = w'\) implies \(w'' = w'''\) and \(I(w) = I(w')\) (that is, \(E\) implies \(w = w'\)), then \(I(w'') = I(w''')\).

More succinctly, for an invariant \(I\) of \(E\) we must have

\[ (w = w' \Rightarrow w'' = w''') \implies (I(w) = I(w') \Rightarrow I(w'') = I(w''')). \]

When using this result, we commonly take the contrapositive: if \(I(w) = I(w')\) and \(I(w'') \neq I(w''')\), then \(w=w'\) cannot imply \(w'' = w'''\). Note that the conclusion is independent of the equation \(E\); all we need to know is that \(I\) is an invariant.

Note for category theorists

Let \(\Pi \) denote the preorder of magma equations ordered by implication. If \(I\) is an invariant then define

\[ I(w = w')\coloneqq \begin{cases} \texttt{true} & \text{ if }I(w) = I(w’)\\ \texttt{false} & \text{ otherwise} \end{cases}. \]

(In programming languages we would say \(I(w = w') \coloneqq I(w) == I(w')\)). Let \(\textbf{Bool} = \{ \texttt{true}, \texttt{false}\} \) be the poset where \(\texttt{false} \leq \texttt{true}\). Then \(I\) becomes a function \(\Pi \to \textbf{Bool}\), and the fundamental property of invariants just says that this function is monotone, i.e. functorial. Thus for every invariant \(I\) we obtain a functor \(\Pi \to \textbf{Bool}\).

Question 1: Does every functor \(\Pi \to \textbf{Bool}\) come from an invariant?

Question 2: What can we say about the category of functors \(\Pi \to \textbf{Bool}\)? Give a nice interpretation of natural transformations between invariants.

The fundamental property of invariants is not a theorem, nor a metatheorem: it is a meta-metatheorem, in the sense that it will allow us to get a metatheorem for every invariant we find.

Example: absorption law

Let \(E\) be the equation \(x \diamond y = x\). Intuitively, we must have

\[ \mathcal{Y}(E) = \{ w = w' \mid \text{the leftmost variable is the same for $w$ and $w'$}\} . \]

We will talk about proving statements like this one (say in Lean) later on; take it as given for now. The invariant is clear: we define \(I(w)\) to be the leftmost variable of \(w\). Instantiating this invariant in the fundamental property of invariants, we get the following metatheorem.

Metatheorem 9.2
#

Let \(w = w'\) be an equation such that the leftmost variable of \(w\) is the same as the leftmost variable of \(w'\). Then \(w = w'\) cannot imply an equation that does not have the property from the last sentence.

Example: associativity

For a more complicated example, let \(E\) be the associativity equation \(x \diamond (y \diamond z) = (x \diamond y) \diamond z\). Intuitively, we must have

\[ \mathcal{Y}(E) = \{ \text{equations that, when we remove all parentheses, are of the form $w = w$}\} . \]

There is an invariant lurking behind: it is the (ordered) list of variables appearing in an expression, counting repetitions. More formally, we define \(I(w)\) to be the tuple of variables appearing in \(w\), listed from left to right, say. Again, from the fundamental property of invariants we get the following.

Metatheorem 9.3
#

Let \(w = w'\) be an equation such that the variables appearing in \(w\), taking into account order and repetitions, are the same ones that appear in \(w'\). Then \(w = w'\) cannot imply an equation that does not have the property from the last sentence.

If we were coding a computer program that computes \(I(w)\) given \(w\), one could take the string of symbols that is \(w\), ignore all parentheses, replace all symbols \(\diamond \) by commas, and surround with an appropriate delimiter. (I imagine one could easily do this using regular expressions.

We can compute other examples, but the invariant can get complicated even for simple equations. Exercise: what is the invariant for commutativity? Answer: To compute \(I(w)\) from \(w\) replace all parentheses with curly braces and all symbols \(\diamond \) with commas, and interpret the result as nested sets.

9.2 Expanding the language

The method of invariants really shines when we expand our formal language. Right now our language consists of variables, parentheses, the equal sign, and \(\diamond \) (there is also an implicit use of \(\forall \) but let’s ignore that for now). Let \(\Pi \) denote the preorder of equations (built from the language described) ordered by implication.

We will add the symbol \(\wedge \) (‘and’) to our language. Then we consider a bigger preorder \(\Pi ' \supseteq \Pi \) which includes equations and also conjunctions of equations. Even if we only care about \(\Pi \) it will be apparent that studying invariants in \(\Pi '\) gives us useful metatheorems about \(\Pi \). Equations and conjunctions of equations are examples of formulas (or formulae, according to taste).

If \(\varphi \) is a formula, we can define \(\mathcal{Y}(\varphi )\) to be the set of all formulae implied by \(\varphi \); this agrees with our previous definition. Now define an invariant of \(\varphi \) to be a function \(I\) on terms such that

\[ \mathcal{Y}(\varphi ) \cap \Pi = \{ w = w' \mid I(w) = I(w')\} . \]

Again, this clearly agrees with our previous definition. Although \(\mathcal{Y}(\varphi ) \cap \Pi \) might not be upwards closed in \(\Pi '\), it is upwards closed in \(\Pi \), which is enough to get the fundamental property of invariants verbatim. This leads to more metatheorems we didn’t have access to before.

Example: associativity and idempotency

Let \(\varphi \) be the conjunction of the associative law and the idempotency law (\(x \diamond x = x\)). Again, we will rely on our intuition, which says that an invariant \(I\) defined by taking \(I(w)\) to be the set of all variables appearing in \(w\), works. The corresponding metatheorem is the following

Metatheorem 9.4
#

Let \(w = w'\) be an equation such that the set of variables appearing in \(w\) is equal to the set of variables appearing on \(w'\). Then \(w = w'\) cannot imply an equation that does not have the property from the last sentence.

Example: associativity and commutativity

For a similar example, we can let \(\varphi \) be the conjunction of the associative and the commutative laws. Here we can define \(I(w)\) to be the multiset of variables appearing in \(w\). We obtain the following metatheorem.

Metatheorem 9.5
#

Let \(w = w'\) be an equation such that the variables appearing in \(w\), taking into account multiplicity, are the same ones that appear in \(w'\). Then \(w = w'\) cannot imply an equation that does not have the property from the last sentence.

Trivia: this was the first example of a metatheorem obtained by use of an invariant.

Example: associativity and commutativity with a twist

We can keep expanding our language if it helps us express more intricate invariants. For instance, we can add the symbol ‘1’ to our language. Let \(\varphi \) be the conjunction of associativity, commutativity, the equations \(1 \diamond x = x\), and

\[ \underbrace{x \diamond x \diamond \cdots \diamond x}_{m\text{ times}} = 1, \]

for some fixed positive integer \(m\). Pause to guess the invariant before we move on.

The invariant \(I(w)\) is the multiset of variables appearing in \(w\) but multiplicities are computed modulo \(m\). Thus we have the pretty metatheorem:

Metatheorem 9.6
#

Fix some positive integer \(m\). Let \(w = w'\) be an equation such that every variable appearing in \(w\) appears the same number of times in \(w'\) modulo \(m\). Then \(w = w'\) cannot imply an equation that does not have the property from the last sentence.

9.3 Proving metatheorems from invariants in Lean

For the rest of this chapter we readopt the convention of calling ‘theorem’ an important result, not necessarily pertaining to specific equations.

An invariant is generally a syntactic property of an expression. However, invariants can also be described and calculated semantically through the notion of a lifting magma family, described below. The general idea is that the value of an invariant for an expression can be computed by substituting specific values for the variables in the expression and evaluating the result in a certain magma in the lifting magma family; additional requirements ensure that the fundamental property of invariants is satisfied.

Definition 9.7 Lifting Magma Family
#

A lifting magma family is a family of magmas \(\{ G_\alpha \} \), one for each type \(\alpha \), satisfying the following properties:

  • For each type \(\alpha \), there is a function \(\iota _\alpha : \alpha \to G_\alpha \).

  • Given a function \(f : \alpha \to G_\alpha \), there is a magma homomorphism \(\operatorname {lift}{f} : G_\alpha \to G_\alpha \) such that \(\operatorname {lift}{f}(\iota _\alpha (x)) = f(x)\) for all \(x\) in \(\alpha \).

Example 1
#

The free Abelian groups form a lifting magma family. When the underlying set is finite, the groups are isomorphic to \(\mathbb {Z}^n\).

Example 2
#

Lists form a lifting magma family.

The key consequence of the Definition 9.7 is that it is significantly easier to check whether an equation is satisfied in a lifting magma family.

Theorem 9.8 Evaluation theorem for lifting magma families

Suppose \(E\) is an equation involving a set of variables \(X\), and let \(G\) be a lifting magma family.

Determining whether \(E\) is satisfied by \(G_X\) is equivalent to checking that \(E\) is true with the specific substitution \(\iota _X\).

Proof

For the forward direction, suppose \(E\) is satisfied by \(G_X\). Then, by definition, any substitution of the variables in \(E\) with elements of \(G_X\) will yield a true equation. In particular, substituting according to \(\iota _X\) will yield a true equation.

For the reverse direction, suppose that \(E\) is true when evaluated with the substitution \(\iota _X\). Now, consider an arbitrary substitution of variables \(f : X \to G_X\). By the lifting magma family property, there is a magma homomorphism \(\operatorname {lift}{f} : G_X \to G_X\) such that \(\operatorname {lift}{f}(\iota _X(x)) = f(x)\) for all \(x\) in \(X\). In other words, applying the substitution \(f\) is equivalent to first applying to substitution \(\iota _X\) and then applying the homomorphism \(\operatorname {lift}{f}\). Since \(E\) is true when evaluated with the substitution \(\iota _X\), it is also true after applying the homomorphism \(\operatorname {lift}{f}\). Thus, \(E\) is satisfied by \(G_X\).

Theorem 9.9 The fundamental property of invariants

Let \(E\) and \(E'\) be equations involving a set of variables \(X\), and let \(G\) be a lifting magma family.

If \(E\) is true with the substitution \(\iota _X\), and \(E\) implies \(E'\), then so is \(E'\).

Proof

Applying the evaluation Theorem 9.8, we see that \(E\) is satisfied by \(G_X\). Since \(E\) implies \(E'\), \(E'\) is also satisfied by \(G_X\), and in particular, \(E'\) is true with the substitution \(\iota _X\).

Remark 1
#

The result of evaluating an expression along the function \(\iota _X : X \to G_X\) is the invariant.

In the case of Abelian groups, the result of evaluation is the variables in the expression with multiplicity. In the case of lists, the result of evaluation is the variables in the expression in the order they appear.

When the lifting magma family has good computational properties, calculating the invariant becomes easy.

Remark 2
#

Given an equation \(\phi \) in the language of magmas (possibly involving logical operations other than equality and universal quantification), the initial (i.e., most general) magmas satisfying \(\phi \) (provided they exist) form a lifting magma family.

However, for the purpose of generating invariants, we are interested in lifting magma families with convenient descriptions that are computationally tractable.

Remark 3
#

Suppose \(S\) is a finite set of equations in the language of magmas that is a confluent term rewriting system under a certain ordering of the terms (in the sense of the Knuth-Bendix algorithm). Then the initial magmas satisfying \(S\) form a lifting magma family where equality of elements in the magma is decidable.

This offers a way of generating examples of lifting magma families with good computational properties for computing invariants of expressions.

9.4 Generating laws from equations

The invariants defined in this chapter are properties of the syntax of the equations being considered. In other words, they are properties of the laws associated with the equations, rather than of the equations themselves. Proving non-implications using invariants requires a way to operate on the syntax of the equations and then translate the reasoning back to results about the original equations.

A magma law can be generated from an equation by accessing the syntax used in its definition and converting it to a declaration representing a magma law through metaprogramming. There is a choice in the variable set of the magma law – one one hand, it can be a finite set whose size matches the number of variables, and on the other hand, it can be the set of natural numbers. The advantage of the former is that one can generate proofs that the satisfiability of the magma law is equivalent to the satisfiability of the original equation (this only needs to be done for variable sets of size up to six, since that is the maximum size currently being considered in the project; it’s convenient to prove individual lemmata for each variable set size establishing this equivalence). The advantage of the latter is that it bypasses the need to cast between various finite sets while constructing a model as a counter-example.

One approach is to generate both forms of the law, using the first to establish the equisatisfiability of the law and the equation and then transporting this result to the second form of the law. The conversion from the first form to the second is summarised in the lemma below.

Lemma 9.10
#

[Compatibility between magma laws over finite sets and the natural numbers] Let \(E\) be a magma law defined over \(n\) variables and let \(\tilde{E}\) be the same equation with variables ranging over the natural numbers (formally, \(\tilde{E}\) is the image of \(E\) under the canonical map from the finite set with \(n\) elements to the natural numbers). Then any magma \(M\) satisfies \(E\) if and only if it satisfies \(\tilde{E}\).

Proof

In the forward direction, suppose \(\phi : \mathbb {N} \to M\) is a substitution. Then the restriction of \(\phi \) to the first \(n\) natural numbers is a substitution for the variables of \(E\), and since \(M\) satisfies \(E\), the law \(E\) is true in \(M\) under this substitution. Since \(\tilde{E}\) is the same as \(E\) under the substitution \(\phi \), \(M\) satisfies \(\tilde{E}\).

In the reverse direction, suppose \(\phi : \{ 0, 2, \ldots , n-1\} \to M\) is a substitution. Then \(\phi \) can be extended to a substitution \(\tilde{\phi } : \mathbb {N} \to M\) by setting \(\tilde{\phi }(i) = \phi (i)\) for \(i \leq n\) and \(\tilde{\phi }(i) = 0\) for \(i \geq n\). Since \(M\) satisfies \(\tilde{E}\) under the substitution \(\tilde{\phi }\), it satisfies \(E\) under the restriction of \(\tilde{\phi }\) to the first \(n\) natural numbers, which is precisely \(\phi \). The special case where \(n = 0\) is in fact impossible, since there cannot be an expression with no variables.

9.5 Conclusion: Beyond Invariants

We are still lacking:

  • A large collection of invariants.

  • An estimate for how many implications the resulting metatheorems will settle.

  • Algorithms (in Lean, Python, or otherwise) to compute known invariants.

  • General results about lifting magmas.

  • Formalization of the method of invariants and resulting metatheorems.

  • Knowledge about the category-theoretic interpretation of invariants (see the questions in the note for category theorists).

Related to the last bullet point, we note the following. If all that matters about invariants is the fundamental property, we can apply the old French trick of turning a (meta-meta)theorem into a definition.

Q: If we were to define invariants as any functions satisfying the fundamental property, would anything change? (For those who read the note for category theorists: an equivalent redefinition is to consider invariants as functors \(\Pi \to \textbf{Bool}\)).