3 Entropic Ruzsa calculus
In this section \(G\) will be a finite additive group. (May eventually want to generalize to infinite \(G\).)
If \(X\) is \(G\)-valued, then \(\mathbb {H}[-X]=\mathbb {H}[X]\).
Immediate from Lemma 2.2.
If \(X,Y\) are \(G\)-valued, then \(\mathbb {H}[X \pm Y | Y]=\mathbb {H}[X|Y]\) and \(\mathbb {H}[X \pm Y, Y] = \mathbb {H}[X, Y]\).
Immediate from Lemma 2.12 and Lemma 2.13.
If \(X,Y\) are \(G\)-valued random variables on \(\Omega \), we have
By Corollary 2.19, 3.2, 2.16, 3.1 we have
and similarly with the roles of \(X,Y\) reversed, giving the claim.
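For the reader's convenience, here is the chain this argument presumably encodes, assuming the statement being proved is the standard lower bound \(\mathbb{H}[X \pm Y] \geq \max(\mathbb{H}[X], \mathbb{H}[Y]) - \mathbb{I}[X:Y]\):
\[ \mathbb{H}[X \pm Y] \geq \mathbb{H}[X \pm Y \mid Y] = \mathbb{H}[X \mid Y] = \mathbb{H}[X] - \mathbb{I}[X:Y]. \]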
If \(X,Y\) are \(G\)-valued random variables on \(\Omega \) and \(Z\) is another random variable on \(\Omega \) then
This follows from Lemma 3.3 by conditioning on \(Z = z\) and summing over \(z\) (weighted by \(\mathbb{P}[Z=z]\)).
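Assuming Lemma 3.3 is the bound \(\mathbb{H}[X \pm Y] \geq \max(\mathbb{H}[X], \mathbb{H}[Y]) - \mathbb{I}[X:Y]\), this averaging argument yields a conditional version of the shape
\[ \mathbb{H}[X \pm Y \mid Z] \geq \max(\mathbb{H}[X \mid Z], \mathbb{H}[Y \mid Z]) - \mathbb{I}[X : Y \mid Z], \]
since applying Lemma 3.3 to the conditioned variables and averaging over \(z\), together with \(\sum_z \mathbb{P}[Z=z]\max(a_z,b_z) \geq \max\bigl(\sum_z \mathbb{P}[Z=z]\, a_z, \sum_z \mathbb{P}[Z=z]\, b_z\bigr)\), gives exactly this.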
If \(X,Y\) are independent \(G\)-valued random variables, then
Combine Lemma 3.3 with Lemma 2.23.
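In other words, since independent variables have vanishing mutual information (presumably the content of Lemma 2.23), the bound of Lemma 3.3 specializes in the independent case to
\[ \mathbb{H}[X \pm Y] \geq \max(\mathbb{H}[X], \mathbb{H}[Y]). \]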
One random variable is said to be a copy of another if they have the same distribution.
If \(X'\) is a copy of \(X\) then \(\mathbb {H}[X'] = \mathbb {H}[X]\).
Immediate from Definition 2.1.
Let \(X_i : \Omega_i \to S_i\) be random variables for \(i=1,\dots ,k\). If one gives \(\prod_{i=1}^k S_i\) the product measure of the laws of the \(X_i\), then the coordinate functions \((x_j)_{j=1}^k \mapsto x_i\) are jointly independent random variables which are copies of \(X_1,\dots ,X_k\).
Explicit computation.
Let \(X,Y\) be \(G\)-valued random variables (not necessarily on the same sample space). The Ruzsa distance \(d[X ;Y]\) between \(X\) and \(Y\) is defined to be
where \(X',Y'\) are (the canonical) independent copies of \(X,Y\) from Lemma 3.7.
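For reference in the worked computations below, the standard entropic Ruzsa distance, which the display in Definition 3.8 presumably records, is
\[ d[X;Y] := \mathbb{H}[X' - Y'] - \tfrac{1}{2}\mathbb{H}[X'] - \tfrac{1}{2}\mathbb{H}[Y']. \]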
If \(X\) is a \(G\)-valued random variable and \(0\) is the random variable taking the value \(0\) everywhere then
This is an immediate consequence of the definitions together with the facts that \(X-0\equiv X\) and \(\mathbb{H}[0]=0\).
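Assuming the formula for \(d[X;Y]\) recorded above, the computation is the one-liner
\[ d[X;0] = \mathbb{H}[X - 0] - \tfrac{1}{2}\mathbb{H}[X] - \tfrac{1}{2}\mathbb{H}[0] = \mathbb{H}[X] - \tfrac{1}{2}\mathbb{H}[X] = \tfrac{1}{2}\mathbb{H}[X]. \]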
If \(X',Y'\) are copies of \(X,Y\) respectively then \(d[X';Y']=d[X ;Y]\).
If \(X,Y\) are independent \(G\)-valued random variables, then
Immediate from Definition 3.8 and Lemmas 2.2, 3.6.
If \(X,Y\) are \(G\)-valued random variables, then
Immediate from Lemma 3.1 and Definition 3.8.
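Explicitly, assuming the formula for \(d\) above, symmetry follows by applying Lemma 3.1 to \(X' - Y'\):
\[ d[X;Y] = \mathbb{H}[X' - Y'] - \tfrac{1}{2}\mathbb{H}[X'] - \tfrac{1}{2}\mathbb{H}[Y'] = \mathbb{H}[Y' - X'] - \tfrac{1}{2}\mathbb{H}[Y'] - \tfrac{1}{2}\mathbb{H}[X'] = d[Y;X]. \]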
If \(X,Y\) are \(G\)-valued random variables, then
Immediate from Corollary 3.5 and Definition 3.8, and also Lemma 3.1.
If \(X,Y\) are independent \(G\)-valued random variables, then
Immediate from Corollary 3.5 and Definition 3.8, and also Lemma 3.1.
If \(X,Y\) are \(G\)-valued random variables, then
Immediate from Lemma 3.13.
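Concretely, if Lemma 3.13 is (as one would expect) the bound \(d[X;Y] \geq \tfrac{1}{2}\bigl|\mathbb{H}[X] - \mathbb{H}[Y]\bigr|\), then nonnegativity is immediate, since the right-hand side is itself nonnegative. Alternatively, nonnegativity follows directly from Corollary 3.5: for independent copies \(X', Y'\) we have \(2\,d[X;Y] = 2\,\mathbb{H}[X'-Y'] - \mathbb{H}[X'] - \mathbb{H}[Y'] \geq 0\) because \(\mathbb{H}[X'-Y'] \geq \max(\mathbb{H}[X'], \mathbb{H}[Y'])\).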
If \(G\) is an additive group, \(X\) is a \(G\)-valued random variable, and \(H\leq G\) is a finite subgroup, then, writing \(\pi :G\to G/H\) for the natural homomorphism and \(U_H\) for a random variable uniform on \(H\), we have
We may assume without loss of generality that \(X\) and \(U_H\) are independent (Lemma 3.7). Now by Lemmas 2.20, 3.2 and 2.3
By Lemma 2.13
and therefore
Furthermore by Lemma 3.13
Adding these inequalities gives the result.
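One way to organize this computation (a sketch, not necessarily matching the step order above, and assuming Lemma 3.13 gives \(d[X;U_H] \geq \tfrac{1}{2}(\mathbb{H}[X] - \mathbb{H}[U_H])\)): with \(X\) and \(U_H\) independent, \(\pi(X - U_H) = \pi(X)\) is determined by \(X - U_H\), and conditionally on \(\pi(X)\) the variable \(X - U_H\) is uniform on a coset of \(H\), so
\[ \mathbb{H}[X - U_H] = \mathbb{H}[\pi(X)] + \mathbb{H}[X - U_H \mid \pi(X)] = \mathbb{H}[\pi(X)] + \mathbb{H}[U_H]. \]
Hence
\[ \mathbb{H}[\pi(X)] = \mathbb{H}[X - U_H] - \mathbb{H}[U_H] = d[X;U_H] + \tfrac{1}{2}\bigl(\mathbb{H}[X] - \mathbb{H}[U_H]\bigr) \leq 2\,d[X;U_H]. \]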
If \(X,Y,Z\) are \(G\)-valued random variables on \(\Omega \) with \((X,Y)\) independent of \(Z\), then
This is an improvement over the usual Ruzsa triangle inequality because \(X,Y\) are not assumed to be independent. However we will not utilize this improvement here.
Apply Corollary 2.21 to obtain
Using
(from Lemma 2.2, Corollary 2.18),
(from Lemma 2.2), and
(from Lemma 2.2 and Corollary 2.24) and rearranging, we indeed obtain (1).
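To make the rearrangement explicit (a sketch, under the assumptions that (1) is the bound \(\mathbb{H}[X - Y] \leq \mathbb{H}[X - Z] + \mathbb{H}[Z - Y] - \mathbb{H}[Z]\) and that Corollary 2.21 is submodularity in the joint form \(\mathbb{H}[A, B] + \mathbb{H}[C, B] \geq \mathbb{H}[A, C, B] + \mathbb{H}[B]\)): taking \(A = X - Z\), \(C = Y\), \(B = X - Y\) gives
\[ \mathbb{H}[X - Z, X - Y] + \mathbb{H}[Y, X - Y] \geq \mathbb{H}[X - Z, Y, X - Y] + \mathbb{H}[X - Y]. \]
The first term equals \(\mathbb{H}[X - Z, Y - Z] \leq \mathbb{H}[X - Z] + \mathbb{H}[Y - Z]\), the second equals \(\mathbb{H}[X, Y]\), and the third equals \(\mathbb{H}[X, Y, Z] = \mathbb{H}[X, Y] + \mathbb{H}[Z]\) by the independence of \((X,Y)\) and \(Z\); rearranging, and using \(\mathbb{H}[Y - Z] = \mathbb{H}[Z - Y]\), recovers the stated bound.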
If \(X,Y,Z\) are \(G\)-valued random variables, then
By Lemma 3.10 and Lemmas 3.7, 3.11, it suffices to prove this inequality assuming that \(X,Y,Z\) are defined on the same space and are independent. But then the claim follows from Lemma 3.17 and Definition 3.8.
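Concretely, assuming Lemma 3.18 is the triangle inequality \(d[X;Z] \leq d[X;Y] + d[Y;Z]\): once \(X, Y, Z\) are independent, Lemma 3.17 (applied with \(Y\) in the middle role) gives \(\mathbb{H}[X - Z] \leq \mathbb{H}[X - Y] + \mathbb{H}[Y - Z] - \mathbb{H}[Y]\), and hence
\[ d[X;Z] = \mathbb{H}[X - Z] - \tfrac{1}{2}\mathbb{H}[X] - \tfrac{1}{2}\mathbb{H}[Z] \leq \Bigl(\mathbb{H}[X - Y] - \tfrac{1}{2}\mathbb{H}[X] - \tfrac{1}{2}\mathbb{H}[Y]\Bigr) + \Bigl(\mathbb{H}[Y - Z] - \tfrac{1}{2}\mathbb{H}[Y] - \tfrac{1}{2}\mathbb{H}[Z]\Bigr) = d[X;Y] + d[Y;Z]. \]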
If \((X, Z)\) and \((Y, W)\) are random variables (where \(X\) and \(Y\) are \(G\)-valued), we define
similarly
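Presumably the first display records the averaged quantity
\[ d[X | Z; Y | W] := \sum_{z,w} \mathbb{P}[Z = z]\,\mathbb{P}[W = w]\; d[(X | Z = z); (Y | W = w)], \]
with the second display the analogous expression in which only one of the two variables is conditioned; the sketch after Lemma 3.24 below assumes this form.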
The expression \(d[X | Z;Y | W]\) is unchanged if \((X,Z)\) or \((Y,W)\) is replaced by a copy. Furthermore, if \((X,Z)\) and \((Y,W)\) are independent, then
and similarly
This is straightforward from Lemma 3.6, Lemma 3.10, Lemma 3.11, Definition 3.19 and Definition 2.11.
Suppose that \(X, Y, Z\) are independent \(G\)-valued random variables. Then
From Corollary 2.20 we have
However, using Lemmas 2.24, 2.2 repeatedly we have \(\mathbb {H}[X, X + Y+ Z] = \mathbb {H}[X, Y+ Z] = \mathbb {H}[X] + \mathbb {H}[Y+ Z]\), \(\mathbb {H}[Z, X + Y + Z] = \mathbb {H}[Z, X + Y] = \mathbb {H}[Z] + \mathbb {H}[X + Y]\) and \(\mathbb {H}[X, Z, X + Y+ Z] = \mathbb {H}[X, Y, Z] = \mathbb {H}[X] + \mathbb {H}[Y] + \mathbb {H}[Z]\). The claim then follows from a calculation.
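Assuming the lemma being proved is the Kaimanovich--Vershik type bound \(\mathbb{H}[X + Y + Z] \leq \mathbb{H}[X + Y] + \mathbb{H}[Y + Z] - \mathbb{H}[Y]\), and that Corollary 2.20 is submodularity in the joint form, the calculation alluded to is the rearrangement of
\[ \mathbb{H}[X, X + Y + Z] + \mathbb{H}[Z, X + Y + Z] \geq \mathbb{H}[X, Z, X + Y + Z] + \mathbb{H}[X + Y + Z]: \]
substituting the three identities above gives \(\mathbb{H}[X] + \mathbb{H}[Y + Z] + \mathbb{H}[Z] + \mathbb{H}[X + Y] \geq \mathbb{H}[X] + \mathbb{H}[Y] + \mathbb{H}[Z] + \mathbb{H}[X + Y + Z]\), which rearranges to the claimed inequality.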
For \(X,Y\) random variables, there exist random variables \(X_1,X_2,Y'\) on a common probability space with \((X_1, Y'), (X_2, Y')\) both having the distribution of \((X,Y)\), and \(X_1, X_2\) conditionally independent over \(Y'\) in the sense of Definition 2.28.
Explicit construction.
Let \(A,B\) be \(G\)-valued random variables on \(\Omega \), and set \(Z := A+B\). Then
Let \((A_1, B_1)\) and \((A_2, B_2)\) be conditionally independent trials of \((A,B)\) relative to \(Z\), as produced by Lemma 3.22. Thus \((A_1,B_1)\) and \((A_2,B_2)\) are coupled through the random variable \(A_1 + B_1 = A_2 + B_2\), which by abuse of notation we shall also call \(Z\).
Observe from Lemma 3.11 that the left-hand side of (2) is
since, crucially, \((A_1 | Z=z)\) and \((B_2 | Z=z)\) are independent for all \(z\).
Applying submodularity (Corollary 2.21) gives
We estimate the second, third and fourth terms appearing here. First note that, by Corollary 2.30 and Lemma 2.2 (noting that the tuple \((A_1 - B_2, A_1, B_1)\) determines the tuple \((A_1, A_2, B_1, B_2)\) since \(A_1+B_1=A_2+B_2\))
Next observe that
Finally, we have
Substituting (7), (8) and (9) into (4) yields
and so by Corollary 2.19
Since
and similarly for \(\mathbb{H}[B_2 | Z]\), we see that (3) is bounded by \(3 \mathbb{I}[A:B] + 2\mathbb{H}[Z]-\mathbb{H}[A]-\mathbb{H}[B]\) as claimed.
Suppose that \((X, Z)\) and \((Y, W)\) are random variables, where \(X, Y\) take values in an abelian group. Then
In particular,
Using Lemma 3.20 and Lemma 3.7, if \((X',Z'), (Y',W')\) are independent copies of the variables \((X,Z)\), \((Y,W)\), we have
Here, in the middle step we used Corollary 2.19, and in the last step we used Definition 3.8 and Definition 2.15.
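For orientation, assuming the conclusion of Lemma 3.24 is the standard bound \(d[X | Z; Y | W] \leq d[X;Y] + \tfrac{1}{2}\mathbb{I}[X:Z] + \tfrac{1}{2}\mathbb{I}[Y:W]\), the chain sketched in the proof would read
\begin{align*} d[X | Z; Y | W] &= \mathbb{H}[X' - Y' \mid Z', W'] - \tfrac{1}{2}\mathbb{H}[X' \mid Z'] - \tfrac{1}{2}\mathbb{H}[Y' \mid W'] \\ &\leq \mathbb{H}[X' - Y'] - \tfrac{1}{2}\mathbb{H}[X' \mid Z'] - \tfrac{1}{2}\mathbb{H}[Y' \mid W'] \\ &= d[X;Y] + \tfrac{1}{2}\mathbb{I}[X:Z] + \tfrac{1}{2}\mathbb{I}[Y:W], \end{align*}
with the middle step being Corollary 2.19 (presumably that conditioning cannot increase entropy) and the last step the identities \(\mathbb{H}[X'] - \mathbb{H}[X' \mid Z'] = \mathbb{I}[X:Z]\) and \(\mathbb{H}[Y'] - \mathbb{H}[Y' \mid W'] = \mathbb{I}[Y:W]\).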
Let \(X, Y, Z\) be random variables taking values in some abelian group of characteristic \(2\), and with \(Y, Z\) independent. Then we have
and
We first prove (10). We may assume (taking an independent copy, using Lemma 3.7 and Lemmas 3.10 and 3.11) that \(X\) is independent of \(Y, Z\). Then we have
Combining this with Lemma 3.21 gives the required bound. The second form of the result is immediate from Lemma 3.11.
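As a sketch of the computation behind (10), assuming it is the bound \(d[X; Y + Z] \leq d[X;Y] + \tfrac{1}{2}(\mathbb{H}[Y + Z] - \mathbb{H}[Y])\) and that Lemma 3.21 takes the form \(\mathbb{H}[X + Y + Z] - \mathbb{H}[X + Y] \leq \mathbb{H}[Y + Z] - \mathbb{H}[Y]\): in characteristic \(2\) sums and differences coincide, so with \(X\) independent of \((Y,Z)\),
\[ d[X; Y + Z] - d[X;Y] = \mathbb{H}[X + Y + Z] - \mathbb{H}[X + Y] - \tfrac{1}{2}\mathbb{H}[Y + Z] + \tfrac{1}{2}\mathbb{H}[Y] \leq \tfrac{1}{2}\bigl(\mathbb{H}[Y + Z] - \mathbb{H}[Y]\bigr). \]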
Turning to (11), we have from Definition 2.15 and Lemma 2.2
and so (11) is a consequence of Lemma 3.24. Once again the second form of the result is immediate from Lemma 3.11.
Let \(X, Y, Z, Z'\) be random variables taking values in some abelian group, and with \(Y, Z, Z'\) independent. Then we have
By Lemma 3.25 (with a change of variables) we have
Adding this to (10) gives the result.