
3 Entropic Ruzsa calculus

In this section \(G\) will be a finite additive group. (May eventually want to generalize to infinite \(G\).)

Lemma 3.1 Negation preserves entropy

If \(X\) is \(G\)-valued, then \(\mathbb {H}[-X]=\mathbb {H}[X]\).

Proof

Immediate from Lemma 2.2.

Lemma 3.2 Shearing preserves entropy

If \(X,Y\) are \(G\)-valued, then \(\mathbb {H}[X \pm Y | Y]=\mathbb {H}[X|Y]\) and \(\mathbb {H}[X \pm Y, Y] = \mathbb {H}[X, Y]\).

Proof

Immediate from Lemma 2.12 and Lemma 2.13.

Lemma 3.3 Lower bound on sumset

If \(X,Y\) are \(G\)-valued random variables on \(\Omega \), we have

\[ \max (\mathbb {H}[X], \mathbb {H}[Y]) - \mathbb {I}[X:Y] \leq \mathbb {H}[X \pm Y]. \]
Proof

By Corollary 2.19, 3.2, 2.16, 3.1 we have

\[ \mathbb {H}[X\pm Y] \geq \mathbb {H}[X\pm Y|Y] = \mathbb {H}[X|Y]= \mathbb {H}[X] - \mathbb {I}[X:Y] \]

and similarly with the roles of \(X,Y\) reversed, giving the claim.

Lemma 3.4 Conditional lower bound on sumset

If \(X,Y\) are \(G\)-valued random variables on \(\Omega \) and \(Z\) is another random variable on \(\Omega \), then

\[ \max (\mathbb {H}[X|Z], \mathbb {H}[Y|Z]) - \mathbb {I}[X:Y|Z] \leq \mathbb {H}[X\pm Y|Z]. \]
Proof

This follows from Lemma 3.3 by conditioning on \(Z = z\) and summing over \(z\) (weighted by \(\mathbb {P}[Z=z]\)).
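In more detail, Lemma 3.3 applied to the variables conditioned on \(Z=z\) gives, for each \(z\),

\[ \max (\mathbb {H}[X|Z=z], \mathbb {H}[Y|Z=z]) - \mathbb {I}[X:Y|Z=z] \leq \mathbb {H}[X\pm Y|Z=z]; \]

multiplying by \(\mathbb {P}[Z=z]\), summing over \(z\), and noting that the average of the maxima is at least the maximum of the averages yields the claim.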

Corollary 3.5 Independent lower bound on sumset

If \(X,Y\) are independent \(G\)-valued random variables, then

\[ \max (\mathbb {H}[X], \mathbb {H}[Y]) \leq \mathbb {H}[X\pm Y]. \]
Proof

Combine Lemma 3.3 with Lemma 2.23.

One random variable is said to be a copy of another if they have the same distribution.

Lemma 3.6 Copy preserves entropy

If \(X'\) is a copy of \(X\) then \(\mathbb {H}[X'] = \mathbb {H}[X]\).

Proof

Immediate from Definition 2.1.

Lemma 3.7 Existence of independent copies

Let \(X_i : \Omega _i \to S_i\) be random variables for \(i=1,\dots ,k\). Then if one gives \(\prod _{i=1}^k S_i\) the product measure of the laws of the \(X_i\), the coordinate functions \((x_j)_{j=1}^k \mapsto x_i\) are jointly independent random variables which are copies of \(X_1,\dots ,X_k\) respectively.

Proof

Explicit computation.
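Concretely, writing \(\mathbb {P}\) for the product measure, for any \(s_1 \in S_1, \dots , s_k \in S_k\) one has

\[ \mathbb {P}\bigl[(x_j)_{j=1}^k : x_i = s_i \text{ for all } i\bigr] = \prod _{i=1}^k \mathbb {P}[X_i = s_i], \]

which shows simultaneously that the coordinate functions are jointly independent and that the \(i\)-th coordinate has the same distribution as \(X_i\).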

Definition 3.8 Ruzsa distance

Let \(X,Y\) be \(G\)-valued random variables (not necessarily on the same sample space). The Ruzsa distance \(d[X ;Y]\) between \(X\) and \(Y\) is defined to be

\[ d[X ;Y] := \mathbb {H}[X' - Y'] - \mathbb {H}[X']/2 - \mathbb {H}[Y']/2 \]

where \(X',Y'\) are (the canonical) independent copies of \(X,Y\) from Lemma 3.7.

Lemma 3.9 Distance from zero

If \(X\) is a \(G\)-valued random variable and \(0\) is the random variable taking the value \(0\) everywhere then

\[ d[X;0]=\mathbb {H}(X)/2. \]
Proof

This is an immediate consequence of the definitions and \(X-0\equiv X\) and \(\mathbb {H}(0)=0\).
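Explicitly, if \(X'\) denotes the independent copy of \(X\) provided by Lemma 3.7, then

\[ d[X;0] = \mathbb {H}[X' - 0] - \tfrac {1}{2}\mathbb {H}[X'] - \tfrac {1}{2}\mathbb {H}[0] = \mathbb {H}[X] - \tfrac {1}{2}\mathbb {H}[X] = \tfrac {1}{2}\mathbb {H}[X]. \]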

Lemma 3.10 Copy preserves Ruzsa distance

If \(X',Y'\) are copies of \(X,Y\) respectively then \(d[X';Y']=d[X ;Y]\).

Proof

Immediate from Definitions 3.8 and Lemma 3.6.

Lemma 3.11 Ruzsa distance in independent case

If \(X,Y\) are independent \(G\)-random variables then

\[ d[X ;Y] = \mathbb {H}[X - Y] - \mathbb {H}[X]/2 - \mathbb {H}[Y]/2. \]
Proof

Immediate from Definition 3.8 and Lemmas 2.2, 3.6.

Lemma 3.12 Distance symmetric

If \(X,Y\) are \(G\)-valued random variables, then

\[ d[X ;Y] = d[Y;X]. \]
Proof

Immediate from Lemma 3.1 and Definition 3.8.

Lemma 3.13 Distance controls entropy difference

If \(X,Y\) are \(G\)-valued random variables, then

\[ |\mathbb {H}[X]-\mathbb {H}[Y]| \leq 2 d[X ;Y]. \]
Proof

Immediate from Corollary 3.5, Definition 3.8, and Lemma 3.1.
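In more detail, if \(X',Y'\) are the independent copies from Lemma 3.7, then Corollary 3.5 and Definition 3.8 give

\[ \mathbb {H}[X] = \mathbb {H}[X'] \leq \mathbb {H}[X'-Y'] = d[X;Y] + \tfrac {1}{2}\mathbb {H}[X] + \tfrac {1}{2}\mathbb {H}[Y], \]

so that \(\mathbb {H}[X]-\mathbb {H}[Y] \leq 2d[X;Y]\); the same argument starting from \(\mathbb {H}[Y'] \leq \mathbb {H}[X'-Y']\) gives \(\mathbb {H}[Y]-\mathbb {H}[X] \leq 2d[X;Y]\).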

Lemma 3.14 Distance controls entropy growth

If \(X,Y\) are independent \(G\)-valued random variables, then

\[ \mathbb {H}[X-Y] - \mathbb {H}[X], \mathbb {H}[X-Y] - \mathbb {H}[Y] \leq 2d[X ;Y]. \]
Proof

Immediate from Corollary 3.5, Definition 3.8, and Lemma 3.1.
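In more detail, by Lemma 3.11 and Lemma 3.13,

\[ \mathbb {H}[X-Y] - \mathbb {H}[X] = d[X;Y] + \tfrac {1}{2}(\mathbb {H}[Y] - \mathbb {H}[X]) \leq 2d[X;Y], \]

and similarly with the roles of \(X\) and \(Y\) interchanged.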

Lemma 3.15 Distance nonnegative

If \(X,Y\) are \(G\)-valued random variables, then

\[ d[X ;Y] \geq 0. \]
Proof

Immediate from Lemma 3.13.

Lemma 3.16 Projection entropy and distance

If \(G\) is an additive group, \(X\) is a \(G\)-valued random variable, and \(H\leq G\) is a finite subgroup, then, with \(\pi :G\to G/H\) the natural homomorphism and \(U_H\) uniform on \(H\), we have

\[ \mathbb {H}(\pi (X))\leq 2d[X;U_H]. \]
Proof

WLOG, we make \(X\), \(U_H\) independent (Lemma 3.7). Now by Lemmas 2.20, 3.2, 2.3

\begin{align*} \mathbb {H}(X-U_H|\pi (X)) & \geq \mathbb {H}(X-U_H|X) = \mathbb {H}(U_H) \\ \mathbb {H}(X-U_H|\pi (X)) & \leq \log |H| = \mathbb {H}(U_H). \end{align*}

By Lemma 2.13

\[ \mathbb {H}(X-U_H)=\mathbb {H}(\pi (X))+\mathbb {H}(X-U_H|\pi (X))=\mathbb {H}(\pi (X))+\mathbb {H}(U_H) \]

and therefore

\[ d[X;U_H]=\mathbb {H}(\pi (X))+\frac{\mathbb {H}(U_H)-\mathbb {H}(X)}{2}. \]

Furthermore by Lemma 3.13

\[ d[X;U_H]\geq \frac{\lvert \mathbb {H}(X)-\mathbb {H}(U_H)\rvert }{2}. \]

Adding these inequalities gives the result.
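In more detail, adding these two bounds gives

\[ 2d[X;U_H] \geq \mathbb {H}(\pi (X)) + \frac{\mathbb {H}(U_H)-\mathbb {H}(X)}{2} + \frac{\mathbb {H}(X)-\mathbb {H}(U_H)}{2} = \mathbb {H}(\pi (X)). \]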

Lemma 3.17 Improved Ruzsa triangle inequality

If \(X,Y,Z\) are \(G\)-valued random variables on \(\Omega \) with \((X,Y)\) independent of \(Z\), then

\begin{equation} \label{submod-explicit} \mathbb {H}[X - Y] \leq \mathbb {H}[X-Z] + \mathbb {H}[Z-Y] - \mathbb {H}[Z]\end{equation}

This is an improvement over the usual Ruzsa triangle inequality because \(X,Y\) are not assumed to be independent. However we will not utilize this improvement here.

Proof

Apply Corollary 2.21 to obtain

\[ \mathbb {H}[X - Z, X - Y] + \mathbb {H}[Y, X - Y] \geq \mathbb {H}[X - Z, Y, X - Y] + \mathbb {H}[X - Y]. \]

Using

\[ \mathbb {H}[X - Z, X - Y] \leq \mathbb {H}[X - Z] + \mathbb {H}[Y - Z] \]

(from Lemma 2.2, Corollary 2.18),

\[ \mathbb {H}[Y, X - Y] = \mathbb {H}[X, Y] \]

(from Lemma 2.2), and

\[ \mathbb {H}[X - Z, Y, X - Y] = \mathbb {H}[X, Y, Z] = \mathbb {H}[X, Y] + \mathbb {H}[Z] \]

(from Lemma 2.2 and Corollary 2.24) and rearranging, we indeed obtain \eqref{submod-explicit}.
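In more detail, the three estimates turn the submodularity inequality into

\[ \mathbb {H}[X - Z] + \mathbb {H}[Y - Z] + \mathbb {H}[X, Y] \geq \mathbb {H}[X, Y] + \mathbb {H}[Z] + \mathbb {H}[X - Y]; \]

cancelling \(\mathbb {H}[X,Y]\) and using \(\mathbb {H}[Y-Z] = \mathbb {H}[Z-Y]\) (Lemma 3.1) gives \eqref{submod-explicit}.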

Lemma 3.18 Ruzsa triangle inequality

If \(X,Y,Z\) are \(G\)-valued random variables, then

\[ d[X ;Y] \leq d[X ;Z] + d[Z;Y]. \]
Proof

By Lemma 3.10 and Lemmas 3.7, 3.11, it suffices to prove this inequality assuming that \(X,Y,Z\) are defined on the same space and are independent. But then the claim follows from Lemma 3.17 and Definition 3.8.
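Indeed, for \(X,Y,Z\) independent, Lemma 3.11 and Lemma 3.17 give

\begin{align*} d[X ;Y] & = \mathbb {H}[X-Y] - \tfrac {1}{2}\mathbb {H}[X] - \tfrac {1}{2}\mathbb {H}[Y] \\ & \leq \mathbb {H}[X-Z] + \mathbb {H}[Z-Y] - \mathbb {H}[Z] - \tfrac {1}{2}\mathbb {H}[X] - \tfrac {1}{2}\mathbb {H}[Y] = d[X ;Z] + d[Z;Y]. \end{align*}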

Definition 3.19 Conditioned Ruzsa distance

If \((X, Z)\) and \((Y, W)\) are random variables (where \(X\) and \(Y\) are \(G\)-valued) we define

\[ d[X | Z; Y | W] := \sum _{z,w} \mathbb {P}[Z=z] \mathbb {P}[W=w] d[(X|Z=z); (Y|W=w)]. \]

Similarly,

\[ d[X ; Y | W] := \sum _{w} \mathbb {P}[W=w] d[X ; (Y|W=w)]. \]
Lemma 3.20 Alternate form of distance

The expression \(d[X | Z;Y | W]\) is unchanged if \((X,Z)\) or \((Y,W)\) is replaced by a copy. Furthermore, if \((X,Z)\) and \((Y,W)\) are independent, then

\[ d[X | Z;Y | W] = \mathbb {H}[X-Y|Z,W] - \mathbb {H}[X|Z]/2 - \mathbb {H}[Y|W]/2 \]

and similarly

\[ d[X ;Y | W] = \mathbb {H}[X-Y|W] - \mathbb {H}[X]/2 - \mathbb {H}[Y|W]/2. \]
Proof

The first claim is immediate from Lemma 3.10, since each summand in Definition 3.19 depends only on the joint distributions of \((X,Z)\) and \((Y,W)\). For the identities, note that if \((X,Z)\) and \((Y,W)\) are independent then \((X|Z=z)\) and \((Y|W=w)\) are independent with \(\mathbb {P}[Z=z,W=w]=\mathbb {P}[Z=z]\mathbb {P}[W=w]\), so applying Lemma 3.11 to each summand and summing gives the claims.

Lemma 3.21 Kaimanovich-Vershik-Madiman inequality

Suppose that \(X, Y, Z\) are independent \(G\)-valued random variables. Then

\[ \mathbb {H}[X + Y + Z] - \mathbb {H}[X + Y] \leq \mathbb {H}[Y+ Z] - \mathbb {H}[Y]. \]
Proof

From Corollary 2.20 we have

\[ \mathbb {H}[X, X + Y+ Z] + \mathbb {H}[Z, X + Y+ Z] \geq \mathbb {H}[X, Z, X + Y+ Z] + \mathbb {H}[X + Y+ Z]. \]

However, using Lemmas 2.24, 2.2 repeatedly we have \(\mathbb {H}[X, X + Y+ Z] = \mathbb {H}[X, Y+ Z] = \mathbb {H}[X] + \mathbb {H}[Y+ Z]\), \(\mathbb {H}[Z, X + Y + Z] = \mathbb {H}[Z, X + Y] = \mathbb {H}[Z] + \mathbb {H}[X + Y]\) and \(\mathbb {H}[X, Z, X + Y+ Z] = \mathbb {H}[X, Y, Z] = \mathbb {H}[X] + \mathbb {H}[Y] + \mathbb {H}[Z]\). The claim then follows from a calculation.
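Explicitly, substituting these three identities into the submodularity inequality and cancelling \(\mathbb {H}[X] + \mathbb {H}[Z]\) from both sides gives

\[ \mathbb {H}[Y+Z] + \mathbb {H}[X+Y] \geq \mathbb {H}[Y] + \mathbb {H}[X+Y+Z], \]

which rearranges to the claim.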

Lemma 3.22 Existence of conditional independent trials

For \(X,Y\) random variables, there exist random variables \(X_1,X_2,Y'\) on a common probability space with \((X_1, Y'), (X_2, Y')\) both having the distribution of \((X,Y)\), and \(X_1, X_2\) conditionally independent over \(Y'\) in the sense of Definition 2.28.

Proof

Explicit construction.
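For instance (a sketch of one standard construction, not necessarily the one used in the formal proof): if \(X\) takes values in \(S\) and \(Y\) in \(T\), equip \(S \times S \times T\) with the probability measure

\[ \mathbb {P}[(x_1,x_2,y)] := \mathbb {P}[Y=y]\, \mathbb {P}[X=x_1 | Y=y]\, \mathbb {P}[X=x_2 | Y=y] \]

(with the convention that the right-hand side is \(0\) when \(\mathbb {P}[Y=y]=0\)), and let \(X_1, X_2, Y'\) be the coordinate projections. A direct computation shows that \((X_1,Y')\) and \((X_2,Y')\) each have the distribution of \((X,Y)\), and that \(X_1, X_2\) are conditionally independent over \(Y'\).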

Lemma 3.23 Balog-Szemerédi-Gowers

Let \(A,B\) be \(G\)-valued random variables on \(\Omega \), and set \(Z := A+B\). Then

\begin{equation} \label{2-bsg-takeaway} \sum _{z} \mathbb {P}[Z=z] d[(A | Z = z); (B | Z = z)] \leq 3 \mathbb {I}[A:B] + 2 \mathbb {H}[Z] - \mathbb {H}[A] - \mathbb {H}[B]. \end{equation}

Proof

Let \((A_1, B_1)\) and \((A_2, B_2)\) be conditionally independent trials of \((A,B)\) relative to \(Z\), as produced by Lemma 3.22; thus \((A_1,B_1)\) and \((A_2,B_2)\) are coupled through the random variable \(A_1 + B_1 = A_2 + B_2\), which by abuse of notation we shall also call \(Z\).

Observe from Lemma 3.11 that the left-hand side of \eqref{2-bsg-takeaway} is

\begin{equation} \label{lhs-to-bound} \mathbb {H}[A_1 - B_2| Z] - \mathbb {H}[A_1 | Z]/2 - \mathbb {H}[B_2 | Z]/2. \end{equation}

Here we have used that, crucially, \((A_1 | Z=z)\) and \((B_2 | Z=z)\) are independent for all \(z\).

Applying submodularity (Corollary 2.21) gives

\begin{equation} \label{bsg-31} \begin{split} & \mathbb {H}[A_1 - B_2] + \mathbb {H}[A_1 - B_2, A_1, B_1] \\ & \qquad \leq \mathbb {H}[A_1 - B_2, A_1] + \mathbb {H}[A_1 - B_2,B_1]. \end{split}\end{equation}

We estimate the second, third and fourth terms appearing here. First note that, by Corollary 2.30 and Lemma 2.2 (noting that the tuple \((A_1 - B_2, A_1, B_1)\) determines the tuple \((A_1, A_2, B_1, B_2)\) since \(A_1+B_1=A_2+B_2\))

\begin{equation} \label{bsg-24} \mathbb {H}[A_1 - B_2, A_1, B_1] = \mathbb {H}[A_1, B_1, A_2, B_2,Z] = 2\mathbb {H}[A,B] - \mathbb {H}[Z].\end{equation}

Next observe that

\begin{equation} \label{bsg-23} \mathbb {H}[A_1 - B_2, A_1] = \mathbb {H}[A_1, B_2] \leq \mathbb {H}[A] + \mathbb {H}[B]. \end{equation}

Finally, we have

\begin{equation} \label{bsg-25} \mathbb {H}[A_1 - B_2, B_1] = \mathbb {H}[A_2 - B_1, B_1] = \mathbb {H}[A_2, B_1] \leq \mathbb {H}[A] + \mathbb {H}[B].\end{equation}

Substituting \eqref{bsg-24}, \eqref{bsg-23} and \eqref{bsg-25} into \eqref{bsg-31} yields

\[ \mathbb {H}[A_1 - B_2] \leq 2 \mathbb {I}[A:B] + \mathbb {H}[Z] \]

and so by Corollary 2.19

\[ \mathbb {H}[A_1 - B_2 | Z] \leq 2 \mathbb {I}[A:B] + \mathbb {H}[Z]. \]

Since

\begin{align*} \mathbb {H}[A_1 | Z] & = \mathbb {H}[A_1, A_1 + B_1] - \mathbb {H}[Z] \\ & = \mathbb {H}[A,B] - \mathbb {H}[Z] \\ & = \mathbb {H}[A] + \mathbb {H}[B] - \mathbb {I}[A:B] - \mathbb {H}[Z]\end{align*}

and similarly for \(\mathbb {H}[B_2 | Z]\), we see that \eqref{lhs-to-bound} is bounded by \(3 \mathbb {I}[A:B] + 2\mathbb {H}[Z]-\mathbb {H}[A]-\mathbb {H}[B]\) as claimed.
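In more detail, combining the bound on \(\mathbb {H}[A_1-B_2|Z]\) with \(\mathbb {H}[A_1|Z]=\mathbb {H}[B_2|Z]=\mathbb {H}[A,B]-\mathbb {H}[Z]\) gives

\begin{align*} \mathbb {H}[A_1 - B_2| Z] - \tfrac {1}{2}\mathbb {H}[A_1 | Z] - \tfrac {1}{2}\mathbb {H}[B_2 | Z] & \leq 2 \mathbb {I}[A:B] + \mathbb {H}[Z] - (\mathbb {H}[A,B] - \mathbb {H}[Z]) \\ & = 3 \mathbb {I}[A:B] + 2\mathbb {H}[Z] - \mathbb {H}[A] - \mathbb {H}[B], \end{align*}

using \(\mathbb {H}[A,B] = \mathbb {H}[A] + \mathbb {H}[B] - \mathbb {I}[A:B]\) in the last step.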

Lemma 3.24 Upper bound on conditioned Ruzsa distance

Suppose that \((X, Z)\) and \((Y, W)\) are random variables, where \(X, Y\) take values in an abelian group. Then

\[ d[X | Z;Y | W] \leq d[X ; Y] + \tfrac {1}{2} \mathbb {I}[X : Z] + \tfrac {1}{2} \mathbb {I}[Y : W]. \]

In particular,

\[ d[X ;Y | W] \leq d[X ; Y] + \tfrac {1}{2} \mathbb {I}[Y : W]. \]
Proof

Using Lemma 3.20 and Lemma 3.7, if \((X',Z'), (Y',W')\) are independent copies of the variables \((X,Z)\), \((Y,W)\), we have

\begin{align*} d[X | Z; Y | W]& = \mathbb {H}[X'-Y'|Z',W'] - \tfrac {1}{2} \mathbb {H}[X'|Z'] - \tfrac {1}{2}\mathbb {H}[Y'|W'] \\ & \le \mathbb {H}[X'-Y']- \tfrac {1}{2} \mathbb {H}[X'|Z'] - \tfrac {1}{2}\mathbb {H}[Y'|W'] \\ & = d[X';Y'] + \tfrac {1}{2} \mathbb {I}[X' : Z'] + \tfrac {1}{2} \mathbb {I}[Y' : W']. \end{align*}

Here, in the middle step we used Corollary 2.19, and in the last step we used Definition 3.8 and Definition 2.15.
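In more detail, the last step uses

\[ \tfrac {1}{2}\mathbb {H}[X'] - \tfrac {1}{2}\mathbb {H}[X'|Z'] = \tfrac {1}{2}\mathbb {I}[X' : Z'], \qquad \tfrac {1}{2}\mathbb {H}[Y'] - \tfrac {1}{2}\mathbb {H}[Y'|W'] = \tfrac {1}{2}\mathbb {I}[Y' : W'], \]

after which the claim follows since \(d[X';Y'] = d[X;Y]\), \(\mathbb {I}[X':Z'] = \mathbb {I}[X:Z]\) and \(\mathbb {I}[Y':W'] = \mathbb {I}[Y:W]\), these quantities depending only on the relevant distributions.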

Lemma 3.25 Comparison of Ruzsa distances, I

Let \(X, Y, Z\) be random variables taking values in some abelian group of characteristic \(2\), and with \(Y, Z\) independent. Then we have

\begin{align} \nonumber d[X ; Y + Z] -d[X ; Y] & \leq \tfrac {1}{2} (\mathbb {H}[Y+ Z] - \mathbb {H}[Y]) \\ & = \tfrac {1}{2} d[Y; Z] + \tfrac {1}{4} \mathbb {H}[Z] - \tfrac {1}{4} \mathbb {H}[Y]. \label{lem51-a} \end{align}

and

\begin{align} \nonumber d[X ;Y|Y+ Z] - d[X ;Y] & \leq \tfrac {1}{2} \bigl(\mathbb {H}[Y+ Z] - \mathbb {H}[Z]\bigr) \\ & = \tfrac {1}{2} d[Y;Z] + \tfrac {1}{4} \mathbb {H}[Y] - \tfrac {1}{4} \mathbb {H}[Z]. \label{ruzsa-3} \end{align}
Proof

We first prove \eqref{lem51-a}. We may assume (taking an independent copy, using Lemma 3.7 and Lemmas 3.10, 3.11) that \(X\) is independent of \(Y, Z\). Then we have

\begin{align*} d[X ;Y+ Z] & - d[X ;Y] \\ & = \mathbb {H}[X + Y + Z] - \mathbb {H}[X + Y] - \tfrac {1}{2}\mathbb {H}[Y + Z] + \tfrac {1}{2} \mathbb {H}[Y].\end{align*}

Combining this with Lemma 3.21 gives the required bound, as spelled out below; the second form of the result is immediate from Lemma 3.11.
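Indeed, Lemma 3.21 gives \(\mathbb {H}[X + Y + Z] - \mathbb {H}[X + Y] \leq \mathbb {H}[Y+ Z] - \mathbb {H}[Y]\), so

\[ d[X ;Y+ Z] - d[X ;Y] \leq (\mathbb {H}[Y + Z] - \mathbb {H}[Y]) - \tfrac {1}{2}\mathbb {H}[Y + Z] + \tfrac {1}{2} \mathbb {H}[Y] = \tfrac {1}{2} (\mathbb {H}[Y+ Z] - \mathbb {H}[Y]). \]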

Turning to \eqref{ruzsa-3}, we have from Definition 2.15 and Lemma 2.2

\begin{align*} \mathbb {I}[Y : Y+ Z] & = \mathbb {H}[Y] + \mathbb {H}[Y + Z] - \mathbb {H}[Y, Y + Z] \\ & = \mathbb {H}[Y] + \mathbb {H}[Y + Z] - \mathbb {H}[Y, Z] = \mathbb {H}[Y + Z] - \mathbb {H}[Z],\end{align*}

and so \eqref{ruzsa-3} is a consequence of Lemma 3.24. Once again the second form of the result is immediate from Lemma 3.11.

Lemma 3.26 Comparison of Ruzsa distances, II

Let \(X, Y, Z, Z'\) be random variables taking values in some abelian group, and with \(Y, Z, Z'\) independent. Then we have

\begin{align} \nonumber & d[X ;Y + Z | Y + Z + Z'] - d[X ;Y] \\ & \qquad \leq \tfrac {1}{2} ( \mathbb {H}[Y + Z + Z'] + \mathbb {H}[Y + Z] - \mathbb {H}[Y] - \mathbb {H}[Z']).\label{7111} \end{align}
Proof

By Lemma 3.25 (with a change of variables) we have

\[ d[X ; Y + Z | Y + Z + Z'] - d[X ; Y + Z] \leq \tfrac {1}{2}( \mathbb {H}[Y + Z + Z'] - \mathbb {H}[Z']). \]

Adding this to \eqref{lem51-a} gives the result.
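Explicitly, adding this to \eqref{lem51-a} yields

\[ d[X ; Y + Z | Y + Z + Z'] - d[X ; Y] \leq \tfrac {1}{2}( \mathbb {H}[Y + Z + Z'] - \mathbb {H}[Z']) + \tfrac {1}{2}(\mathbb {H}[Y + Z] - \mathbb {H}[Y]), \]

which is exactly \eqref{7111}.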