3 Entropic Ruzsa calculus

In this section $G$ will be a finite additive group. (May eventually want to generalize to infinite $G$ .)

Lemma 3.1 Negation preserves entropy

If $X$ is $G$ -valued, then $H [- X] = H [X]$ .

Lemma 3.2 Shearing preserves entropy

If $X, Y$ are $G$ -valued, then $H [X \pm Y | Y] = H [X | Y]$ and $H [X \pm Y, Y] = H [X, Y]$ .

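A sketch of one possible argument, assuming conditional entropy is defined as the $P [Y = y]$ -weighted average of the entropies of the conditioned variables, and assuming the chain rule $H [A, Y] = H [Y] + H [A | Y]$ from Section 2: for each fixed $y$ the map $x \mapsto x \pm y$ is a bijection of $G$ , so it only relabels the outcomes of $X$ conditioned on $Y = y$ .

```latex
\begin{aligned}
H [X \pm Y | Y] &= \sum_{y} P [Y = y] \, H [(X \pm y) | Y = y] \\
&= \sum_{y} P [Y = y] \, H [X | Y = y] = H [X | Y], \\
H [X \pm Y, Y] &= H [Y] + H [X \pm Y | Y] = H [Y] + H [X | Y] = H [X, Y] .
\end{aligned}
```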

Lemma 3.3 Lower bound of sumset

If $X, Y$ are $G$ -valued random variables on $Ω$ , we have

\max (H [X], H [Y]) - I [X : Y] \leq H [X \pm Y] .

Proof.

By Corollary 2.19, 3.2, 2.16, 3.1 we have

H [X \pm Y] \geq H [X \pm Y | Y] = H [X | Y] = H [X] - I [X : Y]

and similarly with the roles of $X, Y$ reversed, giving the claim.

Corollary 3.4 Conditional lower bound on sumset

If $X, Y$ are $G$ -valued random variables on $Ω$ and $Z$ is another random variable on $Ω$ then

\max (H [X | Z], H [Y | Z]) - I [X : Y | Z] \leq H [X \pm Y | Z] .

Proof.

This follows from Lemma 3.3 by conditioning to $Z = z$ and summing over $z$ (weighted by $P [Z = z]$ ).

Corollary 3.5 Independent lower bound on sumset

If $X, Y$ are independent $G$ -valued random variables, then

\max (H [X], H [Y]) \leq H [X \pm Y] .

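One way to see this, assuming (as in Section 2) that independent random variables have vanishing mutual information, is to substitute $I [X : Y] = 0$ into Lemma 3.3:

```latex
\max (H [X], H [Y]) = \max (H [X], H [Y]) - I [X : Y] \leq H [X \pm Y] .
```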

One random variable is said to be a copy of another if they have the same distribution.

Lemma 3.6 Copy preserves entropy

If $X^{'}$ is a copy of $X$ then $H [X^{'}] = H [X]$ .

Lemma 3.7 Existence of independent copies

Let $X_{i} : Ω_{i} \to S_{i}$ be random variables for $i = 1, \dots, k$ . Then if one gives $\prod_{i = 1}^{k} S_{i}$ the product measure of the laws of $X_{i}$ , the coordinate functions $(x_{j})_{j = 1}^{k} \mapsto x_{i}$ are jointly independent random variables which are copies of the $X_{1}, \dots, X_{k}$ .

Definition 3.8 Ruzsa distance

Let $X, Y$ be $G$ -valued random variables (not necessarily on the same sample space). The Ruzsa distance $d [X; Y]$ between $X$ and $Y$ is defined to be

d [X; Y] := H [X^{'} - Y^{'}] - H [X^{'}] / 2 - H [Y^{'}] / 2

where $X^{'}, Y^{'}$ are (the canonical) independent copies of $X, Y$ from Lemma 3.7.

Lemma 3.9 Distance from zero

If $X$ is a $G$ -valued random variable and $0$ is the random variable taking the value $0$ everywhere then

d [X; 0] = H (X) / 2.

Proof.

This is an immediate consequence of the definitions and $X - 0 \equiv X$ and $H (0) = 0$ .

Lemma 3.10 Copy preserves Ruzsa distance

If $X^{'}, Y^{'}$ are copies of $X, Y$ respectively then $d [X^{'}; Y^{'}] = d [X; Y]$ .

Lemma 3.11 Ruzsa distance in independent case

If $X, Y$ are independent $G$ -random variables then

d [X; Y] = H [X - Y] - H [X] / 2 - H [Y] / 2.

Lemma 3.12 Distance symmetric

If $X, Y$ are $G$ -valued random variables, then

d [X; Y] = d [Y; X] .

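A possible derivation, assuming $X^{'}, Y^{'}$ are independent copies of $X, Y$ (so that, by Lemmas 3.10 and 3.11, both distances can be computed from this one pair) and using Lemma 3.1 for the negation:

```latex
\begin{aligned}
d [Y; X] &= H [Y^{'} - X^{'}] - \tfrac{1}{2} H [Y^{'}] - \tfrac{1}{2} H [X^{'}] \\
&= H [- (X^{'} - Y^{'})] - \tfrac{1}{2} H [X^{'}] - \tfrac{1}{2} H [Y^{'}] \\
&= H [X^{'} - Y^{'}] - \tfrac{1}{2} H [X^{'}] - \tfrac{1}{2} H [Y^{'}] = d [X; Y] .
\end{aligned}
```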

Lemma 3.13 Distance controls entropy difference

If $X, Y$ are $G$ -valued random variables, then

| H [X] - H [Y] | \leq 2 d [X; Y] .

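A sketch of how this may be obtained, assuming $X^{'}, Y^{'}$ are the independent copies from Definition 3.8: Lemma 3.6 replaces $H [X^{'}], H [Y^{'}]$ by $H [X], H [Y]$ , and Corollary 3.5 bounds $H [X^{'} - Y^{'}]$ from below.

```latex
\begin{aligned}
2 d [X; Y] &= 2 H [X^{'} - Y^{'}] - H [X] - H [Y] \\
&\geq 2 \max (H [X], H [Y]) - H [X] - H [Y] \\
&= | H [X] - H [Y] | .
\end{aligned}
```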

Lemma 3.14 Distance controls entropy growth

If $X, Y$ are independent $G$ -valued random variables, then

H [X - Y] - H [X], H [X - Y] - H [Y] \leq 2 d [X; Y] .

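One way to derive the first bound, assuming the independent-case formula of Lemma 3.11 and the entropy-difference bound of Lemma 3.13 (the second bound would follow by exchanging the roles of $H [X]$ and $H [Y]$ ):

```latex
\begin{aligned}
H [X - Y] - H [X] &= d [X; Y] + \tfrac{1}{2} (H [Y] - H [X]) \\
&\leq d [X; Y] + \tfrac{1}{2} | H [X] - H [Y] | \\
&\leq 2 d [X; Y] .
\end{aligned}
```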

Lemma 3.15 Distance nonnegative

If $X, Y$ are $G$ -valued random variables, then

d [X; Y] \geq 0.

Lemma 3.16 Projection entropy and distance

If $G$ is an additive group, $X$ is a $G$ -valued random variable, and $H \leq G$ is a finite subgroup, then, with $π : G \to G / H$ the natural homomorphism and $U_{H}$ uniform on $H$ , we have

H (π (X)) \leq 2 d [X; U_{H}] .

Proof.

WLOG, we make $X$ , $U_{H}$ independent (Lemma 3.7). Now by Lemmas 2.20, 3.2, 2.3

\begin{aligned} H (X - U_{H} | π (X)) \geq H (X - U_{H} | X) & = H (U_{H}) \\ H (X - U_{H} | π (X)) \leq \log | H | & = H (U_{H}) \end{aligned}

By Lemma 2.13

H (X - U_{H}) = H (π (X)) + H (X - U_{H} | π (X)) = H (π (X)) + H (U_{H})

and therefore

d [X; U_{H}] = H (π (X)) + \frac{H (U_{H}) - H (X)}{2} .

Furthermore by Lemma 3.13

d [X; U_{H}] \geq \frac{| H (X) - H (U_{H}) |}{2} .

Adding this inequality to the preceding identity gives the result.
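Spelled out, the final combination might read as follows (a sketch that only rearranges the identity for $d [X; U_{H}]$ and then applies the bound from Lemma 3.13):

```latex
\begin{aligned}
H (\pi (X)) &= d [X; U_{H}] - \frac{H (U_{H}) - H (X)}{2} \\
&\leq d [X; U_{H}] + \frac{| H (X) - H (U_{H}) |}{2} \\
&\leq 2 d [X; U_{H}] .
\end{aligned}
```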

Lemma 3.17 Improved Ruzsa triangle inequality

If $X, Y, Z$ are $G$ -valued random variables on $Ω$ with $(X, Y)$ independent of $Z$ , then

H [X - Y] \leq H [X - Z] + H [Z - Y] - H [Z]

This is an improvement over the usual Ruzsa triangle inequality because $X, Y$ are not assumed to be independent. However we will not utilize this improvement here.

Proof.

Apply Corollary 2.21 to obtain

H [X - Z, X - Y] + H [Y, X - Y] \geq H [X - Z, Y, X - Y] + H [X - Y] .

Using

H [X - Z, X - Y] \leq H [X - Z] + H [Y - Z]

(from Lemma 2.2, Corollary 2.18),

H [Y, X - Y] = H [X, Y]

(from Lemma 2.2), and

H [X - Z, Y, X - Y] = H [X, Y, Z] = H [X, Y] + H [Z]

(from Lemma 2.2 and Corollary 2.24) and rearranging, we indeed obtain the claimed inequality, using Lemma 3.1 to replace $H [Y - Z]$ by $H [Z - Y]$ .
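The rearrangement can be sketched as a single chain, assuming the three displayed facts exactly as stated in the proof:

```latex
\begin{aligned}
H [X - Z] + H [Y - Z] + H [X, Y] &\geq H [X - Z, X - Y] + H [Y, X - Y] \\
&\geq H [X - Z, Y, X - Y] + H [X - Y] \\
&= H [X, Y] + H [Z] + H [X - Y] .
\end{aligned}
```

Cancelling $H [X, Y]$ from both ends and using $H [Y - Z] = H [Z - Y]$ (Lemma 3.1) leaves $H [X - Y] \leq H [X - Z] + H [Z - Y] - H [Z]$ .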

Lemma 3.18 Ruzsa triangle inequality

If $X, Y, Z$ are $G$ -valued random variables, then

d [X; Y] \leq d [X; Z] + d [Z; Y] .

Proof.

By Lemma 3.10 and Lemmas 3.7, 3.11, it suffices to prove this inequality assuming that $X, Y, Z$ are defined on the same space and are independent. But then the claim follows from Lemma 3.17 and Definition 3.8.
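For jointly independent $X, Y, Z$ the last step can be written out as follows (a sketch; the only inputs are Lemma 3.17 and the independent-case formula of Lemma 3.11):

```latex
\begin{aligned}
d [X; Y] &= H [X - Y] - \tfrac{1}{2} H [X] - \tfrac{1}{2} H [Y] \\
&\leq H [X - Z] + H [Z - Y] - H [Z] - \tfrac{1}{2} H [X] - \tfrac{1}{2} H [Y] \\
&= \left( H [X - Z] - \tfrac{1}{2} H [X] - \tfrac{1}{2} H [Z] \right)
 + \left( H [Z - Y] - \tfrac{1}{2} H [Z] - \tfrac{1}{2} H [Y] \right) \\
&= d [X; Z] + d [Z; Y] .
\end{aligned}
```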

Definition 3.19 Conditioned Ruzsa distance

If $(X, Z)$ and $(Y, W)$ are random variables (where $X$ and $Y$ are $G$ -valued) we define

d [X | Z; Y | W] := \sum_{z, w} P [Z = z] P [W = w] d [(X | Z = z); (Y | (W = w))] .

Similarly, we define

d [X; Y | W] := \sum_{w} P [W = w] d [X; (Y | (W = w))] .

Lemma 3.20 Alternate form of distance

The expression $d [X | Z; Y | W]$ is unchanged if $(X, Z)$ or $(Y, W)$ is replaced by a copy. Furthermore, if $(X, Z)$ and $(Y, W)$ are independent, then

d [X | Z; Y | W] = H [X - Y | Z, W] - H [X | Z] / 2 - H [Y | W] / 2

and similarly

d [X; Y | W] = H [X - Y | W] - H [X] / 2 - H [Y | W] / 2.

Lemma 3.21 Kaimanovich-Vershik-Madiman inequality

Suppose that $X, Y, Z$ are independent $G$ -valued random variables. Then

H [X + Y + Z] - H [X + Y] \leq H [Y + Z] - H [Y] .

Proof.

From Corollary 2.20 we have

H [X, X + Y + Z] + H [Z, X + Y + Z] \geq H [X, Z, X + Y + Z] + H [X + Y + Z] .

However, using Lemmas 2.24, 2.2 repeatedly we have $H [X, X + Y + Z] = H [X, Y + Z] = H [X] + H [Y + Z]$ , $H [Z, X + Y + Z] = H [Z, X + Y] = H [Z] + H [X + Y]$ and $H [X, Z, X + Y + Z] = H [X, Y, Z] = H [X] + H [Y] + H [Z]$ . The claim then follows from a calculation.
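The closing calculation can be sketched by substituting the three identities into the submodularity inequality and cancelling $H [X] + H [Z]$ from both sides:

```latex
\begin{aligned}
(H [X] + H [Y + Z]) + (H [Z] + H [X + Y])
&\geq (H [X] + H [Y] + H [Z]) + H [X + Y + Z] \\
\Longleftrightarrow \quad H [Y + Z] - H [Y]
&\geq H [X + Y + Z] - H [X + Y] .
\end{aligned}
```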

Lemma 3.22 Existence of conditional independent trials

For $X, Y$ random variables, there exist random variables $X_{1}, X_{2}, Y^{'}$ on a common probability space with $(X_{1}, Y^{'}), (X_{2}, Y^{'})$ both having the distribution of $(X, Y)$ , and $X_{1}, X_{2}$ conditionally independent over $Y^{'}$ in the sense of Definition 2.28.

Lemma 3.23 Balog-Szemerédi-Gowers

Let $A, B$ be $G$ -valued random variables on $Ω$ , and set $Z := A + B$ . Then

\sum_{z} P [Z = z] d [(A | Z = z); (B | Z = z)] \leq 3 I [A : B] + 2 H [Z] - H [A] - H [B] .

Proof.

Let $(A_{1}, B_{1})$ and $(A_{2}, B_{2})$ be conditionally independent trials of $(A, B)$ relative to $Z$ , as produced by Lemma 3.22. Thus $(A_{1}, B_{1})$ and $(A_{2}, B_{2})$ are coupled through the random variable $A_{1} + B_{1} = A_{2} + B_{2}$ , which by abuse of notation we shall also call $Z$ .

Observe from Lemma 3.11 that the left-hand side of the claimed inequality is

H [A_{1} - B_{2} | Z] - H [A_{1} | Z] / 2 - H [B_{2} | Z] / 2,

since, crucially, $(A_{1} | Z = z)$ and $(B_{2} | Z = z)$ are independent for all $z$ .

Applying submodularity (Corollary 2.21) gives

\begin{aligned} H [A_{1} - B_{2}] + H [A_{1} - B_{2}, A_{1}, B_{1}] \\ \leq H [A_{1} - B_{2}, A_{1}] + H [A_{1} - B_{2}, B_{1}] . \end{aligned}

We estimate the second, third and fourth terms appearing here. First note that, by Corollary 2.30 and Lemma 2.2 (noting that the tuple $(A_{1} - B_{2}, A_{1}, B_{1})$ determines the tuple $(A_{1}, A_{2}, B_{1}, B_{2})$ since $A_{1} + B_{1} = A_{2} + B_{2}$ )

H [A_{1} - B_{2}, A_{1}, B_{1}] = H [A_{1}, B_{1}, A_{2}, B_{2}, Z] = 2 H [A, B] - H [Z] .

Next observe that

H [A_{1} - B_{2}, A_{1}] = H [A_{1}, B_{2}] \leq H [A] + H [B] .

Finally, we have

H [A_{1} - B_{2}, B_{1}] = H [A_{2} - B_{1}, B_{1}] = H [A_{2}, B_{1}] \leq H [A] + H [B] .

Substituting these three estimates into the submodularity inequality yields

H [A_{1} - B_{2}] \leq 2 I [A : B] + H [Z]

and so by Corollary 2.19

H [A_{1} - B_{2} | Z] \leq 2 I [A : B] + H [Z] .
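The substitution can be sketched as follows, assuming the identity $I [A : B] = H [A] + H [B] - H [A, B]$ for mutual information from Section 2:

```latex
\begin{aligned}
H [A_{1} - B_{2}] + (2 H [A, B] - H [Z]) &\leq 2 (H [A] + H [B]) \\
\Longrightarrow \quad H [A_{1} - B_{2}] &\leq 2 (H [A] + H [B] - H [A, B]) + H [Z] \\
&= 2 I [A : B] + H [Z] .
\end{aligned}
```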

Since

\begin{aligned} H [A_{1} | Z] & = H [A_{1}, A_{1} + B_{1}] - H [Z] \\ & = H [A, B] - H [Z] \\ & = H [Z] - I [A : B] - (2 H [Z] - H [A] - H [B]) \end{aligned}

and similarly for $H [B_{2} | Z]$ , we see that the expression for the left-hand side obtained above is bounded by $3 I [A : B] + 2 H [Z] - H [A] - H [B]$ as claimed.
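As a check of the final bookkeeping (a sketch, again assuming $H [A, B] = H [A] + H [B] - I [A : B]$ ), the two conditional entropies and the resulting bound work out as:

```latex
\begin{aligned}
H [A_{1} | Z] = H [B_{2} | Z] &= H [A, B] - H [Z] = H [A] + H [B] - I [A : B] - H [Z], \\
H [A_{1} - B_{2} | Z] - \tfrac{1}{2} H [A_{1} | Z] - \tfrac{1}{2} H [B_{2} | Z]
&\leq (2 I [A : B] + H [Z]) - (H [A] + H [B] - I [A : B] - H [Z]) \\
&= 3 I [A : B] + 2 H [Z] - H [A] - H [B] .
\end{aligned}
```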

Lemma 3.24 Upper bound on conditioned Ruzsa distance

Suppose that $(X, Z)$ and $(Y, W)$ are random variables, where $X, Y$ take values in an abelian group. Then

d [X | Z; Y | W] \leq d [X; Y] + \frac{1}{2} I [X : Z] + \frac{1}{2} I [Y : W] .

In particular,

d [X; Y | W] \leq d [X; Y] + \frac{1}{2} I [Y : W] .

Proof.

Using Lemma 3.20 and Lemma 3.7, if $(X^{'}, Z^{'}), (Y^{'}, W^{'})$ are independent copies of the variables $(X, Z)$ , $(Y, W)$ , we have

\begin{aligned} d [X | Z; Y | W] & = H [X^{'} - Y^{'} | Z^{'}, W^{'}] - \frac{1}{2} H [X^{'} | Z^{'}] - \frac{1}{2} H [Y^{'} | W^{'}] \\ & \leq H [X^{'} - Y^{'}] - \frac{1}{2} H [X^{'} | Z^{'}] - \frac{1}{2} H [Y^{'} | W^{'}] \\ & = d [X^{'}; Y^{'}] + \frac{1}{2} I [X^{'} : Z^{'}] + \frac{1}{2} I [Y^{'} : W^{'}] . \end{aligned}

Here, in the middle step we used Corollary 2.19, and in the last step we used Definition 3.8 and Definition 2.15.

Lemma 3.25 Comparison of Ruzsa distances, I

Let $X, Y, Z$ be random variables taking values in some abelian group of characteristic $2$ , and with $Y, Z$ independent. Then we have

\begin{aligned} d [X; Y + Z] - d [X; Y] & \leq \frac{1}{2} (H [Y + Z] - H [Y]) \\ & = \frac{1}{2} d [Y; Z] + \frac{1}{4} H [Z] - \frac{1}{4} H [Y] \end{aligned}

and

\begin{aligned} d [X; Y | Y + Z] - d [X; Y] & \leq \frac{1}{2} (H [Y + Z] - H [Z]) \\ & = \frac{1}{2} d [Y; Z] + \frac{1}{4} H [Y] - \frac{1}{4} H [Z] . \end{aligned}

Proof.

We first prove the first inequality. We may assume (taking an independent copy, using Lemmas 3.7, 3.10 and 3.11) that $X$ is independent of $Y, Z$ . Then, since subtraction coincides with addition in characteristic $2$ , we have

\begin{aligned} d [X; Y + Z] - d [X; Y] & = H [X + Y + Z] - H [X + Y] - \frac{1}{2} H [Y + Z] + \frac{1}{2} H [Y] . \end{aligned}

Combining this with Lemma 3.21 gives the required bound. The second form of the result is immediate from Lemma 3.11.
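The two steps can be sketched explicitly. The first line applies Lemma 3.21 to the difference of distances; the remaining lines check the second form, where the characteristic $2$ hypothesis is what allows $H [Y - Z]$ to be rewritten as $H [Y + Z]$ :

```latex
\begin{aligned}
d [X; Y + Z] - d [X; Y]
&\leq (H [Y + Z] - H [Y]) - \tfrac{1}{2} H [Y + Z] + \tfrac{1}{2} H [Y]
 = \tfrac{1}{2} (H [Y + Z] - H [Y]), \\
d [Y; Z] &= H [Y - Z] - \tfrac{1}{2} H [Y] - \tfrac{1}{2} H [Z]
 = H [Y + Z] - \tfrac{1}{2} H [Y] - \tfrac{1}{2} H [Z], \\
\tfrac{1}{2} d [Y; Z] + \tfrac{1}{4} H [Z] - \tfrac{1}{4} H [Y]
&= \tfrac{1}{2} H [Y + Z] - \tfrac{1}{2} H [Y]
 = \tfrac{1}{2} (H [Y + Z] - H [Y]) .
\end{aligned}
```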

Turning to the second inequality, we have from Definition 2.15 and Lemma 2.2

\begin{aligned} I [Y : Y + Z] & = H [Y] + H [Y + Z] - H [Y, Y + Z] \\ & = H [Y] + H [Y + Z] - H [Y, Z] = H [Y + Z] - H [Z], \end{aligned}

and so the second inequality is a consequence of Lemma 3.24. Once again the second form of the result is immediate from Lemma 3.11.

Lemma 3.26 Comparison of Ruzsa distances, II

Let $X, Y, Z, Z^{'}$ be random variables taking values in some abelian group, and with $Y, Z, Z^{'}$ independent. Then we have

\begin{aligned} d [X; Y + Z | Y + Z + Z^{'}] - d [X; Y] \\ \leq \frac{1}{2} (H [Y + Z + Z^{'}] + H [Y + Z] - H [Y] - H [Z^{'}]) . \end{aligned}

Proof.

By the second inequality of Lemma 3.25, applied with $(Y, Z)$ replaced by $(Y + Z, Z^{'})$ , we have

d [X; Y + Z | Y + Z + Z^{'}] - d [X; Y + Z] \leq \frac{1}{2} (H [Y + Z + Z^{'}] - H [Z^{'}]) .

Adding this to the first inequality of Lemma 3.25 gives the result. □
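The addition can be sketched as a telescoping sum, assuming $Y + Z$ and $Z^{'}$ are independent (which follows from the joint independence of $Y, Z, Z^{'}$ ):

```latex
\begin{aligned}
d [X; Y + Z | Y + Z + Z^{'}] - d [X; Y]
&= \left( d [X; Y + Z | Y + Z + Z^{'}] - d [X; Y + Z] \right)
 + \left( d [X; Y + Z] - d [X; Y] \right) \\
&\leq \tfrac{1}{2} (H [Y + Z + Z^{'}] - H [Z^{'}])
 + \tfrac{1}{2} (H [Y + Z] - H [Y]) \\
&= \tfrac{1}{2} (H [Y + Z + Z^{'}] + H [Y + Z] - H [Y] - H [Z^{'}]) .
\end{aligned}
```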