
12 The m-torsion case

12.1 Data processing inequality

Lemma 12.1 Data processing for a single variable

Let $X$ be a random variable. Then for any function $f$ on the range of $X$, one has $H[f(X)] \leq H[X]$.

Proof
Lemma 12.2 One-sided unconditional data processing inequality

Let $X, Y$ be random variables. For any function $f$ on the range of $X$, we have $I[f(X) : Y] \leq I[X : Y]$.

Proof
Lemma 12.3 Unconditional data processing inequality

Let $X, Y$ be random variables. For any functions $f, g$ on the ranges of $X, Y$ respectively, we have $I[f(X) : g(Y)] \leq I[X : Y]$.

Proof
Lemma 12.4 Data processing inequality

Let $X, Y, Z$ be random variables. For any functions $f, g$ on the ranges of $X, Y$ respectively, we have $I[f(X) : g(Y) \mid Z] \leq I[X : Y \mid Z]$.

Proof
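As a sanity check, consider two extreme cases: if $f$ and $g$ are injective on the relevant ranges, then $f(X)$ and $g(Y)$ determine $X$ and $Y$ respectively, and all of the above inequalities hold with equality; if instead $f$ is constant, then $H[f(X)] = 0$ and $I[f(X) : Y] = I[f(X) : g(Y) \mid Z] = 0$, so the inequalities can be far from sharp.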

12.2 More Ruzsa distance estimates

Let G be an additive group.

Lemma 12.5 Flipping a sign

If X,Y are G-valued, then

$$ d[X; -Y] \leq 3 d[X; Y]. $$
Proof
Lemma 12.6 Kaimanovich–Vershik–Madiman inequality

If $n \geq 0$ and $X, Y_1, \dots, Y_n$ are jointly independent $G$-valued random variables, then

$$ H\Big[X + \sum_{i=1}^n Y_i\Big] - H[X] \leq \sum_{i=1}^n \Big( H[X + Y_i] - H[X] \Big). $$
Proof
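For $n = 2$, this is (a rearrangement of) the classical Kaimanovich–Vershik submodularity inequality

$$ H[Y_1 + X + Y_2] + H[X] \leq H[Y_1 + X] + H[X + Y_2] $$

for jointly independent $X, Y_1, Y_2$; the general case then follows by applying this bound repeatedly, treating the partial sum $Y_1 + \cdots + Y_k$ as a single summand.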
Lemma 12.7 Kaimanovich–Vershik–Madiman inequality, II

If $n \geq 1$ and $X, Y_1, \dots, Y_n$ are jointly independent $G$-valued random variables, then

$$ d\Big[X; \sum_{i=1}^n Y_i\Big] \leq 2 \sum_{i=1}^n d[X; Y_i]. $$
Proof
Lemma 12.8 Kaimanovich–Vershik–Madiman inequality, III

If $n \geq 1$ and $X, Y_1, \dots, Y_n$ are jointly independent $G$-valued random variables, then

$$ d\Big[X; \sum_{i=1}^n Y_i\Big] \leq d[X; Y_1] + \frac{1}{2} \Big( H\Big[\sum_{i=1}^n Y_i\Big] - H[Y_1] \Big). $$
Proof
Lemma 12.9 Comparing sums

Let $(X_i)_{1 \leq i \leq m}$ and $(Y_j)_{1 \leq j \leq l}$ be tuples of jointly independent random variables (so the $X$'s and $Y$'s are also independent of each other), and let $f : \{1, \dots, l\} \to \{1, \dots, m\}$ be a function. Then

$$ H\Big[\sum_{j=1}^l Y_j\Big] \leq H\Big[\sum_{i=1}^m X_i\Big] + \sum_{j=1}^l \Big( H[Y_j - X_{f(j)}] - H[X_{f(j)}] \Big). $$
Proof
Lemma 12.10 Sums of dilates I

Let $X, Y, X'$ be independent $G$-valued random variables, with $X'$ a copy of $X$, and let $a$ be an integer. Then

$$ H[X - (a+1) Y] \leq H[X - aY] + H[X - Y - X'] - H[X] $$

and

$$ H[X - (a-1) Y] \leq H[X - aY] + H[X - Y - X'] - H[X]. $$
Proof
Lemma 12.11 Sums of dilates II

Let X,Y be independent G-valued random variables, and let a be an integer. Then

$$ H[X - aY] - H[X] \leq 4 |a| \, d[X; Y]. $$
Proof

We remark that in the paper [GGMT2024] the variant estimate

$$ H[X - aY] - H[X] \leq (4 + 10 \log_2 |a|) \, d[X; Y] $$

is also proven by a similar method. This variant is superior for $|a| \geq 9$ (or $|a| = 7$); but we will not need this estimate here.

12.3 Multidistance

We continue to let G be an abelian group.

Definition 12.12 Multidistance

Let $m$ be a positive integer, and let $X_{[m]} = (X_i)_{1 \leq i \leq m}$ be an $m$-tuple of $G$-valued random variables $X_i$. Then we define

$$ D[X_{[m]}] := H\Big[\sum_{i=1}^m \tilde X_i\Big] - \frac{1}{m} \sum_{i=1}^m H[\tilde X_i], $$

where the $\tilde X_i$ are independent copies of the $X_i$.
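For example, if each $X_i$ is uniformly distributed on a finite subgroup $H_i$ of $G$, then $\sum_{i=1}^m \tilde X_i$ is uniformly distributed on $H_1 + \cdots + H_m$, and hence

$$ D[X_{[m]}] = \log |H_1 + \cdots + H_m| - \frac{1}{m} \sum_{i=1}^m \log |H_i|, $$

which vanishes precisely when all the $H_i$ are equal (compare Proposition 12.21 below).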

Lemma 12.13 Multidistance of copy

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ and $Y_{[m]} = (Y_i)_{1 \leq i \leq m}$ are such that $X_i$ and $Y_i$ have the same distribution for each $i$, then $D[X_{[m]}] = D[Y_{[m]}]$.

Proof
Lemma 12.14 Multidistance of independent variables

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ are jointly independent, then $D[X_{[m]}] = H\big[\sum_{i=1}^m X_i\big] - \frac{1}{m} \sum_{i=1}^m H[X_i]$.

Proof
Lemma 12.15 Nonnegativity

For any such tuple, we have $D[X_{[m]}] \geq 0$.

Proof
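Indeed, since the $\tilde X_i$ are jointly independent, $H[\sum_{i=1}^m \tilde X_i] \geq H[\tilde X_j]$ for each $j$ (adding an independent summand cannot decrease entropy), and averaging over $j$ gives the claim.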
Lemma 12.16 Relabeling

If $\phi : \{1, \dots, m\} \to \{1, \dots, m\}$ is a bijection, then $D[X_{[m]}] = D[(X_{\phi(j)})_{1 \leq j \leq m}]$.

Proof
Lemma 12.17 Multidistance and Ruzsa distance, I

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of $G$-valued random variables. Then

$$ \sum_{1 \leq j, k \leq m : j \neq k} d[X_j; X_k] \leq m(m-1)\, D[X_{[m]}]. $$
Proof
Lemma 12.18 Multidistance and Ruzsa distance, II

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of $G$-valued random variables. Then

$$ \sum_{j=1}^m d[X_j; X_j] \leq 2m\, D[X_{[m]}]. $$
Proof
Lemma 12.19 Multidistance and Ruzsa distance, III

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of $G$-valued random variables. If the $X_i$ all have the same distribution, then $D[X_{[m]}] \leq m\, d[X_i; X_i]$ for any $1 \leq i \leq m$.

Proof
Lemma 12.20 Multidistance and Ruzsa distance, IV

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of independent $G$-valued random variables. Let $W := \sum_{i=1}^m X_i$. Then

$$ d[W; W] \leq 2 D[X_{[m]}]. $$
Proof
Proposition 12.21 Vanishing

If $D[X_{[m]}] = 0$, then for each $1 \leq i \leq m$ there is a finite subgroup $H_i \leq G$ such that $d[X_i; U_{H_i}] = 0$.

Proof

With more effort one can show that Hi is independent of i, but we will not need to do so here.

12.4 The tau functional

Fix $m \geq 2$, and a reference $G$-valued random variable $X_0$.

Definition 12.22 η

We set $\eta := \frac{1}{32 m^3}$.
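With this choice, $2/\eta = 64 m^3$, which is the constant appearing in Theorem 12.47 below, and $4 m^2 \eta = \frac{1}{8m}$, so the quantity $t := 4 m^2 \eta k$ appearing in Proposition 12.40 is simply $k/(8m)$.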

Definition 12.23 τ-functional

If $(X_i)_{1 \leq i \leq m}$ is a tuple, we define its $\tau$-functional

$$ \tau[(X_i)_{1 \leq i \leq m}] := D[(X_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X_i; X_0]. $$
Definition 12.24 τ-minimizer

A $\tau$-minimizer is a tuple $(X_i)_{1 \leq i \leq m}$ that minimizes the $\tau$-functional among all tuples of $G$-valued random variables.

Proposition 12.25 Existence of τ-minimizer

If G is finite, then a τ-minimizer exists.

Proof
Proposition 12.26 Minimizer close to reference variables

If $(X_i)_{1 \leq i \leq m}$ is a $\tau$-minimizer, then $\sum_{i=1}^m d[X_i; X_0] \leq \frac{2m}{\eta} d[X_0; X_0]$.

Proof
Lemma 12.27 Lower bound on multidistance

If $(X_i)_{1 \leq i \leq m}$ is a $\tau$-minimizer, and $k := D[(X_i)_{1 \leq i \leq m}]$, then for any other tuple $(X'_i)_{1 \leq i \leq m}$ of $G$-valued random variables, one has

$$ k \leq D[(X'_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X_i; X'_i]. $$
Proof
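This is essentially a rearrangement of the minimizing property: $\tau$-minimality gives $k + \eta \sum_{i=1}^m d[X_i; X_0] \leq D[(X'_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X'_i; X_0]$, and the Ruzsa triangle inequality $d[X'_i; X_0] \leq d[X'_i; X_i] + d[X_i; X_0]$ then eliminates the reference terms.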
Definition 12.28 Conditional multidistance

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ and $Y_{[m]} = (Y_i)_{1 \leq i \leq m}$ are tuples of random variables, with the $X_i$ being $G$-valued (but the $Y_i$ need not be), then we define

$$ D[X_{[m]} \mid Y_{[m]}] = \sum_{(y_i)_{1 \leq i \leq m}} \Big( \prod_{1 \leq i \leq m} p_{Y_i}(y_i) \Big)\, D\big[ (X_i \mid Y_i = y_i)_{1 \leq i \leq m} \big] \tag{2} $$

where each $y_i$ ranges over the support of $p_{Y_i}$ for $1 \leq i \leq m$.

Lemma 12.29 Alternate form of conditional multidistance

If the pairs $(X_i, Y_i)$ are jointly independent of one another, then

$$ D[X_{[m]} \mid Y_{[m]}] = H\Big[\sum_{i=1}^m X_i \,\Big|\, (Y_j)_{1 \leq j \leq m}\Big] - \frac{1}{m} \sum_{i=1}^m H[X_i \mid Y_i]. \tag{3} $$

Proof
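In outline: by the joint independence of the pairs, the event $(Y_j)_{1 \leq j \leq m} = (y_j)$ has probability $\prod_j p_{Y_j}(y_j)$, and conditionally on it the $X_i$ remain independent with the distributions of $(X_i \mid Y_i = y_i)$; by Lemma 12.14 the corresponding term of (2) equals $H[\sum_i X_i \mid (Y_j) = (y_j)] - \frac{1}{m} \sum_i H[X_i \mid Y_i = y_i]$, and averaging over $(y_j)$ yields (3).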
Lemma 12.30 Conditional multidistance nonnegative

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ and $Y_{[m]} = (Y_i)_{1 \leq i \leq m}$ are tuples of random variables, then $D[X_{[m]} \mid Y_{[m]}] \geq 0$.

Proof
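This follows by applying Lemma 12.15 to each conditioned tuple $(X_i \mid Y_i = y_i)_{1 \leq i \leq m}$ appearing in (2): the defining average has only nonnegative terms.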
Lemma 12.31 Lower bound on conditional multidistance

If $(X_i)_{1 \leq i \leq m}$ is a $\tau$-minimizer, and $k := D[(X_i)_{1 \leq i \leq m}]$, then for any other tuples $(X'_i)_{1 \leq i \leq m}$ and $(Y_i)_{1 \leq i \leq m}$ with the $X'_i$ $G$-valued, one has

$$ k \leq D[(X'_i)_{1 \leq i \leq m} \mid (Y_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X_i; X'_i \mid Y_i]. $$
Proof
Corollary 12.32 Lower bound on conditional multidistance, II

With the notation of the previous lemma, we have

$$ k \leq D[X'_{[m]} \mid Y_{[m]}] + \eta \sum_{i=1}^m d[X_{\sigma(i)}; X'_i \mid Y_i] \tag{4} $$

for any permutation $\sigma : \{1, \dots, m\} \to \{1, \dots, m\}$.

Proof
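Indeed, both $D[(X_i)_{1 \leq i \leq m}]$ and $\sum_{i=1}^m d[X_i; X_0]$ are invariant under permuting the $X_i$ (Lemma 12.16), so the permuted tuple $(X_{\sigma(i)})_{1 \leq i \leq m}$ is again a $\tau$-minimizer with the same value of $k$, and (4) is Lemma 12.31 applied to this tuple.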

12.5 The multidistance chain rule

Lemma 12.33 Multidistance chain rule

Let $\pi : G \to H$ be a homomorphism of abelian groups and let $X_{[m]}$ be a tuple of jointly independent $G$-valued random variables. Then $D[X_{[m]}]$ is equal to

$$ D[X_{[m]} \mid \pi(X_{[m]})] + D[\pi(X_{[m]})] + I\Big[\sum_{i=1}^m X_i : \pi(X_{[m]}) \,\Big|\, \pi\Big(\sum_{i=1}^m X_i\Big)\Big] \tag{5} $$

where $\pi(X_{[m]}) := (\pi(X_i))_{1 \leq i \leq m}$.

Proof
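One way to see this (not necessarily how the formal proof is organized): writing $S := \sum_{i=1}^m X_i$ and noting that $\pi(S) = \sum_i \pi(X_i)$ is determined by $\pi(X_{[m]})$, the chain rule for entropy gives

$$ H[S] = H[S \mid \pi(X_{[m]})] + H[\pi(S)] + I[S : \pi(X_{[m]}) \mid \pi(S)]. $$

Subtracting $\frac{1}{m} \sum_i H[X_i] = \frac{1}{m} \sum_i \big( H[\pi(X_i)] + H[X_i \mid \pi(X_i)] \big)$ and applying Lemma 12.29 with $Y_i := \pi(X_i)$ (together with Lemma 12.14 for the tuple $\pi(X_{[m]})$) regroups the remaining terms as $D[X_{[m]} \mid \pi(X_{[m]})] + D[\pi(X_{[m]})]$.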

We will need to iterate the multidistance chain rule, so it is convenient to observe a conditional version of this rule, as follows.

Lemma 12.34 Conditional multidistance chain rule

Let $\pi : G \to H$ be a homomorphism of abelian groups. Let $X_{[m]}$ be a tuple of $G$-valued random variables, and let $Y_{[m]}$ be another tuple of random variables (not necessarily $G$-valued). Suppose that the pairs $(X_i, Y_i)$ are jointly independent of one another (but $X_i$ need not be independent of $Y_i$). Then

$$ D[X_{[m]} \mid Y_{[m]}] = D[X_{[m]} \mid \pi(X_{[m]}), Y_{[m]}] + D[\pi(X_{[m]}) \mid Y_{[m]}] + I\Big[\sum_{i=1}^m X_i : \pi(X_{[m]}) \,\Big|\, \pi\Big(\sum_{i=1}^m X_i\Big), Y_{[m]}\Big]. $$
Proof

We can iterate the above lemma as follows.

Lemma 12.35

Let m be a positive integer. Suppose one has a sequence

$$ G_m \to G_{m-1} \to \dots \to G_1 \to G_0 = \{0\} \tag{9} $$

of homomorphisms between abelian groups $G_0, \dots, G_m$, and for each $d = 0, \dots, m$, let $\pi_d : G_m \to G_d$ be the homomorphism from $G_m$ to $G_d$ arising from this sequence by composition (so for instance $\pi_m$ is the identity homomorphism and $\pi_0$ is the zero homomorphism). Let $X_{[m]} = (X_i)_{1 \leq i \leq m}$ be a jointly independent tuple of $G_m$-valued random variables. Then

$$ D[X_{[m]}] = \sum_{d=1}^m D[\pi_d(X_{[m]}) \mid \pi_{d-1}(X_{[m]})] + \sum_{d=1}^{m-1} I\Big[\sum_i X_i : \pi_d(X_{[m]}) \,\Big|\, \pi_d\Big(\sum_i X_i\Big), \pi_{d-1}(X_{[m]})\Big]. \tag{10} $$

In particular, by Lemma 2.27,

$$ D[X_{[m]}] \geq \sum_{d=1}^m D[\pi_d(X_{[m]}) \mid \pi_{d-1}(X_{[m]})] + I\Big[\sum_i X_i : \pi_1(X_{[m]}) \,\Big|\, \pi_1\Big(\sum_i X_i\Big)\Big]. $$
Proof

In our application we will need the following special case of the above lemma.

Corollary 12.36

Let $G$ be an abelian group and let $m \geq 2$. Suppose that $X_{i,j}$, $1 \leq i, j \leq m$, are independent $G$-valued random variables. Then

$$ I\Big[\Big(\sum_{i=1}^m X_{i,j}\Big)_{j=1}^m : \Big(\sum_{j=1}^m X_{i,j}\Big)_{i=1}^m \,\Big|\, \sum_{i=1}^m \sum_{j=1}^m X_{i,j}\Big] \leq \sum_{j=1}^{m-1} \Big( D[(X_{i,j})_{i=1}^m] - D\big[(X_{i,j})_{i=1}^m \,\big|\, (X_{i,j} + \dots + X_{i,m})_{i=1}^m\big] \Big) + D[(X_{i,m})_{i=1}^m] - D\Big[\Big(\sum_{j=1}^m X_{i,j}\Big)_{i=1}^m\Big], $$

where all the multidistances here involve the indexing set $\{1, \dots, m\}$.

Proof

12.6 Bounding the mutual information

As before, $G$ is an abelian group, and $m \geq 2$. We let $X_{[m]} = (X_i)_{i=1}^m$ be a $\tau$-minimizer, and write $k := D[X_{[m]}]$.

Proposition 12.37 Bounding mutual information

Suppose that $X_{i,j}$, $1 \leq i, j \leq m$, are jointly independent $G$-valued random variables, such that for each $j = 1, \dots, m$, the random variables $(X_{i,j})_{i=1}^m$ coincide in distribution with some permutation of $X_{[m]}$. Write

$$ I := I\Big[\Big(\sum_{i=1}^m X_{i,j}\Big)_{j=1}^m : \Big(\sum_{j=1}^m X_{i,j}\Big)_{i=1}^m \,\Big|\, \sum_{i=1}^m \sum_{j=1}^m X_{i,j}\Big]. $$

Then

$$ I \leq 4 m^2 \eta k. \tag{14} $$

Proof

12.7 Endgame

Now let $m \geq 2$, let $G$ be an $m$-torsion abelian group, and let $(X_i)_{1 \leq i \leq m}$ be a $\tau$-minimizer.

Definition 12.38 Additional random variables

By a slight abuse of notation, we identify $\mathbb{Z}/m\mathbb{Z}$ and $\{1, \dots, m\}$ in the obvious way, and let $Y_{i,j}$ be an independent copy of $X_i$ for $i, j \in \mathbb{Z}/m\mathbb{Z}$ (with the $Y_{i,j}$ jointly independent). Then also define:

$$ W := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} Y_{i,j} $$

and

$$ Z_1 := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} i\, Y_{i,j}, \qquad Z_2 := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} j\, Y_{i,j}, \qquad Z_3 := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} (-i-j)\, Y_{i,j}. $$

The sum $-i-j$ is computed in $\mathbb{Z}/m\mathbb{Z}$. Note that, because we are assuming $G$ is $m$-torsion, it is well-defined to multiply elements of $G$ by elements of $\mathbb{Z}/m\mathbb{Z}$. We will also define, for $i, j, r \in \mathbb{Z}/m\mathbb{Z}$, the variables

$$ P_i := \sum_{j \in \mathbb{Z}/m\mathbb{Z}} Y_{i,j}, \qquad Q_j := \sum_{i \in \mathbb{Z}/m\mathbb{Z}} Y_{i,j}, \qquad R_r := \sum_{\substack{i, j \in \mathbb{Z}/m\mathbb{Z} \\ i + j = r}} Y_{i,j}. \tag{21} $$
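For orientation, these variables are related by the identities $W = \sum_i P_i = \sum_j Q_j = \sum_r R_r$, $Z_1 = \sum_i i\, P_i$, $Z_2 = \sum_j j\, Q_j$, and $Z_3 = -\sum_r r\, R_r$, all following directly from the definitions.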

Lemma 12.39 Zero-sum

We have

$$ Z_1 + Z_2 + Z_3 = 0. \tag{22} $$

Proof
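Indeed, the coefficient of each $Y_{i,j}$ in $Z_1 + Z_2 + Z_3$ is $i + j + (-i-j) = 0$ in $\mathbb{Z}/m\mathbb{Z}$, and multiplication by elements of $\mathbb{Z}/m\mathbb{Z}$ is well-defined on $G$ because $G$ is $m$-torsion.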
Proposition 12.40 Mutual information bound

We have

$$ I[Z_1 : Z_2 \mid W],\ I[Z_2 : Z_3 \mid W],\ I[Z_1 : Z_3 \mid W] \leq t, $$

where

$$ t := 4 m^2 \eta k. \tag{23} $$

Proof
Lemma 12.41 Entropy of W

We have $H[W] \leq (2m - 1) k + \frac{1}{m} \sum_{i=1}^m H[X_i]$.

Proof
Lemma 12.42 Entropy of Z2

We have $H[Z_2] \leq (8m^2 - 16m + 1) k + \frac{1}{m} \sum_{i=1}^m H[X_i]$.

Proof
Lemma 12.43 Mutual information bound

We have $I[W : Z_2] \leq 2(m - 1) k$.

Proof
Lemma 12.44 Distance bound

We have $\sum_{i=1}^m d[X_i; Z_2 \mid W] \leq 8(m^3 - m^2) k$.

Proof
Lemma 12.45 Application of BSG

Let $G$ be an abelian group, let $(T_1, T_2, T_3)$ be a $G^3$-valued random variable such that $T_1 + T_2 + T_3 = 0$ holds identically, and write

$$ \delta := I[T_1 : T_2] + I[T_1 : T_3] + I[T_2 : T_3]. $$

Let $Y_1, \dots, Y_n$ be some further $G$-valued random variables and let $\alpha > 0$ be a constant. Then there exists a random variable $U$ such that

$$ d[U; U] + \alpha \sum_{i=1}^n d[Y_i; U] \leq \Big(2 + \frac{\alpha n}{2}\Big) \delta + \alpha \sum_{i=1}^n d[Y_i; T_2]. \tag{27} $$

Proof
Proposition 12.46 Vanishing entropy

We have k=0.

Proof
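Granted this, Theorem 12.47 below follows along the following lines (the formal deduction may be organized differently): taking $X_0 := X$ and a $\tau$-minimizer $(X_i)_{1 \leq i \leq m}$, Proposition 12.46 gives $D[X_{[m]}] = 0$, so by Proposition 12.21 each $X_i$ satisfies $d[X_i; U_{H_i}] = 0$ for some finite subgroup $H_i$. Choosing $i$ with $d[X_i; X_0]$ minimal, Proposition 12.26 gives $d[X_i; X_0] \leq \frac{2}{\eta} d[X_0; X_0] = 64 m^3\, d[X_0; X_0]$, and the Ruzsa triangle inequality then yields $d[X_0; U_{H_i}] \leq d[X_0; X_i] + d[X_i; U_{H_i}] \leq 64 m^3\, d[X_0; X_0]$.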

12.8 Wrapping up

Theorem 12.47 Entropy form of PFR

Suppose that $G$ is a finite abelian group of torsion $m$. Suppose that $X$ is a $G$-valued random variable. Then there exists a subgroup $H \leq G$ such that

$$ d[X; U_H] \leq 64 m^3\, d[X; X]. $$
Proof
Lemma 12.48

Suppose that $G$ is a finite abelian group of torsion $m$. If $A \subseteq G$ is non-empty and $|A + A| \leq K |A|$, then $A$ can be covered by at most $K^{(64 m^3 + 2)/2}\, |A|^{1/2} / |H|^{1/2}$ translates of a subgroup $H$ of $G$ with

$$ |H| / |A| \in [K^{-64 m^3}, K^{64 m^3}]. \tag{33} $$

Proof
Theorem 12.49 PFR

Suppose that $G$ is a finite abelian group of torsion $m$. If $A \subseteq G$ is non-empty and $|A + A| \leq K |A|$, then $A$ can be covered by at most $m K^{96 m^3 + 2}$ translates of a subgroup $H$ of $G$ with $|H| \leq |A|$.

Proof
  1. In fact, by permuting the variables $(Y_{i,j})_{i,j \in \mathbb{Z}/m\mathbb{Z}}$, one can see that the random variables $(W, Z_1, Z_2)$ and $(W, Z_1, Z_3)$ have the same distribution, so this is in some sense identical to (and can be deduced from) the first application.