
12 The \(m\)-torsion case

12.1 Data processing inequality

Lemma 12.1 Data processing for a single variable

Let \(X\) be a random variable. Then for any function \(f\) on the range of \(X\), one has \(\mathbb {H}[f(X)] \leq \mathbb {H}[X]\).

Proof

We have

\[ \mathbb {H}[X] = \mathbb {H}[X,f(X)] = \mathbb {H}[f(X)] + \mathbb {H}[X|f(X)] \]

thanks to Lemma 2.2 and Lemma 2.13, giving the claim.

Lemma 12.2 One-sided unconditional data processing inequality

Let \(X,Y\) be random variables. For any function \(f\) on the range of \(X\), we have \(\mathbb {I}[f(X) : Y] \leq \mathbb {I}[X:Y]\).

Proof

By Lemma 2.16 it suffices to show that \(\mathbb {H}[Y|X] \leq \mathbb {H}[Y|f(X)]\). But this follows from Corollary 2.20 (and Lemma 2.2).

Lemma 12.3 Unconditional data processing inequality

Let \(X,Y\) be random variables. For any functions \(f, g\) on the ranges of \(X, Y\) respectively, we have \(\mathbb {I}[f(X) : g(Y )] \leq \mathbb {I}[X : Y]\).

Proof

From Lemma 12.2, Lemma 2.9 we have \(\mathbb {I}[f(X) : Y] \leq \mathbb {I}[X:Y]\) and \(\mathbb {I}[f(X): g(Y)] \leq \mathbb {I}[f(X):Y]\), giving the claim.

Lemma 12.4 Data processing inequality

Let \(X,Y,Z\) be random variables. For any functions \(f, g\) on the ranges of \(X, Y\) respectively, we have \(\mathbb {I}[f(X) : g(Y )|Z] \leq \mathbb {I}[X :Y |Z]\).

Proof

Apply Lemma 12.3 to \(X,Y\) conditioned on the event \(Z=z\), multiply by \({\bf P}[Z=z]\), and sum using Definition 2.25.

12.2 More Ruzsa distance estimates

Let \(G\) be an additive group.

Lemma 12.5 Flipping a sign

If \(X,Y\) are \(G\)-valued, then

\[ d[X ; -Y] \leq 3 d[X;Y]. \]
Proof

Without loss of generality (using Lemma 3.10 and Lemma 3.7) we may take \(X,Y\) to be independent. Let \((X_1,Y_1)\), \((X_2,Y_2)\) be copies of \((X,Y)\) that are conditionally independent over \(X_1-Y_1=X_2-Y_2\) (these exist thanks to Lemma 3.22). By Lemma 3.7, we can also find another copy \((X_3,Y_3)\) of \((X,Y)\) that is independent of \(X_1,Y_1,X_2,Y_2\). From Corollary 2.21, one has

\[ \mathbb {H}[X_3-Y_2, X_1-Y_3, X_2, Y_1, X_3, Y_3, X_3+Y_3] + \mathbb {H}[X_3+Y_3] \leq \mathbb {H}[X_3-Y_2, X_1-Y_3, X_2, Y_1, X_3+Y_3] + \mathbb {H}[X_3, Y_3, X_3+Y_3]. \]

From Lemma 3.11, Lemma 3.1, Lemma 3.10 we have

\[ \mathbb {H}[X_3+Y_3] = \frac{1}{2} \mathbb {H}[X_3] + \frac{1}{2} \mathbb {H}[-Y_3] + d[X_3;-Y_3] = \frac{1}{2} \mathbb {H}[X] + \frac{1}{2} \mathbb {H}[Y] + d[X;-Y]. \]

Since \(X_3+Y_3\) is a function of \(X_3,Y_3\), we see from Lemma 2.2 and Corollary 2.24 that

\[ \mathbb {H}[X_3,Y_3,X_3+Y_3] = \mathbb {H}[X_3,Y_3] = \mathbb {H}[X,Y] = \mathbb {H}[X]+\mathbb {H}[Y]. \]

Because \(X_1-Y_1=X_2-Y_2\), we have

\[ X_3+Y_3 = (X_3-Y_2) - (X_1-Y_3) + (X_2+Y_1) \]

and thus by Lemma 2.2

\[ \mathbb {H}[X_3-Y_2, X_1-Y_3, X_2, Y_1, X_3+Y_3] = \mathbb {H}[X_3-Y_2, X_1-Y_3, X_2, Y_1] \]

and hence by Corollary 2.18

\[ \mathbb {H}[X_3-Y_2, X_1-Y_3, X_2, Y_1, X_3+Y_3] \leq \mathbb {H}[X_3-Y_2] + \mathbb {H}[X_1-Y_3] + \mathbb {H}[X_2] + \mathbb {H}[Y_1]. \]

Since \(X_3,Y_2\) are independent, we see from Lemma 3.11, Lemma 3.10 that

\[ \mathbb {H}[X_3-Y_2] = \frac{1}{2} \mathbb {H}[X] + \frac{1}{2} \mathbb {H}[Y] + d[X; Y]. \]

Similarly

\[ \mathbb {H}[X_1-Y_3] = \frac{1}{2} \mathbb {H}[X] + \frac{1}{2} \mathbb {H}[Y] + d[X; Y]. \]

We conclude that

\[ \mathbb {H}[X_3-Y_2, X_1-Y_3, X_2, Y_1, X_3+Y_3] \leq 2\mathbb {H}[X] + 2\mathbb {H}[Y] + 2d[X; Y]. \]

Finally, from Lemma 12.1 we have

\[ \mathbb {H}[X_1,Y_1,X_2,Y_2,X_3,Y_3] \leq \mathbb {H}[X_3-Y_2, X_1-Y_3, X_2, Y_1, X_3, Y_3, X_3+Y_3]. \]

From Corollary 2.24 followed by Corollary 2.30, we have

\[ \mathbb {H}[X_1,Y_1,X_2,Y_2,X_3,Y_3] = \mathbb {H}[X_1,Y_1,X_1-Y_1] + \mathbb {H}[X_2,Y_2,X_2-Y_2] - \mathbb {H}[X_1-Y_1] + \mathbb {H}[X_3,Y_3] \]

and thus by Lemma 3.11, Lemma 3.10, Lemma 2.2, Corollary 2.24

\[ \mathbb {H}[X_1,Y_1,X_2,Y_2,X_3,Y_3] = \mathbb {H}[X] + \mathbb {H}[Y] + \mathbb {H}[X] + \mathbb {H}[Y] -\left(\frac{1}{2}\mathbb {H}[X] + \frac{1}{2}\mathbb {H}[Y] + d[X; Y]\right) + \mathbb {H}[X] + \mathbb {H}[Y]. \]

Applying all of these estimates, the claim now follows from linear arithmetic.

Lemma 12.6 Kaimanovich–Vershik–Madiman inequality

If \(n \geq 0\) and \(X, Y_1, \dots , Y_n\) are jointly independent \(G\)-valued random variables, then

\[ \mathbb {H}\left[X + \sum _{i=1}^n Y_i\right] - \mathbb {H}[X] \leq \sum _{i=1}^n \left(\mathbb {H}[X+Y_i] - \mathbb {H}[X]\right). \]
Proof

This is trivial for \(n=0,1\), while the \(n=2\) case is Lemma 3.21. Now suppose inductively that \(n {\gt} 2\), and the claim was already proven for \(n-1\). By a further application of Lemma 3.21 one has

\[ \mathbb {H}\left[X + \sum _{i=1}^n Y_i\right] - \mathbb {H}\left[X + \sum _{i=1}^{n-1} Y_i\right] \leq \mathbb {H}[X+Y_n] - \mathbb {H}[X]. \]

By induction hypothesis one has

\[ \mathbb {H}\left[X + \sum _{i=1}^{n-1} Y_i\right] - \mathbb {H}[X] \leq \sum _{i=1}^{n-1} \left(\mathbb {H}[X+Y_i] - \mathbb {H}[X]\right). \]

Summing the two inequalities, we obtain the claim.

Lemma 12.7 Kaimanovich–Vershik–Madiman inequality, II

If \(n \geq 1\) and \(X, Y_1, \dots , Y_n\) are jointly independent \(G\)-valued random variables, then

\[ d[X; \sum _{i=1}^n Y_i] \leq 2 \sum _{i=1}^n d[X; Y_i]. \]
Proof

Applying Lemma 12.6 with all the \(Y_i\) replaced by \(-Y_i\), and using Lemma 3.1 and Lemma 3.11, we obtain after some rearranging

\[ d[X; \sum _{i=1}^n Y_i] + \frac{1}{2} (\mathbb {H}[\sum _{i=1}^n Y_i] - \mathbb {H}[X]) \leq \sum _{i=1}^n \left(d[X;Y_i] + \frac{1}{2} (\mathbb {H}[Y_i] - \mathbb {H}[X])\right). \]
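Explicitly, Lemma 12.6 applied to \(X\) and \(-Y_1,\dots ,-Y_n\) gives \(\mathbb {H}[X - \sum _{i=1}^n Y_i] - \mathbb {H}[X] \leq \sum _{i=1}^n (\mathbb {H}[X-Y_i] - \mathbb {H}[X])\), and by Lemma 3.11 (together with Lemma 3.1) one may expand

\[ \mathbb {H}[X - Y_i] = d[X;Y_i] + \frac{1}{2} \mathbb {H}[X] + \frac{1}{2} \mathbb {H}[Y_i], \qquad \mathbb {H}\Bigl[X - \sum _{i=1}^n Y_i\Bigr] = d[X; \sum _{i=1}^n Y_i] + \frac{1}{2} \mathbb {H}[X] + \frac{1}{2} \mathbb {H}\Bigl[\sum _{i=1}^n Y_i\Bigr]; \]

substituting these identities and cancelling the \(\mathbb {H}[X]\) terms yields the displayed inequality.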

From Corollary 3.5 we have

\[ \mathbb {H}[\sum _{i=1}^n Y_i] \geq \mathbb {H}[Y_i] \]

for all \(i\); subtracting \(\mathbb {H}[X]\) and averaging, we conclude that

\[ \mathbb {H}[\sum _{i=1}^n Y_i] - \mathbb {H}[X] \geq \frac{1}{n} \sum _{i=1}^n \mathbb {H}[Y_i] - \mathbb {H}[X] \]

and thus

\[ d[X; \sum _{i=1}^n Y_i] \leq \sum _{i=1}^n \left(d[X;Y_i] + \frac{n-1}{2n} (\mathbb {H}[Y_i] - \mathbb {H}[X])\right). \]

From Lemma 3.13 we have

\[ \mathbb {H}[Y_i] - \mathbb {H}[X] \leq 2 d[X;Y_i]. \]

Since \(0 \leq \frac{n-1}{2n} \leq \frac{1}{2}\), the claim follows.

Lemma 12.8 Kaimanovich–Vershik–Madiman inequality, III

If \(n \geq 1\) and \(X, Y_1, \dots , Y_n\) are jointly independent \(G\)-valued random variables, then

\[ d\left[X; \sum _{i=1}^n Y_i\right] \leq d\left[X; Y_1\right] + \frac{1}{2}\left(\mathbb {H}\left[ \sum _{i=1}^n Y_i\right] - \mathbb {H}[Y_1]\right). \]
Proof

From Lemma 3.21 one has

\[ \mathbb {H}\left[-X + \sum _{i=1}^n Y_i\right] \leq \mathbb {H}[ - X + Y_1 ] + \mathbb {H}\left[ \sum _{i=1}^n Y_i \right] - \mathbb {H}[Y_1]. \]

The claim then follows from Lemma 3.11 and some elementary algebra.

Lemma 12.9 Comparing sums

Let \((X_i)_{1 \leq i \leq m}\) and \((Y_j)_{1 \leq j \leq l}\) be tuples of jointly independent \(G\)-valued random variables (so the \(X\)’s and \(Y\)’s are also independent of each other), and let \(f: \{ 1,\dots ,l\} \to \{ 1,\dots ,m\} \) be a function. Then

\[ \mathbb {H}[\sum _{j=1}^l Y_j] \leq \mathbb {H}[ \sum _{i=1}^m X_i ] + \sum _{j=1}^l (\mathbb {H}[ Y_j - X_{f(j)}] - \mathbb {H}[X_{f(j)}]). \]
Proof

Write \(W := \sum _{i=1}^m X_i\). From Corollary 3.5 we have

\[ \mathbb {H}[\sum _{j=1}^l Y_j] \leq \mathbb {H}[-W + \sum _{j=1}^l Y_j] \]

while from Lemma 12.6 one has

\[ \mathbb {H}[-W + \sum _{j=1}^l Y_j] \leq \mathbb {H}[-W] + \sum _{j=1}^l \left(\mathbb {H}[-W + Y_j] - \mathbb {H}[-W]\right). \]

From Lemma 3.21 one has

\[ \mathbb {H}[-W + Y_j] - \mathbb {H}[-W] \leq \mathbb {H}[-X_{f(j)} + Y_j] - \mathbb {H}[-X_{f(j)}]. \]

The claim now follows from Lemma 3.1 and some elementary algebra.
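We record the special case that will be used later: if each \(Y_j\) is a copy of \(X_{f(j)}\), then by Lemma 3.11 one has \(\mathbb {H}[Y_j - X_{f(j)}] - \mathbb {H}[X_{f(j)}] = d[X_{f(j)}; X_{f(j)}]\), and the conclusion reads

\[ \mathbb {H}[\sum _{j=1}^l Y_j] \leq \mathbb {H}[ \sum _{i=1}^m X_i ] + \sum _{j=1}^l d[X_{f(j)}; X_{f(j)}]. \]

This is the form in which the lemma is applied in the proof of Proposition 12.37.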

Lemma 12.10 Sums of dilates I

Let \(X,Y,X'\) be independent \(G\)-valued random variables, with \(X'\) a copy of \(X\), and let \(a\) be an integer. Then

\[ \mathbb {H}[X-(a+1)Y] \leq \mathbb {H}[X-aY] + \mathbb {H}[X-Y-X'] - \mathbb {H}[X] \]

and

\[ \mathbb {H}[X-(a-1)Y] \leq \mathbb {H}[X-aY] + \mathbb {H}[X-Y-X'] - \mathbb {H}[X]. \]
Proof

From Lemma 3.17 we have

\[ \mathbb {H}[(X-Y)-aY] \leq \mathbb {H}[(X-Y) - X'] + \mathbb {H}[X'-aY] - \mathbb {H}[X'] \]

which gives the first inequality. Similarly from Lemma 3.17 we have

\[ \mathbb {H}[(X+Y)-aY] \leq \mathbb {H}[(X+Y) - X'] + \mathbb {H}[X'-aY] - \mathbb {H}[X'] \]

which (when combined with Lemma 3.1) gives the second inequality.

Lemma 12.11 Sums of dilates II

Let \(X,Y\) be independent \(G\)-valued random variables, and let \(a\) be an integer. Then

\[ \mathbb {H}[X-aY] - \mathbb {H}[X] \leq 4 |a| d[X;Y]. \]
Proof

Let \(X'\) be a copy of \(X\) that is independent of \((X,Y)\) (this exists by Lemma 3.7). From Lemma 3.21 one has

\[ \mathbb {H}[Y-X+X'] - \mathbb {H}[Y-X] \leq \mathbb {H}[Y+X'] - \mathbb {H}[Y] = \mathbb {H}[Y+X] - \mathbb {H}[Y] \]

which by Lemma 3.11 gives

\[ \mathbb {H}[X-Y-X'] -\mathbb {H}[X] \leq d[X;Y] + d[X;-Y] \]

and hence by Lemma 12.5

\[ \mathbb {H}[X-Y-X'] - \mathbb {H}[X] \leq 4d[X;Y]. \]

From Lemma 12.10 we then have

\[ \mathbb {H}[X-(a\pm 1)Y] \leq \mathbb {H}[X-aY] + 4 d[X;Y] \]

and the claim now follows by an induction on \(|a|\).

We remark that in the paper [GGMT2024] the variant estimate

\[ \mathbb {H}[X-aY] - \mathbb {H}[X] \leq (4 + 10 \lfloor \log _2 |a| \rfloor ) d[X;Y] \]

is also proven by a similar method. This variant is superior for \(|a| \geq 9\) (or \(|a|=7\)); but we will not need this estimate here.
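For instance, at \(|a| = 7\) the variant gives \(4 + 10 \lfloor \log _2 7 \rfloor = 24\), compared with \(4|a| = 28\); at \(|a| = 8\) it gives \(34\) versus \(32\); and at \(|a| = 9\) it gives \(34\) versus \(36\), after which the logarithmic bound is the better one for all larger \(|a|\).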

12.3 Multidistance

We continue to let \(G\) be an abelian group.

Definition 12.12 Multidistance

Let \(m\) be a positive integer, and let \(X_{[m]} = (X_i)_{1 \leq i \leq m}\) be an \(m\)-tuple of \(G\)-valued random variables \(X_i\). Then we define

\[ D[X_{[m]}] := \mathbb {H}[\sum _{i=1}^m \tilde X_i] - \frac{1}{m} \sum _{i=1}^m \mathbb {H}[\tilde X_i], \]

where the \(\tilde X_i\) are independent copies of the \(X_i\).
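For instance, when \(m = 2\), Lemma 3.1 and Lemma 3.11 give

\[ D[(X_1,X_2)] = \mathbb {H}[\tilde X_1 + \tilde X_2] - \frac{1}{2} \mathbb {H}[X_1] - \frac{1}{2} \mathbb {H}[X_2] = d[X_1; -X_2], \]

so multidistance can be viewed as a multivariable analogue of the Ruzsa distance.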

Lemma 12.13 Multidistance of copy

If \(X_{[m]} = (X_i)_{1 \leq i \leq m}\) and \(Y_{[m]} = (Y_i)_{1 \leq i \leq m}\) are such that \(X_i\) and \(Y_i\) have the same distribution for each \(i\), then \(D[X_{[m]}] = D[Y_{[m]}]\).

Proof

Clear from Lemma 3.6.

Lemma 12.14 Multidistance of independent variables

If \(X_{[m]} = (X_i)_{1 \leq i \leq m}\) are jointly independent, then \(D[X_{[m]}] = \mathbb {H}[\sum _{i=1}^m X_i] - \frac{1}{m} \sum _{i=1}^m \mathbb {H}[X_i]\).

Proof

Clear from definition.

Lemma 12.15 Nonnegativity

For any tuple \(X_{[m]}\) of \(G\)-valued random variables, we have \(D[X_{[m]}] \geq 0\).

Proof

From Corollary 3.5 one has

\[ \mathbb {H}[\sum _{i =1}^m \tilde X_i] \geq \mathbb {H}[\tilde X_i] \]

for each \(1 \leq i \leq m\). Averaging over \(i\), we obtain the claim.

Lemma 12.16 Relabeling

If \(\phi : \{ 1,\dots ,m\} \to \{ 1,\dots ,m\} \) is a bijection, then \(D[X_{[m]}] = D[(X_{\phi (j)})_{1 \leq j \leq m}]\).

Proof

Trivial.

Lemma 12.17 Multidistance and Ruzsa distance, I

Let \(m \ge 2\), and let \(X_{[m]}\) be a tuple of \(G\)-valued random variables. Then

\[ \sum _{1 \leq j,k \leq m: j \neq k} d[X_j; -X_k] \leq m(m-1) D[X_{[m]}]. \]
Proof

By Lemma 3.10, Lemma 12.13 we may take the \(X_i\) to be jointly independent. From Corollary 3.5, we see that for any distinct \(1 \leq j,k \leq m\), we have

\[ \mathbb {H}[X_j+X_k] \leq \mathbb {H}[\sum _{i=1}^m X_i], \]

and hence by Lemma 3.11

\[ d[X_j;-X_k] \leq \mathbb {H}[\sum _{i=1}^m X_i] - \frac{1}{2} \mathbb {H}[X_j] - \frac{1}{2} \mathbb {H}[X_k]. \]

Summing this over all pairs \((j,k)\), \(j \neq k\) and using Lemma 12.14, we obtain the claim.
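When \(m = 2\) this is sharp: the left-hand side equals \(d[X_1;-X_2] + d[X_2;-X_1] = 2 d[X_1;-X_2]\), while the right-hand side equals \(2 D[X_{[2]}] = 2 d[X_1;-X_2]\) by the \(m=2\) computation following Definition 12.12.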

Lemma 12.18 Multidistance and Ruzsa distance, II

Let \(m \ge 2\), and let \(X_{[m]}\) be a tuple of \(G\)-valued random variables. Then

\[ \sum _{j=1}^m d[X_j;X_j] \leq 2 m D[X_{[m]}]. \]
Proof

From Lemma 3.18 we have \(d[X_j; X_j] \leq 2 d[X_j; -X_k]\) for any \(k \neq j\), and applying this to every summand in Lemma 12.17, we obtain the claim.

Lemma 12.19 Multidistance and Ruzsa distance, III

Let \(m \ge 2\), and let \(X_{[m]}\) be a tuple of \(G\)-valued random variables. If the \(X_i\) all have the same distribution, then \(D[X_{[m]}] \leq m d[X_i;X_i]\) for any \(1 \leq i \leq m\).

Proof

By Lemma 3.10, Lemma 12.13 we may take the \(X_i\) to be jointly independent. Let \(X_0\) be a further independent copy of the \(X_i\). From Lemma 12.6, we have

\[ \mathbb {H}[-X_0 + \sum _{i=1}^m X_i] - \mathbb {H}[-X_0] \leq \sum _{i=1}^m \left(\mathbb {H}[X_0 - X_i] - \mathbb {H}[-X_0]\right) \]

and hence by Lemma 3.1 and Lemma 3.11

\[ \mathbb {H}[-X_0 + \sum _{i=1}^m X_i] - \mathbb {H}[X_0] \leq m d[X_i;X_i]. \]

On the other hand, by Corollary 3.5 we have

\[ \mathbb {H}[\sum _{i=1}^m X_i] \leq \mathbb {H}[-X_0 + \sum _{i=1}^m X_i] \]

and the claim follows.

Lemma 12.20 Multidistance and Ruzsa distance, IV

Let \(m \ge 2\), and let \(X_{[m]}\) be a tuple of independent \(G\)-valued random variables. Let \(W := \sum _{i=1}^m X_i\). Then

\[ d[W;-W] \leq 2 D[X_{[m]}]. \]
Proof

Take \((X'_i)_{1 \leq i \leq m}\) to be further independent copies of \((X_i)_{1 \leq i \leq m}\) (which exist by Lemma 3.7), and write \(W' := \sum _{i=1}^m X'_i\). Fix any distinct \(a,b \in \{ 1,\dots ,m\} \).

From Lemma 3.21 one has

\begin{equation} \label{7922} \mathbb {H}[W + W'] \leq \mathbb {H}[W] + \mathbb {H}[X_{a} + W'] - \mathbb {H}[X_{a}] \tag{1} \end{equation}

and also

\[ \mathbb {H}[X_a + W'] \leq \mathbb {H}[X_a + X_b] + \mathbb {H}[W'] - \mathbb {H}[X'_b]. \]

Combining this with (1) and then applying Corollary 3.5 we have

\begin{align*} \mathbb {H}[W + W'] & \leq 2\mathbb {H}[W] + \mathbb {H}[X_a + X_b] - \mathbb {H}[X_a] - \mathbb {H}[X_b] \\ & \leq 3 \mathbb {H}[W] - \mathbb {H}[X_a] - \mathbb {H}[X_b]. \end{align*}

Averaging this over all choices of \((a,b)\) gives \(\mathbb {H}[W + W'] \leq \mathbb {H}[W] + 2 D[X_{[m]}]\), and the claim follows from Lemma 3.11.

Proposition 12.21 Vanishing

If \(D[X_{[m]}]=0\), then for each \(1 \leq i \leq m\) there is a finite subgroup \(H_i \leq G\) such that \(d[X_i; U_{H_i}] = 0\).

Proof

From Lemma 12.17 and Lemma 3.15 we have \(d[X_j; -X_k]=0\) for all distinct \(1 \leq j,k \leq m\). The claim now follows from Corollary 4.6.

With more effort one can show that \(H_i\) is independent of \(i\), but we will not need to do so here.

12.4 The tau functional

Fix \(m \geq 2\), and a reference \(G\)-valued random variable \(X^0\).

Definition 12.22 \(\eta \)

We set \(\eta := \frac{1}{32m^3}\).
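For orientation: when \(m = 2\) this gives \(\eta = 1/256\), so the constant \(2/\eta = 64m^3\) appearing in Theorem 12.47 below equals \(512\). The \(m^3\) scaling is what is needed for the inequality \(12 m^3 (2+\frac{\eta }{2}) \eta + 8(m^3-m^2) \eta {\lt} 1\) in Proposition 12.46 to hold.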

Definition 12.23 \(\tau \)-functional

If \((X_i)_{1 \leq i \leq m}\) is a tuple, we define its \(\tau \)-functional

\[ \tau [ (X_i)_{1 \leq i \leq m}] := D[(X_i)_{1 \leq i \leq m}] + \eta \sum _{i=1}^m d[X_i; X^0]. \]
Definition 12.24 \(\tau \)-minimizer

A \(\tau \)-minimizer is a tuple \((X_i)_{1 \leq i \leq m}\) that minimizes the \(\tau \)-functional among all tuples of \(G\)-valued random variables.

Proposition 12.25 Existence of \(\tau \)-minimizer

If \(G\) is finite, then a \(\tau \)-minimizer exists.

Proof

This is similar to the proof of Proposition 6.5.

Proposition 12.26 Minimizer close to reference variables

If \((X_i)_{1 \leq i \leq m}\) is a \(\tau \)-minimizer, then \(\sum _{i=1}^m d[X_i; X^0] \leq \frac{2m}{\eta } d[X^0; X^0]\).

Proof

By Definition 12.24 we have

\[ \tau [ (X_i)_{1 \leq i \leq m}] \leq \tau [ (X^0)_{1 \leq i \leq m}] \]

and hence by Definition 12.23 and Lemma 12.15

\[ \eta \sum _{i=1}^m d[X_i; X^0] \leq D[(X^0)_{1 \leq i \leq m}] + m d[X^0; X^0]. \]

The claim now follows from Lemma 12.19.

Lemma 12.27 Lower bound on multidistance

If \((X_i)_{1 \leq i \leq m}\) is a \(\tau \)-minimizer, and \(k := D[(X_i)_{1 \leq i \leq m}]\), then for any other tuple \((X'_i)_{1 \leq i \leq m}\), one has

\[ k - D[(X'_i)_{1 \leq i \leq m}] \leq \eta \sum _{i=1}^m d[X_i; X'_i]. \]
Proof

By Definition 12.24 we have

\[ \tau [ (X_i)_{1 \leq i \leq m}] \leq \tau [ (X'_i)_{1 \leq i \leq m}] \]

and hence by Definition 12.23

\[ k + \eta \sum _{i=1}^m d[X_i; X^0] \leq D[(X'_i)_{1 \leq i \leq m}] + \eta \sum _{i=1}^m d[X'_i; X^0]. \]

On the other hand, by Lemma 3.18 we have

\[ d[X'_i; X^0] \leq d[X_i; X^0] + d[X_i; X'_i]. \]

The claim follows.

Definition 12.28 Conditional multidistance

If \(X_{[m]} = (X_i)_{1 \leq i \leq m}\) and \(Y_{[m]} = (Y_i)_{1 \leq i \leq m}\) are tuples of random variables, with the \(X_i\) being \(G\)-valued (but the \(Y_i\) need not be), then we define

\begin{equation} \label{multi-def-cond-alt} D[ X_{[m]} | Y_{[m]} ] = \sum _{(y_i)_{1 \leq i \leq m}} \biggl(\prod _{1 \leq i \leq m} p_{Y_i}(y_i)\biggr) D[ (X_i \, |\, Y_i \mathop{=}y_i)_{1 \leq i \leq m}] \tag{2} \end{equation}

where each \(y_i\) ranges over the support of \(p_{Y_i}\) for \(1 \leq i \leq m\).
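For instance, if each \(Y_i\) is deterministic, then each conditioned tuple \((X_i \, |\, Y_i \mathop{=} y_i)_{1 \leq i \leq m}\) is just \(X_{[m]}\), and (2) collapses to \(D[ X_{[m]} | Y_{[m]} ] = D[X_{[m]}]\).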

Lemma 12.29 Alternate form of conditional multidistance

If the pairs \((X_i,Y_i)\), \(1 \leq i \leq m\), are jointly independent, then

\begin{equation} \label{multi-def-cond} D[ X_{[m]} | Y_{[m]}] = \mathbb {H}[\sum _{i=1}^m X_i \big| (Y_j)_{1 \leq j \leq m} ] - \frac{1}{m} \sum _{i=1}^m \mathbb {H}[ X_i | Y_i]. \tag{3} \end{equation}

Proof

This is routine from Definition 2.11 and Definitions 12.12 and 12.28.

Lemma 12.30 Conditional multidistance nonnegative

If \(X_{[m]} = (X_i)_{1 \leq i \leq m}\) and \(Y_{[m]} = (Y_i)_{1 \leq i \leq m}\) are tuples of random variables, then \(D[ X_{[m]} | Y_{[m]} ] \geq 0\).

Proof

Clear from Lemma 12.15 and Definition 12.28, except that some care may need to be taken to deal with the \(y_i\) where \(p_{Y_i}\) vanishes.

Lemma 12.31 Lower bound on conditional multidistance

If \((X_i)_{1 \leq i \leq m}\) is a \(\tau \)-minimizer, and \(k := D[(X_i)_{1 \leq i \leq m}]\), then for any other tuples \((X'_i)_{1 \leq i \leq m}\) and \((Y_i)_{1 \leq i \leq m}\) with the \(X'_i\) \(G\)-valued, one has

\[ k - D[(X'_i)_{1 \leq i \leq m} | (Y_i)_{1 \leq i \leq m}] \leq \eta \sum _{i=1}^m d[X_i; X'_i|Y_i]. \]
Proof

Immediate from Lemma 12.27, Lemma 12.29, and Definition 3.19.

Corollary 12.32 Lower bound on conditional multidistance, II

With the notation of the previous lemma, we have

\begin{equation} \label{5.3-conv} k - D[ X'_{[m]} | Y_{[m]} ] \leq \eta \sum _{i=1}^m d[X_{\sigma (i)};X'_i|Y_i] \tag{4} \end{equation}

for any permutation \(\sigma : \{ 1,\dots ,m\} \rightarrow \{ 1,\dots ,m\} \).

Proof

This follows from Lemma 12.31 and Lemma 12.16.

12.5 The multidistance chain rule

Lemma 12.33 Multidistance chain rule

Let \(\pi \colon G \to H\) be a homomorphism of abelian groups and let \(X_{[m]}\) be a tuple of jointly independent \(G\)-valued random variables. Then \(D[X_{[m]}]\) is equal to

\begin{equation} D[ X_{[m]} | \pi (X_{[m]}) ] +D[ \pi (X_{[m]}) ] + \mathbb {I}[ \sum _{i=1}^m X_i : \pi (X_{[m]}) \; \big| \; \pi \bigl(\sum _{i=1}^m X_i\bigr) ] \tag{5}\label{chain-eq} \end{equation}

where \(\pi (X_{[m]}) := (\pi (X_i))_{1 \leq i \leq m}\).

Proof

For notational brevity during this proof, write \(S := \sum _{i=1}^m X_i\).

From Lemma 2.26 and Lemma 2.2, noting that \(\pi (S)\) is determined both by \(S\) and by \(\pi (X_{[m]})\), we have

\begin{equation*} \mathbb {I}[S:\pi (X_{[m]})|\pi (S)] = \mathbb {H}[S]+\mathbb {H}[\pi (X_{[m]})]-\mathbb {H}[S,\pi (X_{[m]})]-\mathbb {H}[\pi (S)], \end{equation*}

and by Lemma 2.13 the right-hand side is equal to

\begin{equation*} \mathbb {H}[S]-\mathbb {H}[S|\pi (X_{[m]})]-\mathbb {H}[\pi (S)]. \end{equation*}

Therefore,

\begin{equation} \label{chain-1} \mathbb {H}[S]=\mathbb {H}[S|\pi (X_{[m]})]+\mathbb {H}[\pi (S)]+\mathbb {I}[S:\pi (X_{[m]})|\pi (S)]. \tag{6} \end{equation}

From a further application of Lemma 2.13 and Lemma 2.2 we have

\begin{equation} \label{chain-2} \mathbb {H}[X_i] = \mathbb {H}[X_i \, | \, \pi (X_i)] + \mathbb {H}[\pi (X_i)] \tag{7} \end{equation}

for all \(1 \leq i \leq m\). Averaging (7) in \(i\) and subtracting this from (6), we obtain the claim from Definition 12.12.
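As a quick sanity check of (5): when \(\pi \) is the zero homomorphism, the last two terms vanish and the first term is \(D[X_{[m]}]\); when \(\pi \) is the identity, the first term vanishes (conditioned on \(X_{[m]}\), everything is deterministic), the mutual information vanishes since \(\sum _{i=1}^m X_i\) is determined by \(\pi (X_{[m]})\), and the middle term is \(D[X_{[m]}]\).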

We will need to iterate the multidistance chain rule, so it is convenient to observe a conditional version of this rule, as follows.

Lemma 12.34 Conditional multidistance chain rule

Let \(\pi \colon G \to H\) be a homomorphism of abelian groups, let \(X_{[m]}\) be a tuple of \(G\)-valued random variables, and let \(Y_{[m]}\) be another tuple of random variables (not necessarily \(G\)-valued). Suppose that the pairs \((X_i, Y_i)\) are jointly independent of one another (but \(X_i\) need not be independent of \(Y_i\)). Then

\begin{align} \nonumber D[ X_{[m]} | Y_{[m]} ] & = D[ X_{[m]} \, |\, \pi (X_{[m]}), Y_{[m]}] + D[ \pi (X_{[m]}) \, |\, Y_{[m]}] \\ & \quad \qquad + \mathbb {I}[ \sum _{i=1}^m X_i : \pi (X_{[m]}) \; \big| \; \pi \bigl(\sum _{i=1}^m X_i \bigr), Y_{[m]} ].\tag{8}\label{chain-eq-cond} \end{align}
Proof

For each \(y_i\) in the support of \(p_{Y_i}\), apply Lemma 12.33 with \(X_i\) replaced by the conditioned random variable \((X_i|Y_i=y_i)\); the claim (8) then follows by averaging (5) in the \(y_i\) using the weights \(p_{Y_i}\).

We can iterate the above lemma as follows.

Lemma 12.35

Let \(m\) be a positive integer. Suppose one has a sequence

\begin{equation} \label{g-seq} G_m \to G_{m-1} \to \dots \to G_1 \to G_0 = \{ 0\} \tag{9} \end{equation}

of homomorphisms between abelian groups \(G_0,\dots ,G_m\), and for each \(d=0,\dots ,m\), let \(\pi _d : G_m \to G_d\) be the homomorphism from \(G_m\) to \(G_d\) arising from this sequence by composition (so for instance \(\pi _m\) is the identity homomorphism and \(\pi _0\) is the zero homomorphism). Let \(X_{[m]} = (X_i)_{1 \leq i \leq m}\) be a jointly independent tuple of \(G_m\)-valued random variables. Then

\begin{equation} \begin{split} D[ X_{[m]} ] & = \sum _{d=1}^m D[ \pi _d(X_{[m]}) \, |\, \pi _{d-1}(X_{[m]})] \\ & \quad + \sum _{d=1}^{m-1} \mathbb {I}[ \sum _i X_i : \pi _d(X_{[m]}) \; \big| \; \pi _d\big(\sum _i X_i\big), \pi _{d-1}(X_{[m]}) ]. \end{split}\tag{10}\label{chain-eq-cond'} \end{equation}

In particular, by Lemma 2.27,

\begin{align} \nonumber D[ X_{[m]} ] \geq & \sum _{d=1}^m D[ \pi _d(X_{[m]})|\pi _{d-1}(X_{[m]}) ] \\ & + \mathbb {I}[ \sum _i X_i : \pi _1(X_{[m]}) \; \big| \; \pi _1\bigl(\sum _i X_i\bigr) ].\tag{11}\label{chain-eq-cond''} \end{align}
Proof

From Lemma 12.34 (taking \(Y_{[m]} = \pi _{d-1}(X_{[m]})\) and \(\pi = \pi _d\) there, and noting that \(\pi _d(X_{[m]})\) determines \(Y_{[m]}\)) we have

\begin{align*} D[ X_{[m]} \, |\, \pi _{d-1}(X_{[m]}) ] & = D[ X_{[m]} \, |\, \pi _d(X_{[m]}) ] + D[ \pi _d(X_{[m]})\, |\, \pi _{d-1}(X_{[m]}) ] \\ & \quad + \mathbb {I}[ \sum _{i=1}^m X_i : \pi _d(X_{[m]}) \; \big| \; \pi _d\bigl(\sum _{i=1}^m X_i\bigr), \pi _{d-1}(X_{[m]}) ] \end{align*}

for \(d=1,\dots ,m\). The claim follows by telescoping series, noting that \(D[X_{[m]} | \pi _0(X_{[m]})] = D[X_{[m]}]\) and that \(\pi _m(X_{[m]})=X_{[m]}\) (and also \(\pi _m( \sum _i X_i ) = \sum _i X_i\)).

In our application we will need the following special case of the above lemma.

Corollary 12.36

Let \(G\) be an abelian group and let \(m \geq 2\). Suppose that \(X_{i,j}\), \(1 \leq i, j \leq m\), are jointly independent \(G\)-valued random variables. Then

\begin{align*} & \mathbb {I}[ \bigl(\sum _{i=1}^m X_{i,j}\bigr)_{j =1}^{m} : \bigl(\sum _{j=1}^m X_{i,j}\bigr)_{i = 1}^m \; \big| \; \sum _{i=1}^m \sum _{j = 1}^m X_{i,j} ] \\ & \quad \leq \sum _{j=1}^{m-1} \Bigl(D[(X_{i, j})_{i = 1}^m] - D[ (X_{i, j})_{i = 1}^m \; \big| \; (X_{i,j} + \cdots + X_{i,m})_{i =1}^m ]\Bigr) \\ & \qquad \qquad \qquad \qquad + D[(X_{i,m})_{i=1}^m] - D[ \bigl(\sum _{j=1}^m X_{i,j}\bigr)_{i=1}^m ], \end{align*}

where all the multidistances here involve the indexing set \(\{ 1,\dots , m\} \).

Proof

In Lemma 12.35 we take \(G_d := G^d\) with the maps \(\pi _d \colon G^m \to G^d\) for \(d=1,\dots ,m\) defined by

\[ \pi _d(x_1,\dots ,x_m) := (x_1,\dots ,x_{d-1}, x_d + \cdots + x_m) \]

with \(\pi _0=0\). Since \(\pi _{d-1}(x)\) can be obtained from \(\pi _{d}(x)\) by applying a homomorphism, we obtain a sequence of the form (9).
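For instance, when \(m = 3\) these maps are

\[ \pi _1(x_1,x_2,x_3) = x_1+x_2+x_3, \qquad \pi _2(x_1,x_2,x_3) = (x_1, x_2+x_3), \qquad \pi _3(x_1,x_2,x_3) = (x_1,x_2,x_3), \]

and for example \(\pi _1(x)\) is obtained from \(\pi _2(x)\) by adding the two coordinates.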

Now we apply Lemma 12.35 with \(X_i := (X_{i,j})_{j = 1}^m\). Using joint independence and Corollary 2.24, we find that

\[ D[ X_{[m]} ] = \sum _{j=1}^m D[ (X_{i,j})_{1 \leq i \leq m} ]. \]

On the other hand, for \(1 \leq j \leq m-1\), we see that once \(\pi _{j}(X_i)\) is fixed, \(\pi _{j+1}(X_i)\) is determined by \(X_{i, j}\) and vice versa, so

\[ D[ \pi _{j+1}(X_{[m]}) \; | \; \pi _{j}(X_{[m]}) ] = D[ (X_{i, j})_{1 \leq i \leq m} \; | \; \pi _{j}(X_{[m]} )]. \]

Since the \(X_{i,j}\) are jointly independent, we may further simplify:

\[ D[ (X_{i, j})_{1 \leq i \leq m} \; | \; \pi _{j}(X_{[m]})] = D[ (X_{i,j})_{1 \leq i \leq m} \; | \; ( X_{i, j} + \cdots + X_{i, m})_{1 \leq i \leq m} ]. \]

Putting all this into the conclusion of Lemma 12.35, we obtain

\[ \sum _{j=1}^{m} D[ (X_{i,j})_{1 \leq i \leq m} ] \geq \begin{aligned} [t] & \sum _{j=1}^{m-1} D[ (X_{i,j})_{1 \leq i \leq m} \; | \; (X_{i,j} + \cdots + X_{i,m})_{1 \leq i \leq m} ] \\ & \! \! \! + D[ \bigl(\sum _{j=1}^m X_{i,j}\bigr)_{1 \leq i \leq m}] \\ & \! \! \! +\mathbb {I}[ \bigl(\sum _{i=1}^m X_{i,j}\bigr)_{j =1}^{m} : \bigl(\sum _{j=1}^m X_{i,j}\bigr)_{i = 1}^m \; \big| \; \sum _{i=1}^m \sum _{j = 1}^m X_{i,j} ] \end{aligned} \]

and the claim follows by rearranging.

12.6 Bounding the mutual information

As before, \(G\) is an abelian group, and \(m\geq 2\). We let \(X_{[m]} = (X_i)_{i =1}^m\) be a \(\tau \)-minimizer, and write \(k := D[X_{[m]}]\).

Proposition 12.37 Bounding mutual information

Suppose that \(X_{i,j}\), \(1 \leq i,j \leq m\), are jointly independent \(G\)-valued random variables, such that for each \(j = 1,\dots ,m\), the random variables \((X_{i,j})_{i = 1}^m\) coincide in distribution with some permutation of \(X_{[m]}\). Write

\[ {\mathcal I} := \mathbb {I}[ \bigl(\sum _{i=1}^m X_{i,j}\bigr)_{j =1}^{m} : \bigl(\sum _{j=1}^m X_{i,j}\bigr)_{i = 1}^m \; \big| \; \sum _{i=1}^m \sum _{j = 1}^m X_{i,j} ]. \]

Then

\begin{equation} \label{I-ineq} {\mathcal I} \leq 4 m^2 \eta k. \tag{14} \end{equation}

Proof

For each \(j \in \{ 1,\dots ,m\} \) we call the tuple \((X_{i,j})_{i = 1}^m\) a column and for each \(i \in \{ 1,\dots , m\} \) we call the tuple \((X_{i,j})_{j = 1}^m\) a row. Hence, by hypothesis, each column is a permutation of \(X_{[m]} = (X_i)_{i=1}^m\).

From Corollary 12.36 we have

\begin{equation} \label{441} {\mathcal I} \leq \sum _{j=1}^{m-1} A_j + B,\tag{15}\end{equation}

where

\[ A_j := D[ (X_{i, j})_{i = 1}^m] - D[ (X_{i, j})_{i = 1}^m \; \big| \; (X_{i,j} + \cdots + X_{i,m})_{i =1}^m ] \]

and

\[ B := D[ (X_{i,m})_{i=1}^m ] - D[ \bigl(\sum _{j=1}^m X_{i,j}\bigr)_{i=1}^m ]. \]

We first consider the \(A_j\), for fixed \(j \in \{ 1,\dots , m-1\} \). By Lemma 12.16 and our hypothesis on columns, we have

\[ D[ (X_{i, j})_{i = 1}^m ]= D[ (X_i)_{i=1}^m ] = k. \]

Let \(\sigma = \sigma _j \colon \{ 1,\dots ,m\} \to \{ 1,\dots ,m\} \) be a permutation such that \(X_{i,j}\) coincides in distribution with \(X_{\sigma (i)}\), and write \(X'_i := X_{i,j}\) and \(Y_i := X_{i,j} + \cdots + X_{i,m}\). By Corollary 12.32, we have

\begin{align} A_j & \leq \eta \sum _{i = 1}^m d[X_{i,j}; X_{i, j}|X_{i, j} + \cdots + X_{i,m}].\tag{16}\label{54a} \end{align}

We similarly consider \(B\). By Lemma 12.16 applied to the \(m\)-th column,

\[ D[ (X_{i, m})_{i = 1}^m ] = D[X_{[m]}] = k. \]

For \(1 \leq i \leq m\), denote the sum of row \(i\) by

\[ V_i := \sum _{j=1}^m X_{i,j}; \]

if we apply Corollary 12.32 again, now with \(X_{\sigma (i)} = X_{i,m}\), \(X'_i := V_i\), and with the variable \(Y_i\) being trivial, we obtain

\begin{equation} \label{55a} B \leq \eta \sum _{i = 1}^m d[X_{i,m}; V_i]. \tag{17} \end{equation}

It remains to bound the distances appearing in (16) and (17) further using Ruzsa calculus. For \(1 \leq j \leq m-1\) and \(1 \leq i \leq m\), by Lemma 3.25 we have

\begin{align*} & d[ X_{i,j}; X_{i,j}| X_{i,j}+\cdots +X_{i,m}] \leq d[X_{i,j}; X_{i,j}] \\ & \quad + \tfrac {1}{2} \bigl(\mathbb {H}[X_{i,j}+\cdots +X_{i,m}] - \mathbb {H}[X_{i,{j+1}}+\cdots +X_{i,m}]\bigr). \end{align*}

For each \(i\), summing over \(j = 1,\dots , m-1\) gives

\begin{align} \nonumber & \sum _{j=1}^{m-1} d[X_{i,j}; X_{i,j}| X_{i,j}+\cdots +X_{i,m}] \\ & \qquad \leq \sum _{j=1}^{m-1} d[X_{i,j}; X_{i,j}] + \frac12 \bigl( \mathbb {H}[V_i] - \mathbb {H}[X_{i,m}] \bigr). \tag{18}\label{eq:distbnd1} \end{align}

On the other hand, by Lemma 12.8 (since \(X_{i,m}\) appears in the sum \(V_i\)) we have

\begin{align} d[X_{i,m}; V_i] & \leq d[X_{i,m}; X_{i,m}] + \frac12 \bigl( \mathbb {H}[V_i] - \mathbb {H}[X_{i,m}] \bigr). \tag{19}\label{eq:distbnd2} \end{align}

Combining (15), (16) and (17) with (18) and (19) (the latter two summed over \(i\)), we get

\begin{align} \nonumber \frac1{\eta } {\mathcal I} & \leq \sum _{i,j=1}^m d[X_{i,j};X_{i,j}] + \sum _{i=1}^m (\mathbb {H}[V_i] - \mathbb {H}[X_{i,m}]) \\ & = m \sum _{i=1}^m d[X_i; X_i] + \sum _{i=1}^m \mathbb {H}[V_i] - \sum _{i=1}^m \mathbb {H}[X_i]. \tag{20}\label{eq:distbnd3} \end{align}

By Lemma 12.9 (with \(f\) taking each \(j\) to the index \(j'\) such that \(X_{i,j}\) is a copy of \(X_{j'}\)) we obtain the bound

\[ \mathbb {H}[V_i] \leq \mathbb {H}[\sum _{j=1}^m X_j] + \sum _{j=1}^m d[X_{i,j}; X_{i,j}]. \]

Finally, summing over \(i\) and using \(D[X_{[m]}] = k\) gives

\begin{align*} \sum _{i=1}^m \mathbb {H}[V_i] - \sum _{i=1}^m \mathbb {H}[X_i] & \leq \sum _{i,j=1}^m d[X_{i,j}; X_{i,j}] + m k \\ & = m\sum _{i = 1}^m d[X_i; X_i] + mk, \end{align*}

where in the second step we used the permutation hypothesis. Combining this with (20) gives

\[ {\mathcal I} \leq 2\eta m \biggl( \sum _{i=1}^m d[X_i;X_i] \biggr). \]

The claim (14) is now immediate from Lemma 12.18.

12.7 Endgame

Now let \(m \geq 2\), let \(G\) be an \(m\)-torsion abelian group, and let \((X_i)_{1 \leq i \leq m}\) be a \(\tau \)-minimizer, with \(k := D[X_{[m]}]\) as before.

Definition 12.38 Additional random variables

By a slight abuse of notation, we identify \(\mathbb {Z}/m\mathbb {Z}\) and \(\{ 1,\dots ,m\} \) in the obvious way, and let \(Y_{i,j}\), \(i,j \in \mathbb {Z}/m\mathbb {Z}\), be jointly independent random variables, with each \(Y_{i,j}\) a copy of \(X_i\). Then also define:

\[ W := \sum _{i,j \in \mathbb {Z}/m\mathbb {Z}} Y_{i,j} \]

and

\[ Z_1 := \sum _{i,j \in \mathbb {Z}/m\mathbb {Z}} i Y_{i,j},\ \ Z_2 := \sum _{i,j \in \mathbb {Z}/m\mathbb {Z}} j Y_{i,j},\ \ Z_3 := \sum _{i,j \in \mathbb {Z}/m\mathbb {Z}} (-i-j) Y_{i,j}. \]

The addition \((-i-j)\) takes place over \(\mathbb {Z}/m\mathbb {Z}\). Note that, because we are assuming \(G\) is \(m\)-torsion, it is well-defined to multiply elements of \(G\) by elements of \(\mathbb {Z}/m\mathbb {Z}\). We will also define for \(i,j,r \in \mathbb {Z}/m\mathbb {Z}\) the variables

\begin{equation} \label{pqr-defs} P_i := \sum _{j \in \mathbb {Z}/m\mathbb {Z}} Y_{i,j} , \quad Q_j := \sum _{i \in \mathbb {Z}/m\mathbb {Z}} Y_{i,j} , \quad R_r := \sum _{\substack {i,j \in \mathbb {Z}/m\mathbb {Z}\\ i+j=-r}} Y_{i,j} . \tag{21} \end{equation}
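Note that by (21) one may write

\[ Z_1 = \sum _{i \in \mathbb {Z}/m\mathbb {Z}} i P_i, \qquad Z_2 = \sum _{j \in \mathbb {Z}/m\mathbb {Z}} j Q_j, \qquad Z_3 = \sum _{r \in \mathbb {Z}/m\mathbb {Z}} r R_r, \]

so that \(Z_1, Z_2, Z_3\) are functions of \((P_i)_{i \in \mathbb {Z}/m\mathbb {Z}}\), \((Q_j)_{j \in \mathbb {Z}/m\mathbb {Z}}\), \((R_r)_{r \in \mathbb {Z}/m\mathbb {Z}}\) respectively; this is used repeatedly below.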

Lemma 12.39 Zero-sum

We have

\begin{equation} \label{eq:sum-zero} Z_1+Z_2+Z_3= 0. \tag{22} \end{equation}

Proof

Clear from definition.

Proposition 12.40 Mutual information bound

We have

\[ \mathbb {I}[Z_1 : Z_2\, |\, W], \mathbb {I}[Z_2 : Z_3\, |\, W], \mathbb {I}[Z_1 : Z_3\, |\, W] \leq t \]

where

\begin{equation} \label{t-def} t := 4m^2 \eta k. \tag{23} \end{equation}
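(Equivalently, by Definition 12.22, \(t = k/(8m)\).)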

Proof

We apply Proposition 12.37 to these variables in several different ways. In the first application, take \(X_{i,j}=Y_{i,j}\). Note that each column \((X_{i,j})_{i=1}^m\) is indeed a permutation of \(X_1,\dots ,X_m\); in fact, the trivial permutation. Note also that for each \(i \in \mathbb {Z}/m\mathbb {Z}\), the row sum is

\[ \sum _{j=1}^m X_{i,j} = \sum _{j \in \mathbb {Z}/m\mathbb {Z}} Y_{i,j} = P_i \]

and for each \(j \in \mathbb {Z}/m\mathbb {Z}\), the column sum is

\[ \sum _{i=1}^m X_{i,j} = \sum _{i \in \mathbb {Z}/m\mathbb {Z}} Y_{i,j} = Q_j. \]

Finally note that \(\sum _{i,j=1}^m X_{i,j} = W\). From Proposition 12.37 we then have

\[ \mathbb {I}[ (P_i)_{i \in \mathbb {Z}/m\mathbb {Z}} : (Q_j)_{j \in \mathbb {Z}/m\mathbb {Z}} \, \big|\, W ] \leq t, \]

with \(t\) as in (23). Since \(Z_1\) is a function of \((P_i)_{i \in \mathbb {Z}/m\mathbb {Z}}\) by (21), and similarly \(Z_2\) is a function of \((Q_j)_{j \in \mathbb {Z}/m\mathbb {Z}}\), it follows immediately from Lemma 12.4 that

\[ \mathbb {I}[ Z_1 : Z_2 \, |\, W ] \leq t. \]

In the second application of Proposition 12.37, we instead consider \(X'_{i,j} = Y_{i-j,j}\). Again, for each fixed \(j\), the tuple \((X'_{i,j})_{i=1}^m\) is a permutation of \(X_1,\dots ,X_m\). This time the row sums for \(i \in \{ 1,\dots , m\} \) are

\[ \sum _{j=1}^m X'_{i,j} = \sum _{j \in \mathbb {Z}/m\mathbb {Z}} Y_{i-j,j} = R_{-i}. \]

Similarly, the column sums for \(j \in \{ 1,\dots , m\} \) are

\[ \sum _{i=1}^m X'_{i,j} = \sum _{i \in \mathbb {Z}/m\mathbb {Z}} Y_{i-j,j} = Q_j. \]

As before, \(\sum _{i,j=1}^m X'_{i,j} = W\). Hence, using (21) and Lemma 12.4 again, Proposition 12.37 tells us

\[ \mathbb {I}[ Z_3 : Z_2 \, |\, W] \leq \mathbb {I}[ (R_i)_{i \in \mathbb {Z}/m\mathbb {Z}} : (Q_j)_{j \in \mathbb {Z}/m\mathbb {Z}} \, \big|\, W ] \leq t. \]

In the third application of Proposition 12.37 (see footnote 1), take \(X''_{i,j} = Y_{i,j-i}\). The row and column sums are respectively

\[ \sum _{j=1}^m X''_{i,j} = \sum _{j \in \mathbb {Z}/m\mathbb {Z}} Y_{i,j-i} = P_i \]

and

\[ \sum _{i=1}^m X''_{i,j} = \sum _{i \in \mathbb {Z}/m\mathbb {Z}} Y_{i,j-i} = R_{-j}. \]

Hence, Proposition 12.37 and Lemma 12.4 give

\[ \mathbb {I}[ Z_1 : Z_3 \, |\, W] \leq \mathbb {I}[ (P_i)_{i \in \mathbb {Z}/m\mathbb {Z}} : (R_j)_{j \in \mathbb {Z}/m\mathbb {Z}} \, \big|\, W ] \leq t, \]

which completes the proof.

Lemma 12.41 Entropy of \(W\)

We have \(\mathbb {H}[W] \leq (2m-1)k + \frac1m \sum _{i=1}^m \mathbb {H}[X_i]\).

Proof

Without loss of generality, we may take \(X_1,\dots ,X_m\) to be independent. Write \(S = \sum _{i=1}^m X_i\). Note that for each \(j \in \mathbb {Z}/m\mathbb {Z}\), the sum \(Q_j\) from 21 above has the same distribution as \(S\). By Lemma 12.6 we have

\begin{align*} \mathbb {H}[W] = \mathbb {H}[\sum _{j \in \mathbb {Z}/m\mathbb {Z}} Q_j] & \leq \mathbb {H}[S] + \sum _{j=2}^m (\mathbb {H}[Q_1+Q_j] - \mathbb {H}[S]) \\ & = \mathbb {H}[S] + (m-1) d[S;-S]. \end{align*}

By Lemma 12.20, we have

\begin{equation} \label{eq:s-bound} d[S; -S] \leq 2 k \tag{24} \end{equation}

and hence

\[ \mathbb {H}[W] \leq 2 k (m-1) + \mathbb {H}[S]. \]

From Definition 12.12 we have

\begin{equation} \label{eq:ent-s} \mathbb {H}[S] = k + \frac1m \sum _{i=1}^m \mathbb {H}[X_i], \tag{25} \end{equation}

and the claim follows.

Lemma 12.42 Entropy of \(Z_2\)

We have \(\mathbb {H}[Z_2] \leq (8m^2-16m+1) k + \frac{1}{m} \sum _{i=1}^m \mathbb {H}[X_i]\).

Proof

We observe

\[ \mathbb {H}[Z_2] = \mathbb {H}[\sum _{j \in \mathbb {Z}/m\mathbb {Z}} j Q_j]. \]

Since \(G\) is \(m\)-torsion, the \(j=m\) term \(m Q_m\) vanishes, so \(Z_2 = Q_1 + \sum _{i=2}^{m-1} i Q_i\). Applying Lemma 12.6, one has

\begin{align*} \mathbb {H}[Z_2] & \leq \mathbb {H}[S] + \sum _{i=2}^{m-1} \bigl(\mathbb {H}[Q_1 + i Q_i] - \mathbb {H}[S]\bigr). \end{align*}

Using Lemma 12.11 and 24 we get

\begin{align*} \mathbb {H}[Z_2] & \leq \mathbb {H}[S] + 4m (m-2) d[S;-S] \\ & \leq \mathbb {H}[S] + 8m (m-2) k. \end{align*}

Applying (25) gives the claim.

Lemma 12.43 Mutual information bound

We have \(\mathbb {I}[W : Z_2] \leq 2 (m-1) k\).

Proof

From Lemma 2.16 we have \(\mathbb {I}[W : Z_2] = \mathbb {H}[W] - \mathbb {H}[W | Z_2]\), and since \(Z_2 = \sum _{j=1}^{m-1} j Q_j\) and \(W = \sum _{j=1}^m Q_j\),

\[ \mathbb {H}[W | Z_2] \geq \mathbb {H}[W \, |\, Q_1,\dots ,Q_{m-1}] = \mathbb {H}[Q_m] = \mathbb {H}[S]. \]

Hence, by Lemma 12.41,

\[ \mathbb {I}[W : Z_2] \leq \mathbb {H}[W] - \mathbb {H}[S] \leq 2 (m-1) k, \]

as claimed.

Lemma 12.44 Distance bound

We have \(\sum _{i=1}^m d[X_i;Z_2|W] \leq 8(m^3-m^2) k\).

Proof

For each \(i \in \{ 1,\dots , m\} \), using Lemma 12.8 (noting that the sum \(Z_2\) contains the summand \(Y_{i,1}\), which is a copy of \(X_i\)) we have

\begin{equation} \label{in-a-bit-6} d[X_i;Z_2] \leq d[X_i;X_i] + \tfrac 12 (\mathbb {H}[Z_2] - \mathbb {H}[X_i]) \tag{26} \end{equation}

and using Lemma 3.24 we have

\[ d[X_i;Z_2 | W] \leq d[X_i;Z_2] + \tfrac 12 \mathbb {I}[W : Z_2]. \]

Combining with (26) and Lemma 12.43 gives

\[ d[X_i;Z_2 | W] \leq d[X_i;X_i] + \tfrac 12 (\mathbb {H}[Z_2] - \mathbb {H}[X_i]) + (m-1)k. \]

Summing over \(i\) and applying Lemma 12.42 gives

\[ \sum _{i = 1}^m d[X_i;Z_2 | W] \leq \sum _{i = 1}^m d[X_i;X_i] + m(8m^2-16m+1) k + m(m-1) k. \]

Finally, applying Lemma 12.18 (and dropping some lower order terms) gives the claim.
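Concretely, Lemma 12.18 gives \(\sum _{i=1}^m d[X_i;X_i] \leq 2mk\), so the right-hand side above is at most

\[ 2mk + m(8m^2-16m+1) k + m(m-1) k = (8m^3 - 15m^2 + 2m) k \leq 8(m^3-m^2) k, \]

the last inequality holding since \(7m^2 \geq 2m\) for all \(m \geq 1\).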

Lemma 12.45 Application of BSG

Let \(G\) be an abelian group, let \((T_1,T_2,T_3)\) be a \(G^3\)-valued random variable such that \(T_1+T_2+T_3=0\) holds identically, and write

\[ \delta := \mathbb {I}[T_1 : T_2] + \mathbb {I}[T_1 : T_3] + \mathbb {I}[T_2 : T_3]. \]

Let \(Y_1,\dots ,Y_n\) be some further \(G\)-valued random variables and let \(\alpha {\gt}0\) be a constant. Then there exists a random variable \(U\) such that

\begin{equation} \label{eq:get-better} d[U;U] + \alpha \sum _{i=1}^n d[Y_i;U] \leq \Bigl(2 + \frac{\alpha n}{2} \Bigr) \delta + \alpha \sum _{i=1}^n d[Y_i;T_2]. \tag{27} \end{equation}

Proof

We apply Lemma 3.23 with \(X=T_1\) and \(Y=T_2\). Since \(T_1+T_2=-T_3\), we find that

\begin{align} \nonumber \sum _{z} p_{T_3}(z) & d[T_1 \, |\, T_3 \mathop{=} z;T_2 \, |\, T_3 \mathop{=} z] \\ \nonumber & \leq 3 \mathbb {I}[T_1 : T_2] + 2 \mathbb {H}[T_3] - \mathbb {H}[T_1] - \mathbb {H}[T_2] \\ & = \mathbb {I}[T_1 : T_2] + \mathbb {I}[T_1 : T_3] + \mathbb {I}[T_2 : T_3] = \delta ,\tag{28}\label{514a} \end{align}

where the last line follows from Lemma 2.2 by observing

\[ \mathbb {H}[T_1,T_2] = \mathbb {H}[T_1,T_3] = \mathbb {H}[T_2,T_3] = \mathbb {H}[T_1,T_2,T_3] \]

since any two of \(T_1,T_2,T_3\) determine the third.

By (28) and the triangle inequality,

\[ \sum _z p_{T_3}(z) d[T_2 \, |\, T_3 \mathop{=} z; T_2 \, |\, T_3\mathop{=}z] \leq 2 \delta \]

and by Lemma 3.25, for each \(Y_i\),

\begin{align*} & \sum _z p_{T_3}(z) d[Y_i; T_2 \, |\, T_3 \mathop{=} z] \\ & \qquad = d[Y_i; T_2 \, |\, T_3] \leq d[Y_i;T_2] + \frac12 \mathbb {I}[T_2 : T_3] \leq d[Y_i;T_2] + \frac{\delta }{2}. \end{align*}

Hence,

\begin{align*} & \sum _z p_{T_3}(z) \bigg( d[T_2 \, |\, T_3 \mathop{=} z; T_2 \, |\, T_3 \mathop{=} z] + \alpha \sum _{i=1}^n d[Y_i;T_2 \, |\, T_3 \mathop{=} z] \bigg) \\ & \qquad \leq \Bigl(2 + \frac{\alpha n}{2} \Bigr) \delta + \alpha \sum _{i=1}^n d[Y_i; T_2], \end{align*}

and the result follows by setting \(U=(T_2 \, |\, T_3 \mathop{=} z)\) for some \(z\) such that the quantity in parentheses on the left-hand side is at most the weighted average value.

Proposition 12.46 Vanishing entropy

We have \(k = 0\).

Proof

For each value \(W=w\), apply Lemma 12.45 (and Lemma 12.39) to

\[ T_1 = (Z_1 \, |\, W \mathop{=} w), T_2 = (Z_2 \, |\, W \mathop{=} w), T_3 = (Z_3 \, |\, W \mathop{=} w) \]

with \(Y_i=X_i\) and \(\alpha =\eta /m\). Write

\[ \delta _w := \mathbb {I}[T_1 : T_2] + \mathbb {I}[T_1 : T_3] + \mathbb {I}[T_2 : T_3] \]

for this choice, and note that

\begin{align} \nonumber \delta _\ast := \sum _w p_W(w) \delta _w & = \mathbb {I}[Z_1 : Z_2 \, |\, W] + \mathbb {I}[Z_1 : Z_3 \, |\, W] + \mathbb {I}[Z_2 : Z_3 \, |\, W] \\ & \leq 12 m^2 \eta k \tag{29}\label{eq:delta-w} \end{align}

by Proposition 12.40. Write \(U_w\) for the random variable guaranteed to exist by Lemma 12.45, so that (27) gives

\begin{equation} \label{eq:uw1} d[U_w; U_w] \leq \Bigl( 2 + \frac{\alpha m}{2} \Bigr) \delta _w + \alpha \sum _{i=1}^m \bigl( d[X_i; T_2] - d[X_i; U_w] \bigr). \tag{30} \end{equation}

Let \((U_w)_{[m]}\) denote the tuple consisting of the same variable \(U_w\) repeated \(m\) times. By Lemma 12.19

\begin{equation} \label{eq:uw2} D[(U_w)_{[m]}] \leq m d[U_w; U_w]. \tag{31} \end{equation}

On the other hand, from Lemma 12.27 one has

\begin{equation} \label{eq:uw3} D[(U_w)_{[m]}] \geq k - \eta \sum _{i=1}^m d[X_i;U_w]. \tag{32} \end{equation}

Combining (30), (31) and (32), averaging over \(w\) (with weight \(p_W(w)\)), and recalling the value \(\alpha =\eta /m\), gives

\[ m \Bigl( 2 + \frac{\eta }{2} \Bigr) \delta _\ast + \eta \sum _{i=1}^m d[X_i; Z_2 | W] \geq k \]

since the terms \(d[X_i; U_w]\) cancel by our choice of \(\alpha \). Substituting in Lemma 12.44 and (29), we have

\[ 12 m^3 \Bigl(2+\frac{\eta }{2}\Bigr) \eta k + 8(m^3-m^2) \eta k \geq k. \]

From Definition 12.22 we have

\[ 12 m^3 \Bigl(2+\frac{\eta }{2}\Bigr) \eta + 8(m^3-m^2) \eta {\lt} 1 \]

and hence \(k \leq 0\). The claim now follows from Lemma 12.15.
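For the record, the numerical verification of the penultimate display: with \(\eta = \frac{1}{32m^3}\),

\[ 12 m^3 \Bigl(2+\frac{\eta }{2}\Bigr) \eta + 8(m^3-m^2) \eta = \frac{3}{4} + \frac{3}{512 m^3} + \frac{1}{4} - \frac{1}{4m} = 1 - \frac{1}{4m} + \frac{3}{512 m^3} {\lt} 1, \]

since \(\frac{3}{512 m^3} {\lt} \frac{1}{4m}\) for every \(m \geq 1\).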

12.8 Wrapping up

Theorem 12.47 Entropy form of PFR

Suppose that \(G\) is a finite abelian group of torsion \(m\). Suppose that \(X\) is a \(G\)-valued random variable. Then there exists a subgroup \(H \leq G\) such that

\[ d[X;U_H] \leq 64 m^3 d[X;X]. \]
Proof

Set \(X^0 := X\). By Proposition 12.25, there exists a \(\tau \)-minimizer \(X_{[m]} = (X_i)_{1 \leq i \leq m}\). By Proposition 12.46, we have \(D[X_{[m]}]=0\). By Proposition 12.26 and the pigeonhole principle, there exists \(1 \leq i \leq m\) such that \(d[X_i; X] \leq \frac{2}{\eta } d[X;X]\). By Proposition 12.21, we have \(d[X_i;U_H]=0\) for some subgroup \(H \leq G\), hence by Lemma 3.18 we have \(d[U_H; X] \leq \frac{2}{\eta } d[X;X]\). The claim then follows from Definition 12.22.

Lemma 12.48

Suppose that \(G\) is a finite abelian group of torsion \(m\). If \(A \subset G\) is non-empty and \(|A+A| \leq K|A|\), then \(A\) can be covered by at most \(K ^{(64m^3+2)/2}|A|^{1/2}/|H|^{1/2}\) translates of a subgroup \(H\) of \(G\) with

\begin{equation} \label{ah2} |H|/|A| \in [K^{-64m^3}, K^{64m^3}]. \tag{33} \end{equation}

Proof

Repeat the proof of Lemma 7.2, but with Theorem 12.47 in place of Theorem 6.24.

Theorem 12.49 PFR

Suppose that \(G\) is a finite abelian group of torsion \(m\). If \(A \subset G\) is non-empty and \(|A+A| \leq K|A|\), then \(A\) can be covered by at most \(mK^{96m^3+2}\) translates of a subgroup \(H\) of \(G\) with \(|H| \leq |A|\).

Proof

Repeat the proof of Theorem 7.3, but with Lemma 12.48 in place of Lemma 7.2.

  1. In fact, by permuting the variables \((Y_{i,j})_{i,j \in \mathbb {Z}/m\mathbb {Z}}\), one can see that the random variables \((W, Z_1, Z_2)\) and \((W, Z_1, Z_3)\) have the same distribution, so this is in some sense identical to – and can be deduced from – the first application.