6 Entropy version of PFR
Throughout this chapter, \(G = \mathbb {F}_2^n\), and \(X^0_1, X^0_2\) are \(G\)-valued random variables.
Definition
6.2
\(\tau \) functional
If \(X_1,X_2\) are two \(G\)-valued random variables, then
\[ \tau [X_1; X_2] := d[X_1; X_2] + \eta d[X^0_1; X_1] + \eta d[X^0_2; X_2]. \]
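For readers following the accompanying formalization, the functional could be transcribed into Lean roughly as follows. This is only an illustrative sketch: the identifiers `rdist`, `eta`, `X₀₁`, `X₀₂` are placeholders, not the project's actual declarations.

```lean
-- Illustrative sketch only; `rdist` stands for the Ruzsa distance
-- d[· ; ·] and `eta` for the constant fixed in Definition 6.1.
-- All names here are placeholders, not the project's actual API.
variable {Ω G : Type*} (rdist : (Ω → G) → (Ω → G) → ℝ)
variable (eta : ℝ) (X₀₁ X₀₂ : Ω → G)

noncomputable def tau (X₁ X₂ : Ω → G) : ℝ :=
  rdist X₁ X₂ + eta * rdist X₀₁ X₁ + eta * rdist X₀₂ X₂
```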
Lemma
6.3
\(\tau \) depends only on distribution
If \(X'_1, X'_2\) are copies of \(X_1,X_2\), then \(\tau [X'_1;X'_2] = \tau [X_1;X_2]\).
Definition
6.4
\(\tau \)-minimizer
A pair of \(G\)-valued random variables \(X_1, X_2\) are said to be a \(\tau \)-minimizer if one has
\[ \tau [X_1;X_2] \leq \tau [X'_1;X'_2] \]
for all \(G\)-valued random variables \(X'_1, X'_2\).
Proposition
6.5
\(\tau \) has minimum
A pair \(X_1, X_2\) of \(\tau \)-minimizers exists.
Proof
By Lemma 6.3, \(\tau \) depends only on the probability distributions of \(X_1, X_2\). These distributions range over a compact space (the simplex of probability measures on the finite group \(G\)), and \(\tau \) is a continuous function of them, so \(\tau \) attains a minimum.
6.1 Basic facts about minimizers
In this section we assume that \(X_1,X_2\) are \(\tau \)-minimizers. We also write \(k := d[X_1;X_2]\).
Lemma
6.6
Distance lower bound
For any \(G\)-valued random variables \(X'_1,X'_2\), one has
\[ d[X'_1;X'_2] \geq k - \eta (d[X^0_1;X'_1] - d[X^0_1;X_1] ) - \eta (d[X^0_2;X'_2] - d[X^0_2;X_2] ). \]
Lemma
6.7
Conditional distance lower bound
For any \(G\)-valued random variables \(X'_1,X'_2\) and random variables \(Z,W\), one has
\[ d[X'_1|Z;X'_2|W] \geq k - \eta (d[X^0_1;X'_1|Z] - d[X^0_1;X_1] ) - \eta (d[X^0_2;X'_2|W] - d[X^0_2;X_2] ). \]
Proof
Apply Lemma 6.6 to conditioned random variables and then average.
6.2 First estimate
We continue the assumptions from the preceding section.
Let \(X_1, X_2, \tilde X_1, \tilde X_2\) be independent random variables, with \(\tilde X_1\) a copy of \(X_1\) and \(\tilde X_2\) a copy of \(X_2\). (This is possible thanks to Lemma 3.7.)
We also define the quantity
\[ I_1 := \mathbb {I}[X_1+X_2 : \tilde X_1 + X_2 \, |\, X_1+X_2+\tilde X_1+\tilde X_2]. \]
Lemma
6.8
Fibring identity for first estimate
We have
\begin{align*} & d[X_1+\tilde X_2;X_2+\tilde X_1] + d[X_1|X_1+\tilde X_2; X_2|X_2+\tilde X_1] \\ & \quad + \mathbb {I}[X_1+ X_2 : \tilde X_1 + X_2 \, |\, X_1 + X_2 + \tilde X_1 + \tilde X_2] = 2k. \end{align*}
Lemma
6.9
Lower bound on distances
We have
\begin{align*} d[X_1+\tilde X_2; X_2+\tilde X_1] \geq k & - \eta (d[X^0_1; X_1+\tilde X_2] - d[X^0_1; X_1]) \\ & \qquad - \eta (d[X^0_2; X_2+\tilde X_1] - d[X^0_2; X_2]). \end{align*}
Lemma
6.10
Lower bound on conditional distances
We have
\begin{align*} & d[X_1|X_1+\tilde X_2; X_2|X_2+\tilde X_1] \\ & \qquad \quad \geq k - \eta (d[X^0_1; X_1 | X_1 + \tilde X_2] - d[X^0_1; X_1]) \\ & \qquad \qquad \qquad \qquad - \eta (d[X^0_2; X_2 | X_2 + \tilde X_1] - d[X^0_2; X_2]). \end{align*}
Lemma
6.11
Upper bound on distance differences
We have
\begin{align*} d[X^0_1; X_1+\tilde X_2] - d[X^0_1; X_1] & \leq \tfrac {1}{2} k + \tfrac {1}{4} \mathbb {H}[X_2] - \tfrac {1}{4} \mathbb {H}[X_1]\\ d[X^0_2;X_2+\tilde X_1] - d[X^0_2; X_2] & \leq \tfrac {1}{2} k + \tfrac {1}{4} \mathbb {H}[X_1] - \tfrac {1}{4} \mathbb {H}[X_2], \\ d[X_1^0;X_1|X_1+\tilde X_2] - d[X_1^0;X_1] & \leq \tfrac {1}{2} k + \tfrac {1}{4} \mathbb {H}[X_1] - \tfrac {1}{4} \mathbb {H}[X_2] \\ d[X_2^0; X_2|X_2+\tilde X_1] - d[X_2^0; X_2] & \leq \tfrac {1}{2}k + \tfrac {1}{4} \mathbb {H}[X_2] - \tfrac {1}{4} \mathbb {H}[X_1]. \end{align*}
Proof
Immediate from Lemma 3.25 (and recalling that \(k\) is defined to be \(d[X_1;X_2]\)).
Lemma
6.12
First estimate
We have \(I_1 \leq 2 \eta k\).
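The bound is a direct combination of the preceding three lemmas; a sketch: by Lemma 6.8,
\[ I_1 = 2k - d[X_1+\tilde X_2;X_2+\tilde X_1] - d[X_1|X_1+\tilde X_2; X_2|X_2+\tilde X_1], \]
so the lower bounds of Lemma 6.9 and Lemma 6.10 give
\begin{align*} I_1 \leq \; & \eta (d[X^0_1; X_1+\tilde X_2] - d[X^0_1; X_1]) + \eta (d[X^0_2; X_2+\tilde X_1] - d[X^0_2; X_2]) \\ & + \eta (d[X^0_1; X_1 | X_1 + \tilde X_2] - d[X^0_1; X_1]) + \eta (d[X^0_2; X_2 | X_2 + \tilde X_1] - d[X^0_2; X_2]), \end{align*}
and summing the four estimates of Lemma 6.11 (the entropy terms cancel in pairs) bounds the right-hand side by \(2 \eta k\).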
One can also extract the following useful inequality from the proof of the above lemma.
Lemma
6.13
Entropy bound on quadruple sum
With the same notation, we have
\begin{equation} \label{HS-bound} \mathbb {H}[X_1+X_2+\tilde X_1+\tilde X_2] \le \tfrac {1}{2} \mathbb {H}[X_1]+\tfrac {1}{2} \mathbb {H}[X_2] + (2 + \eta ) k - I_1. \end{equation}
Proof
Subtracting Lemma 6.10 from Lemma 6.8, and combining the resulting inequality with Lemma 6.11 gives the bound
\[ d[X_1+\tilde X_2;X_2+\tilde X_1] \le (1 + \eta ) k - I_1, \]
and the claim follows from Lemma 3.11 and the definition of \(k\).
6.3 Second estimate
We continue the assumptions from the preceding section. We introduce the quantity
\[ I_2 := \mathbb {I}[X_1+X_2 : X_1 + \tilde X_1 | X_1+X_2+\tilde X_1+\tilde X_2]. \]
Lemma
6.14
Distance between sums
We have
\[ d[X_1+\tilde X_1; X_2+\tilde X_2] \geq k - \frac{\eta }{2} ( d[X_1; X_1] + d[X_2;X_2] ). \]
Proof
From Lemma 6.6 one has
\begin{align*} d[X_1+\tilde X_1; X_2+\tilde X_2] \geq k & - \eta (d[X^0_1;X_1+\tilde X_1] - d[X^0_1;X_1]) \\ & - \eta (d[X^0_2;X_2+\tilde X_2] - d[X^0_2;X_2]). \end{align*}
Now Lemma 3.25 gives
\[ d[X^0_1;X_1+\tilde X_1] - d[X^0_1;X_1] \leq \tfrac {1}{2} d[X_1;X_1] \]
and
\[ d[X^0_2;X_2+\tilde X_2] - d[X^0_2;X_2] \leq \tfrac {1}{2} d[X_2;X_2], \]
and the claim follows.
Lemma
6.15
We have
\[ d[X_1;X_1] + d[X_2;X_2] \leq 2 k + \frac{2(2 \eta k - I_1)}{1-\eta }. \]
Proof
We may use Lemma 3.11 to expand
\begin{align*} & d[X_1+\tilde X_1;X_2+\tilde X_2] \\ & = \mathbb {H}[X_1+\tilde X_1 + X_2 + \tilde X_2] - \tfrac {1}{2} \mathbb {H}[X_1+\tilde X_1] - \tfrac {1}{2} \mathbb {H}[X_2+\tilde X_2] \\ & = \mathbb {H}[X_1+\tilde X_1 + X_2 + \tilde X_2] - \tfrac {1}{2} \mathbb {H}[X_1] - \tfrac {1}{2} \mathbb {H}[X_2] \\ & \qquad \qquad \qquad - \tfrac {1}{2} \left( d[X_1;X_1] + d[X_2; X_2] \right), \end{align*}
and hence by Lemma 6.13
\[ d[X_1+\tilde X_1; X_2+\tilde X_2] \leq (2+\eta ) k - \tfrac {1}{2} \left( d[X_1;X_1] + d[X_2;X_2] \right) - I_1. \]
Combining this bound with Lemma 6.14 we obtain the result.
Lemma
6.16
Second estimate
We have
\[ I_2 \leq 2 \eta k + \frac{2 \eta (2 \eta k - I_1)}{1 - \eta }. \]
Proof
We apply Corollary 5.3, but now with the choice
\[ (Y_1,Y_2,Y_3,Y_4) := (X_2, X_1, \tilde X_2, \tilde X_1). \]
Now Corollary 5.3 can be rewritten as
\begin{align*} & d[X_1+\tilde X_1;X_2+\tilde X_2] + d[X_1|X_1+\tilde X_1; X_2|X_2+\tilde X_2] \\ & \quad + \mathbb {I}[X_1+X_2 : X_1 + \tilde X_1 \, |\, X_1+X_2+\tilde X_1+\tilde X_2] = 2k, \end{align*}
recalling once again that \(k := d[X_1;X_2]\). From Lemma 6.7 one has
\begin{align*} d[X_1|X_1+\tilde X_1; X_2|X_2+\tilde X_2] \geq k & - \eta (d[X^0_1;X_1|X_1+\tilde X_1] - d[X^0_1;X_1]) \\ & - \eta (d[X^0_2;X_2|X_2+\tilde X_2] - d[X^0_2;X_2]), \end{align*}
while from Lemma 3.25 we have
\[ d[X^0_1;X_1|X_1+\tilde X_1] - d[X^0_1;X_1] \leq \tfrac {1}{2} d[X_1;X_1], \]
and
\[ d[X^0_2;X_2|X_2+\tilde X_2] - d[X^0_2;X_2] \leq \tfrac {1}{2} d[X_2;X_2]. \]
Combining all these inequalities with Lemma 6.14, we have
\begin{equation} \label{combined} \mathbb {I}[X_1+X_2 : X_1 + \tilde X_1 | X_1+X_2+\tilde X_1+\tilde X_2] \leq \eta ( d[X_1; X_1] + d[X_2; X_2] ). \end{equation}
Together with Lemma 6.15, this gives the conclusion.
6.4 Endgame
Let \(X_1,X_2,\tilde X_1,\tilde X_2\) be as before, and introduce the random variables
\[ U := X_1 + X_2, \qquad V := \tilde X_1 + X_2, \qquad W := X_1 + \tilde X_1 \]
and
\[ S := X_1 + X_2 + \tilde X_1 + \tilde X_2. \]
Lemma
6.17
Symmetry identity
We have
\[ \mathbb {I}[U : W \, | \, S] = \mathbb {I}[V : W \, | \, S]. \]
Lemma
6.18
Bound on conditional mutual informations
We have
\[ \mathbb {I}[U : V \, | \, S] + \mathbb {I}[V : W \, | \, S] + \mathbb {I}[W : U \, | \, S] \leq 6 \eta k - \frac{1 - 5 \eta }{1-\eta } (2 \eta k - I_1). \]
Proof
From the definitions of \(I_1,I_2\) and Lemma 6.17, we see that
\[ I_1 = \mathbb {I}[U : V \, | \, S], \qquad I_2 = \mathbb {I}[W : U \, | \, S], \qquad I_2 = \mathbb {I}[V : W \, | \, S]. \]
Applying Lemma 6.16 we have the inequality
\[ I_2 \leq 2 \eta k + \frac{2\eta (2 \eta k - I_1)}{1-\eta } . \]
We conclude that
\[ I_1 + I_2 + I_2 \leq I_1+4\eta k+ \frac{4\eta (2 \eta k - I_1)}{1-\eta }, \]
and since \(I_1 + 4 \eta k = 6 \eta k - (2 \eta k - I_1)\), the right-hand side equals
\[ 6 \eta k - \Bigl( 1 - \frac{4\eta }{1-\eta } \Bigr) (2 \eta k - I_1) = 6 \eta k - \frac{1 - 5 \eta }{1-\eta } (2 \eta k - I_1), \]
which is the claim.
Lemma
6.19
Bound on distance increments
We have
\begin{align*} \sum _{i=1}^2 \sum _{A\in \{ U,V,W\} } \big(d[X^0_i;A|S] & - d[X^0_i;X_i]\big) \\ & \leq (6 - 3\eta ) k + 3(2 \eta k - I_1). \end{align*}
Proof
By Lemma 3.26 (taking \(X = X_1^0\), \(Y = X_1\), \(Z = X_2\) and \(Z' = \tilde X_1 + \tilde X_2\), so that \(Y + Z = U\) and \(Y + Z + Z' = S\)) we have, noting that \(\mathbb {H}[Y+ Z] = \mathbb {H}[Z']\),
\[ d[X^0_1;U|S] - d[X^0_1;X_1] \leq \tfrac {1}{2} (\mathbb {H}[S] - \mathbb {H}[X_1]). \]
Further applications of Lemma 3.26 give
\begin{align*} d[X^0_2;U|S] - d[X^0_2; X_2] & \leq \tfrac {1}{2} (\mathbb {H}[S] - \mathbb {H}[X_2]) \\ d[X^0_1;V|S] - d[X^0_1;X_1] & \leq \tfrac {1}{2} (\mathbb {H}[S] - \mathbb {H}[X_1])\\ d[X^0_2;V|S] - d[X^0_2;X_2] & \leq \tfrac {1}{2} (\mathbb {H}[S] - \mathbb {H}[X_2]) \end{align*}
and
\[ d[X^0_1;W|S] - d[X^0_1;X_1] \leq \tfrac {1}{2} (\mathbb {H}[S] + \mathbb {H}[W] - \mathbb {H}[X_1] - \mathbb {H}[W']), \]
where \(W' := X_2 + \tilde X_2\). To treat \(d[X^0_2;W|S]\), first note that this equals \(d[X^0_2;W'|S]\), since for a fixed choice \(s\) of \(S\) we have \(W' = W + s\) (here we need some helper lemma about Ruzsa distance). Now we may apply Lemma 3.26 to obtain
\[ d[X^0_2;W'|S] - d[X^0_2;X_2] \leq \tfrac {1}{2} (\mathbb {H}[S] + \mathbb {H}[W'] - \mathbb {H}[X_2] - \mathbb {H}[W]). \]
Summing these six estimates and using Lemma 6.13, we conclude that
\begin{align*} \sum _{i=1}^2 \sum _{A\in \{ U,V,W\} } \big(d[X^0_i;A|S] & - d[X^0_i;X_i]\big) \\ & \leq 3 \mathbb {H}[S] - \tfrac {3}{2} \mathbb {H}[X_1] - \tfrac {3}{2} \mathbb {H}[X_2]\\ & \leq (6 - 3\eta ) k + 3(2 \eta k - I_1) \end{align*}
as required.
Lemma
6.20
Sum of \(U\), \(V\), \(W\)
We have \(U + V + W = 0\).
Proof
Obvious because we are in characteristic two.
For the next two lemmas, let \((T_1,T_2,T_3)\) be a \(G^3\)-valued random variable such that \(T_1+T_2+T_3=0\) holds identically. Set
\begin{equation} \label{delta-t1t2t3-def} \delta := \sum _{1 \leq i < j \leq 3} \mathbb {I}[T_i : T_j]. \end{equation}
Lemma
6.21
Constructing good variables, I
One has
\begin{align*} k \leq \delta + \eta (& d[X^0_1;T_1]-d[X^0_1;X_1]) + \eta (d[X^0_2;T_2]-d[X^0_2;X_2]) \\ & + \tfrac 12 \eta \mathbb {I}[T_1:T_3] + \tfrac 12 \eta \mathbb {I}[T_2:T_3]. \end{align*}
(Note: in the paper, this lemma was phrased in a more intuitive formulation that is basically the contrapositive of the one here. Similarly for the next two lemmas.)
Proof
We apply Lemma 3.23 with \((A,B) = (T_1, T_2)\) there. Since \(T_1 + T_2 = T_3\), the conclusion is that
\begin{align} \nonumber \sum _{t_3} \mathbb {P}[T_3 = t_3] & d[(T_1 | T_3 = t_3); (T_2 | T_3 = t_3)] \\ & \leq 3 \mathbb {I}[T_1 : T_2] + 2 \mathbb {H}[T_3] - \mathbb {H}[T_1] - \mathbb {H}[T_2].\label{bsg-t1t2} \end{align}
The right-hand side of \eqref{bsg-t1t2} can be rearranged as
\begin{align*} & 2( \mathbb {H}[T_1] + \mathbb {H}[T_2] + \mathbb {H}[T_3]) - 3 \mathbb {H}[T_1,T_2] \\ & = 2(\mathbb {H}[T_1] + \mathbb {H}[T_2] + \mathbb {H}[T_3]) - \mathbb {H}[T_1,T_2] - \mathbb {H}[T_2,T_3] - \mathbb {H}[T_1, T_3] = \delta ,\end{align*}
using the fact (from Lemma 2.2) that all three terms \(\mathbb {H}[T_i,T_j]\) are equal to \(\mathbb {H}[T_1,T_2,T_3]\) and hence to each other. We also have
\begin{align*} & \sum _{t_3} \mathbb {P}[T_3 = t_3] \bigl(d[X^0_1; (T_1 | T_3=t_3)] - d[X^0_1;X_1]\bigr) \\ & \quad = d[X^0_1; T_1 | T_3] - d[X^0_1;X_1] \leq d[X^0_1;T_1] - d[X^0_1;X_1] + \tfrac {1}{2} \mathbb {I}[T_1 : T_3] \end{align*}
by Lemma 3.24, and similarly
\begin{align*} & \sum _{t_3} \mathbb {P}[T_3 = t_3] (d[X^0_2;(T_2 | T_3=t_3)] - d[X^0_2; X_2]) \\ & \quad \quad \quad \quad \quad \quad \leq d[X^0_2;T_2] - d[X^0_2;X_2] + \tfrac {1}{2} \mathbb {I}[T_2 : T_3]. \end{align*}
Putting the above observations together, we have
\begin{align*} \sum _{t_3} \mathbb {P}[T_3=t_3] \psi [(T_1 | T_3=t_3); (T_2 | T_3=t_3)] \leq \delta + \eta (d[X^0_1;T_1]-d[X^0_1;X_1]) \\ + \eta (d[X^0_2;T_2]-d[X^0_2;X_2]) + \tfrac 12 \eta \mathbb {I}[T_1:T_3] + \tfrac 12 \eta \mathbb {I}[T_2:T_3] \end{align*}
where we introduce the notation
\[ \psi [Y_1; Y_2] := d[Y_1;Y_2] + \eta (d[X_1^0;Y_1] - d[X_1^0;X_1]) + \eta (d[X_2^0;Y_2] - d[X_2^0;X_2]). \]
On the other hand, from Lemma 6.6 we have \(k \leq \psi [Y_1;Y_2]\), and the claim follows.
Lemma
6.22
Constructing good variables, II
One has
\begin{align*} k & \leq \delta + \frac{\eta }{3} \biggl( \delta + \sum _{i=1}^2 \sum _{j = 1}^3 (d[X^0_i;T_j] - d[X^0_i; X_i]) \biggr). \end{align*}
Proof
Average Lemma 6.21 over all six permutations of \(T_1,T_2,T_3\).
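To see that the averaging produces the stated coefficients: \(\delta \) and the constraint \(T_1+T_2+T_3=0\) are symmetric in \(T_1,T_2,T_3\); each \(T_j\) occupies the first (resp. second) slot in two of the six permutations, which yields the factor \(\frac{\eta }{3}\) in front of each distance sum; and for the mutual-information terms one computes
\[ \frac{1}{6} \sum _{\sigma \in S_3} \tfrac {1}{2} \eta \bigl( \mathbb {I}[T_{\sigma (1)}:T_{\sigma (3)}] + \mathbb {I}[T_{\sigma (2)}:T_{\sigma (3)}] \bigr) = \frac{\eta }{12} \cdot 4 \sum _{1 \leq i < j \leq 3} \mathbb {I}[T_i : T_j] = \frac{\eta }{3} \delta , \]
since each unordered pair \(\{ i,j\} \) occurs four times among the twelve summands.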
Theorem
6.23
\(\tau \)-decrement
Let \(X_1, X_2\) be \(\tau \)-minimizers. Then \(d[X_1;X_2] = 0\).
Proof
Set \(k := d[X_1;X_2]\). Applying Lemma 6.22 with any random variables \((T_1,T_2,T_3)\) such that \(T_1+T_2+T_3=0\) holds identically, we deduce that
\[ k \leq \delta + \frac{\eta }{3} \biggl( \delta + \sum _{i=1}^2 \sum _{j = 1}^3 (d[X^0_i;T_j] - d[X^0_i;X_i]) \biggr). \]
Note that \(\delta \) is still defined by \eqref{delta-t1t2t3-def} and thus depends on \(T_1,T_2,T_3\). In particular we may apply this for
\[ T_1 = (U | S = s),\qquad T_2 = (V | S = s), \qquad T_3 = (W | S = s) \]
for \(s\) in the range of \(S\) (which is a valid choice by Lemma 6.20) and then average over \(s\) with weights \(\mathbb {P}[S=s]\), to obtain
\[ k \leq \tilde\delta + \frac{\eta }{3} \biggl( \tilde\delta + \sum _{i=1}^2 \sum _{A\in \{ U,V,W\} } \bigl( d[X^0_i;A|S] - d[X^0_i;X_i]\bigr) \biggr), \]
where
\[ \tilde\delta := \mathbb {I}[U : V | S] + \mathbb {I}[V : W | S] + \mathbb {I}[W : U | S]. \]
Putting this together with Lemma 6.18 and Lemma 6.19, we conclude that
\begin{align*} k & \leq \Bigl(1+\frac{\eta }{3}\Bigr)\Bigl(6\eta k-\frac{1-5\eta }{1-\eta }(2\eta k-I_1)\Bigr)+\frac{\eta }{3}\Bigl((6-3\eta )k+3(2\eta k-I_1)\Bigr)\\ & = (8\eta + \eta ^2) k - \biggl( \frac{1 - 5 \eta }{1-\eta }\Bigl(1 + \frac{\eta }{3}\Bigr) - \eta \biggr)(2 \eta k - I_1)\\ & \leq (8 \eta + \eta ^2) k \end{align*}
since the quantity \(2 \eta k - I_1\) is non-negative (by Lemma 6.12), and its coefficient in the above expression is non-positive provided that \(\eta (2\eta + 17) \le 3\), which is certainly the case with Definition 6.1. Moreover, from Definition 6.1 we have \(8 \eta + \eta ^2 < 1\). It follows that \(k=0\), as desired.
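As a quick numerical sanity check of these conditions (assuming, as the constant \(11\) in Theorem 6.24 below suggests, that Definition 6.1 sets \(\eta = 1/9\)):

```python
from fractions import Fraction

# eta = 1/9 is assumed here to be the constant of Definition 6.1
# (not reproduced in this chapter excerpt).
eta = Fraction(1, 9)

# The coefficient of (2*eta*k - I_1) is non-positive iff eta*(2*eta + 17) <= 3.
assert eta * (2 * eta + 17) <= 3          # 155/81 <= 3

# k <= (8*eta + eta^2) * k forces k = 0, since 8*eta + eta^2 < 1.
assert 8 * eta + eta ** 2 < 1             # 73/81 < 1
```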
6.5 Conclusion
Theorem
6.24
Entropy version of PFR
Let \(G = \mathbb {F}_2^n\), and suppose that \(X^0_1, X^0_2\) are \(G\)-valued random variables. Then there is some subgroup \(H \leq G\) such that
\[ d[X^0_1;U_H] + d[X^0_2;U_H] \le 11 d[X^0_1;X^0_2], \]
where \(U_H\) is uniformly distributed on \(H\). Furthermore, both \(d[X^0_1;U_H]\) and \(d[X^0_2;U_H]\) are at most \(6 d[X^0_1;X^0_2]\).
Proof
Let \(X_1, X_2\) be a \(\tau \)-minimizer, as provided by Proposition 6.5. From Theorem 6.23, \(d[X_1;X_2]=0\), and then from Corollary 4.6 there is a subgroup \(H \leq G\) with \(d[X_1;U_H] = d[X_2; U_H] = 0\). Also, from \(\tau \)-minimization we have \(\tau [X_1;X_2] \leq \tau [X^0_2;X^0_1]\). Using this and the Ruzsa triangle inequality we can conclude.
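To spell out the last step (a sketch, assuming the value \(\eta = 1/9\) fixed in Definition 6.1): write \(d := d[X^0_1;X^0_2]\). Since \(d[X_1;U_H] = d[X_2;U_H] = 0\), the Ruzsa triangle inequality gives \(d[X^0_i;U_H] \leq d[X^0_i;X_i]\) for \(i=1,2\), while \(\tau \)-minimization and \(d[X_1;X_2]=0\) give
\[ \eta \bigl( d[X^0_1;X_1] + d[X^0_2;X_2] \bigr) = \tau [X_1;X_2] \leq \tau [X^0_2;X^0_1] = (1+2\eta ) d, \]
using the symmetry of Ruzsa distance to evaluate \(\tau [X^0_2;X^0_1]\). Hence \(d[X^0_1;U_H] + d[X^0_2;U_H] \leq (\eta ^{-1}+2) d = 11 d\). For the individual bounds, the triangle inequality also gives \(d[X^0_1;U_H] - d[X^0_2;U_H] \leq d\); adding this to the sum bound yields \(2\, d[X^0_1;U_H] \leq 12 d\), i.e. \(d[X^0_1;U_H] \leq 6 d\), and symmetrically for \(X^0_2\).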
Note: a “stretch goal” for this project would be to obtain a ‘decidable’ analogue of this result (see the remark at the end of Section 2 for some related discussion).