8 Improving the exponents
The arguments here are due to Jyun-Jie Liao.
Definition
8.1
New definition of \(\eta \)
\(\eta \) is a real parameter with \(\eta {\gt} 0\).
Previously in Definition 6.1 we had set \(\eta =1/9\). To implement this chapter, one should refactor the previous arguments so that \(\eta \) is now free to be a positive number, though the specific hypothesis \(\eta =1/9\) would now need to be added to Theorem 6.23.
Let \(X^0_1, X^0_2\) be \(G\)-valued random variables, and let \(X_1, X_2\) be \(\tau \)-minimizers as defined in Definition 6.4.
For the next two lemmas, let \((T_1,T_2,T_3)\) be a \(G^3\)-valued random variable such that \(T_1+T_2+T_3=0\) holds identically. Let \(\delta := \mathbb {I}[T_1:T_2] + \mathbb {I}[T_1:T_3] + \mathbb {I}[T_2:T_3]\) be the quantity in (3).
We have the following variant of Lemma 6.21:
Lemma
8.2
Constructing good variables, I’
One has
\begin{align*} k \leq \delta & + \eta (d[X^0_1;T_1|T_3]-d[X^0_1;X_1]) \\ & + \eta (d[X^0_2;T_2|T_3]-d[X^0_2;X_2]). \end{align*}
Proof
We apply Lemma 3.23 with \((A,B) = (T_1, T_2)\) there. Since \(T_1 + T_2 = -T_3\), and conditioning on \(-T_3\) is the same as conditioning on \(T_3\), the conclusion is that
\begin{align} \nonumber \sum _{t_3} \mathbb {P}[T_3 = t_3] & d[(T_1 | T_3 = t_3); (T_2 | T_3 = t_3)] \\ & \leq 3 \mathbb {I}[T_1 : T_2] + 2 \mathbb {H}[T_3] - \mathbb {H}[T_1] - \mathbb {H}[T_2].\label{bsg-t1t2'} \end{align}
The right-hand side of (1) can be rearranged as
\begin{align*} & 2( \mathbb {H}[T_1] + \mathbb {H}[T_2] + \mathbb {H}[T_3]) - 3 \mathbb {H}[T_1,T_2] \\ & = 2(\mathbb {H}[T_1] + \mathbb {H}[T_2] + \mathbb {H}[T_3]) - \mathbb {H}[T_1,T_2] - \mathbb {H}[T_2,T_3] - \mathbb {H}[T_1, T_3] = \delta ,\end{align*}
using the fact (from Lemma 2.2) that all three terms \(\mathbb {H}[T_i,T_j]\) are equal to \(\mathbb {H}[T_1,T_2,T_3]\) and hence to each other. We also have
\begin{align*} & \sum _{t_3} \mathbb {P}[T_3 = t_3] \bigl(d[X^0_1; (T_1 | T_3=t_3)] - d[X^0_1;X_1]\bigr) \\ & \quad = d[X^0_1; T_1 | T_3] - d[X^0_1;X_1] \end{align*}
and similarly
\begin{align*} & \sum _{t_3} \mathbb {P}[T_3 = t_3] (d[X^0_2;(T_2 | T_3=t_3)] - d[X^0_2; X_2]) \\ & \quad \quad \quad \quad \quad \quad \leq d[X^0_2;T_2|T_3] - d[X^0_2;X_2]. \end{align*}
Putting the above observations together, we have
\begin{align*} \sum _{t_3} \mathbb {P}[T_3=t_3] \psi [(T_1 | T_3=t_3); (T_2 | T_3=t_3)] \leq \delta + \eta (d[X^0_1;T_1|T_3]-d[X^0_1;X_1]) \\ + \eta (d[X^0_2;T_2|T_3]-d[X^0_2;X_2]) \end{align*}
where we introduce the notation
\[ \psi [Y_1; Y_2] := d[Y_1;Y_2] + \eta (d[X_1^0;Y_1] - d[X_1^0;X_1]) + \eta (d[X_2^0;Y_2] - d[X_2^0;X_2]). \]
On the other hand, from Lemma 6.6 we have \(k \leq \psi [Y_1;Y_2]\) for any \(G\)-valued random variables \(Y_1, Y_2\); applying this with \(Y_1 = (T_1 | T_3=t_3)\) and \(Y_2 = (T_2 | T_3=t_3)\) and averaging in \(t_3\), the claim follows.
(One could in fact refactor Lemma 6.21 to follow from Lemma 8.2 and Lemma 3.24).
Lemma
8.3
Constructing good variables, II’
One has
\begin{align*} k & \leq \delta + \frac{\eta }{6} \sum _{i=1}^2 \sum _{1 \leq j,l \leq 3; j \neq l} (d[X^0_i;T_j|T_l] - d[X^0_i; X_i]) \end{align*}
Proof
Average Lemma 8.2 over all six permutations of \(T_1,T_2,T_3\).
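Concretely, for the \(X^0_1\) terms the averaging works out as follows: each ordered pair \((j,l)\) with \(j \neq l\) arises from exactly one permutation \(\sigma \) of \(\{ 1,2,3\} \) with \((\sigma (1),\sigma (3)) = (j,l)\), while \(\delta \) is unchanged under permuting \(T_1,T_2,T_3\), so that
\[ \frac{1}{6} \sum _{\sigma } \eta \bigl(d[X^0_1;T_{\sigma (1)}|T_{\sigma (3)}] - d[X^0_1;X_1]\bigr) = \frac{\eta }{6} \sum _{1 \leq j,l \leq 3; j \neq l} \bigl(d[X^0_1;T_j|T_l] - d[X^0_1;X_1]\bigr), \]
and similarly for the \(X^0_2\) terms.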
Now let \(X_1, X_2, \tilde X_1, \tilde X_2\) be mutually independent copies of \(X_1, X_2, X_1, X_2\) respectively, and set
\[ U := X_1 + X_2, \qquad V := \tilde X_1 + X_2, \qquad W := X_1 + \tilde X_1 \]
and
\[ S := X_1 + X_2 + \tilde X_1 + \tilde X_2 \]
and introduce the quantities
\[ k = d[X_1;X_2] \]
and
\[ I_1 = \mathbb {I}[U : V \, | \, S]. \]
Lemma
8.4
Constructing good variables, III’
One has
\begin{align*} k & \leq \mathbb {I}[U : V \, | \, S] + \mathbb {I}[V : W \, | \, S] + \mathbb {I}[W : U \, | \, S] + \frac{\eta }{6} \sum _{i=1}^2 \sum _{A,B \in \{ U,V,W\} : A \neq B} (d[X^0_i;A|B,S] - d[X^0_i; X_i]). \end{align*}
Proof
For each \(s\) in the range of \(S\), apply Lemma 8.3 with \(T_1,T_2,T_3\) equal to \((U|S=s)\), \((V|S=s)\), \((W|S=s)\) respectively (which works thanks to Lemma 6.20), multiply by \(\mathbb {P}[S=s]\), and sum in \(s\) to conclude.
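Concretely, the averaging uses the identities
\[ \mathbb {I}[U : V \, | \, S] = \sum _s \mathbb {P}[S=s]\, \mathbb {I}[(U|S=s) : (V|S=s)], \qquad d[X^0_i; A|B, S] = \sum _s \mathbb {P}[S=s]\, d[X^0_i; (A|S=s) \, |\, (B|S=s)], \]
both standard consequences of the definitions of the conditioned quantities, so that multiplying the conclusion of Lemma 8.3 by \(\mathbb {P}[S=s]\) and summing over \(s\) yields exactly the stated bound.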
To control the expressions on the right-hand side of this lemma, we need a general entropy inequality.
Lemma
8.5
General inequality
Let \(X_1, X_2, X_3, X_4\) be independent \(G\)-valued random variables, and let \(Y\) be another \(G\)-valued random variable. Set \(S := X_1+X_2+X_3+X_4\). Then
\begin{align*} & d[Y; X_1+X_2|X_1 + X_3, S] - d[Y; X_1] \\ & \quad \leq \tfrac {1}{4} (d[X_1;X_2] + 2d[X_1;X_3] + d[X_2;X_4])\\ & \qquad \qquad + \tfrac {1}{4} (d[X_1|X_1+X_3;X_2|X_2+X_4] - d[X_3|X_3+X_4; X_1|X_1+X_2])\\ & \qquad \qquad + \tfrac {1}{8} (\mathbb {H}[X_1+X_2] - \mathbb {H}[X_3+X_4] + \mathbb {H}[X_2] - \mathbb {H}[X_3]\\ & \qquad \qquad \qquad + \mathbb {H}[X_2|X_2+X_4] - \mathbb {H}[X_1|X_1+X_3]). \end{align*}
Proof
On the one hand, by Lemma 3.24 and two applications of Lemma 3.25 we have
\begin{align*} & d[Y;X_1+X_2|X_1 + X_3, S] \\ & \quad \leq d[Y;X_1+X_2|S] + \tfrac {1}{2} \mathbb {I}[X_1 + X_2 : X_1 + X_3|S] \\ & \quad \leq d[Y;X_1+X_2]\\ & \qquad + \tfrac {1}{2} (d[X_1+X_2;X_3+X_4] + \mathbb {I}[X_1 + X_2 : X_1 + X_3|S])\\ & \qquad + \tfrac {1}{4} (\mathbb {H}[X_1+X_2] - \mathbb {H}[X_3+X_4])\\ & \quad \leq d[Y;X_1] \\ & \qquad + \tfrac {1}{2} (d[X_1;X_2] + d[X_1+X_2;X_3+X_4] + \mathbb {I}[X_1 + X_2 : X_1 + X_3|S])\\ & \qquad + \tfrac {1}{4} (\mathbb {H}[X_1+X_2] - \mathbb {H}[X_3+X_4] + \mathbb {H}[X_2] - \mathbb {H}[X_1]). \end{align*}
From Corollary 5.3 (with \(Y_1,Y_2,Y_3,Y_4\) set equal to \(X_3, X_1, X_4, X_2\) respectively) one has
\begin{align*} & d[X_3+X_4; X_1+X_2] + d[X_3|X_3+X_4; X_1|X_1+X_2] \\ & \qquad + \mathbb {I}[X_3 + X_1 : X_1 + X_2|S] = d[X_3;X_1] + d[X_4;X_2]. \end{align*}
Rearranging the mutual information and Ruzsa distances slightly, we conclude that
\begin{align*} & d[Y;X_1+X_2|X_1 + X_3, S] \\ & \quad \leq d[Y;X_1] \\ & \qquad + \tfrac {1}{2} (d[X_1;X_2] + d[X_1;X_3] + d[X_2;X_4] - d[X_3|X_3+X_4; X_1|X_1+X_2])\\ & \qquad + \tfrac {1}{4} (\mathbb {H}[X_1+X_2] - \mathbb {H}[X_3+X_4] + \mathbb {H}[X_2] - \mathbb {H}[X_1]). \end{align*}
On the other hand, \((X_1+X_2|X_1 + X_3, S)\) has an identical distribution to the independent sum of \((X_1|X_1+X_3)\) and \((X_2|X_2+X_4)\). We may therefore apply Lemma 3.25 to conditioned variables \((X_1|X_1+X_3=s)\) and \((X_2|X_2+X_4=t)\) and average in \(s,t\) to obtain the alternative bound
\begin{align*} & d[Y;X_1+X_2|X_1 + X_3, S] \\ & \quad \leq d[Y;X_1|X_1+X_3] + \tfrac {1}{2} d[X_1|X_1+X_3; X_2|X_2+X_4] \\ & \qquad + \tfrac {1}{4} (\mathbb {H}[X_2|X_2+X_4] - \mathbb {H}[X_1|X_1+X_3]) \\ & \quad \leq d[Y;X_1] \\ & \qquad + \tfrac {1}{2} (d[X_1;X_3] + d[X_1|X_1+X_3;X_2|X_2+X_4])\\ & \qquad + \tfrac {1}{4} (\mathbb {H}[X_2|X_2+X_4] - \mathbb {H}[X_1|X_1+X_3] + \mathbb {H}[X_1] - \mathbb {H}[X_3]). \end{align*}
If one takes the arithmetic mean of these two bounds and simplifies using Corollary 5.3, one obtains the claim.
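As one small check, when the two bounds are averaged the \(\mathbb {H}[X_1]\) contributions cancel:
\[ \tfrac {1}{2}\Bigl( \tfrac {1}{4}(\mathbb {H}[X_2] - \mathbb {H}[X_1]) + \tfrac {1}{4}(\mathbb {H}[X_1] - \mathbb {H}[X_3]) \Bigr) = \tfrac {1}{8}(\mathbb {H}[X_2] - \mathbb {H}[X_3]), \]
which is the source of the \(\mathbb {H}[X_2] - \mathbb {H}[X_3]\) term in the statement of the lemma.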
Returning to our specific situation, we now have
Lemma
8.6
Bound on distance differences
We have
\begin{align*} & \sum _{i=1}^2 \sum _{A,B \in \{ U,V,W\} : A \neq B} d[X_i^0;A|B, S] - d[X_i^0;X_i]\\ & \qquad \leq 12 k + \frac{4(2 \eta k - I_1)}{1-\eta }. \end{align*}
Proof
If we apply Lemma 8.5 with \(X_1:=X_1\), \(Y:=X_1^0\), and \((X_2,X_3,X_4)\) equal to each of the \(3!\) permutations of \((X_2,\tilde X_1,\tilde X_2)\) in turn, and sum the resulting bounds (using the symmetry \(\mathbb {H}[X|X+Y] = \mathbb {H}[Y|X+Y]\), which follows from Lemma 2.12), we can bound
\[ \sum _{A,B \in \{ U,V,W\} : A \neq B} d[X_1^0;A|B, S] - d[X_1^0;X_1] \]
by
\begin{align*} & \quad \tfrac {1}{4} (6d[X_1;X_2] + 6d[X_1;\tilde X_2]\\ & \qquad + 6d[X_1;\tilde X_1] + 2d[\tilde X_1;\tilde X_2] + 2 d[\tilde X_1;X_2] + 2d[X_2;\tilde X_2])\\ & \quad + \tfrac {1}{8} (2\mathbb {H}[X_1+X_2] + 2\mathbb {H}[X_1+\tilde X_1] + 2 \mathbb {H}[X_1+\tilde X_2] \\ & \qquad - 2\mathbb {H}[\tilde X_1+X_2] - 2\mathbb {H}[X_2+\tilde X_2] - 2\mathbb {H}[\tilde X_1+\tilde X_2])\\ & \qquad \qquad + \tfrac {1}{4} (\mathbb {H}[X_2|X_2+\tilde X_2] + \mathbb {H}[\tilde X_1|\tilde X_1+\tilde X_2] + \mathbb {H}[\tilde X_1|\tilde X_1+X_2] \\ & \qquad \qquad \qquad - \mathbb {H}[X_1|X_1+\tilde X_1] - \mathbb {H}[X_1|X_1+X_2] - \mathbb {H}[X_1|X_1+\tilde X_2]), \end{align*}
which simplifies to
\begin{align*} & \quad \tfrac {1}{4} (16k + 6d[X_1;X_1] + 2d[X_2;X_2])\\ & \qquad \qquad + \tfrac {1}{4} (\mathbb {H}[X_1+\tilde X_1] - \mathbb {H}[X_2+\tilde X_2] + \mathbb {H}[X_2|X_2+\tilde X_2] - \mathbb {H}[X_1|X_1+\tilde X_1]). \end{align*}
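Here the simplification uses that \(\tilde X_1, \tilde X_2\) are independent copies of \(X_1, X_2\), so that for instance
\[ d[X_1;\tilde X_2] = d[\tilde X_1; X_2] = d[\tilde X_1;\tilde X_2] = k, \qquad \mathbb {H}[X_1+\tilde X_2] = \mathbb {H}[\tilde X_1+X_2] = \mathbb {H}[X_1+X_2], \]
and the corresponding entropy terms cancel in pairs.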
A symmetric argument also bounds
\[ \sum _{A,B \in \{ U,V,W\} : A \neq B} d[X_2^0;A|B, S] - d[X_2^0;X_2] \]
by
\begin{align*} & \quad \tfrac {1}{4} (16k + 6d[X_2;X_2] + 2d[X_1;X_1])\\ & \qquad \qquad + \tfrac {1}{4} (\mathbb {H}[X_2+\tilde X_2] - \mathbb {H}[X_1+\tilde X_1] + \mathbb {H}[X_1|X_1+\tilde X_1] - \mathbb {H}[X_2|X_2+\tilde X_2]). \end{align*}
On the other hand, from Lemma 6.15 one has
\[ d[X_1;X_1] + d[X_2;X_2] \leq 2 k + \frac{2(2 \eta k - I_1)}{1-\eta }. \]
Summing the previous three estimates, we obtain the claim.
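Explicitly, adding the two displayed bounds cancels the entropy differences, after which Lemma 6.15 gives
\begin{align*} \sum _{i=1}^2 \sum _{A,B \in \{ U,V,W\} : A \neq B} (d[X_i^0;A|B, S] - d[X_i^0;X_i]) & \leq 8k + 2(d[X_1;X_1] + d[X_2;X_2]) \\ & \leq 8k + 2 \Bigl( 2k + \frac{2(2 \eta k - I_1)}{1-\eta } \Bigr) = 12 k + \frac{4(2 \eta k - I_1)}{1-\eta }. \end{align*}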
Theorem
8.7
Improved \(\tau \)-decrement
Suppose \(0 {\lt} \eta {\lt} 1/8\). Let \(X_1, X_2\) be \(\tau \)-minimizers. Then \(d[X_1;X_2] = 0\).
Proof
From Lemma 8.4, Lemma 8.6, and Lemma 6.18 one has
\[ k \leq 8\eta k - \frac{(1 -5 \eta - \frac{4}{6} \eta )(2 \eta k - I_1)}{(1-\eta )}. \]
For any \(\eta {\lt} 1/8\), we see from Lemma 6.12 that the expression \(\frac{(1 -5 \eta - \frac{4}{6} \eta )(2 \eta k - I_1)}{(1-\eta )}\) is nonnegative, and hence \(k = 0\) as required.
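Spelling out the final step: since the subtracted expression is nonnegative, the displayed inequality rearranges to
\[ (1 - 8\eta )\, k \leq - \frac{(1 -5 \eta - \frac{4}{6} \eta )(2 \eta k - I_1)}{1-\eta } \leq 0, \]
and since \(k = d[X_1;X_2] \geq 0\) and \(1 - 8 \eta {\gt} 0\), this forces \(k = 0\).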
Theorem
8.8
Limiting improved \(\tau \)-decrement
For \(\eta = 1/8\), there exist \(\tau \)-minimizers \(X_1, X_2\) satisfying \(d[X_1;X_2] = 0\).
Proof
For each \(\eta {\lt}1/8\), consider minimizers \(X_1^\eta \) and \(X_2^\eta \) from Proposition 6.5. By Theorem 8.7, they satisfy \(d[X_1^\eta ; X_2^\eta ]=0\). By compactness of the space of probability measures on \(G\), one may extract a convergent subsequence of the distributions of \(X_1^\eta \) and \(X_2^\eta \) as \(\eta \to 1/8\). By continuity of all the involved quantities, the limit is a pair of \(\tau \)-minimizers for \(\eta = 1/8\) that additionally satisfies \(d[X_1;X_2] = 0\).
Theorem
8.9
Improved entropy version of PFR
Let \(G = \mathbb {F}_2^n\), and suppose that \(X^0_1, X^0_2\) are \(G\)-valued random variables. Then there is some subgroup \(H \leq G\) such that
\[ d[X^0_1;U_H] + d[X^0_2;U_H] \le 10 d[X^0_1;X^0_2], \]
where \(U_H\) is uniformly distributed on \(H\). Furthermore, both \(d[X^0_1;U_H]\) and \(d[X^0_2;U_H]\) are at most \(6 d[X^0_1;X^0_2]\).
Proof
Let \(X_1, X_2\) be the pair of \(\tau \)-minimizers provided by Theorem 8.8, so that \(d[X_1;X_2]=0\). From Corollary 4.6, there is a subgroup \(H \leq G\) with \(d[X_1;U_H] = d[X_2; U_H] = 0\). Also, from \(\tau \)-minimization we have \(\tau [X_1;X_2] \leq \tau [X^0_2;X^0_1]\). Using this and the Ruzsa triangle inequality we can conclude.
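Concretely, writing \(\eta = 1/8\) and assuming the \(\tau \)-functional has the form \(\tau [Y_1;Y_2] = d[Y_1;Y_2] + \eta d[X^0_1;Y_1] + \eta d[X^0_2;Y_2]\) from Chapter 6, the sum bound follows from the chain
\begin{align*} d[X^0_1;U_H] + d[X^0_2;U_H] & \leq d[X^0_1;X_1] + d[X^0_2;X_2] = \tfrac {1}{\eta } \tau [X_1;X_2] \\ & \leq \tfrac {1}{\eta } \tau [X^0_2;X^0_1] = \tfrac {1+2\eta }{\eta }\, d[X^0_1;X^0_2] = 10\, d[X^0_1;X^0_2], \end{align*}
where the first inequality uses the Ruzsa triangle inequality together with \(d[X_1;U_H] = d[X_2;U_H] = 0\), and the middle identity uses \(d[X_1;X_2] = 0\).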
One can then replace Lemma 7.2 with
Lemma
8.10
If \(A \subset {\bf F}_2^n\) is non-empty and \(|A+A| \leq K|A|\), then \(A\) can be covered by at most \(K^6 |A|^{1/2}/|H|^{1/2}\) translates of a subspace \(H\) of \({\bf F}_2^n\) with
\[ |H|/|A| \in [K^{-10}, K^{10}]. \]
Proof
By repeating the proof of Lemma 7.2 and using Theorem 8.9 one can obtain the claim with \(13/2\) replaced with \(6\) and \(11\) replaced by \(10\).
This implies the following improved version of Theorem 7.3:
Theorem
8.11
Improved PFR
If \(A \subset {\bf F}_2^n\) is non-empty and \(|A+A| \leq K|A|\), then \(A\) can be covered by at most \(2K^{11}\) translates of a subspace \(H\) of \({\bf F}_2^n\) with \(|H| \leq |A|\).
Proof
By repeating the proof of Theorem 7.3 and using Lemma 8.10 one can obtain the claim with 12 replaced by 11.
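As a quick consistency check on the exponent, in the regime \(|H| \leq |A|\) Lemma 8.10 already covers \(A\) by at most
\[ K^6 \frac{|A|^{1/2}}{|H|^{1/2}} \leq K^6 \bigl(K^{10}\bigr)^{1/2} = K^{11} \leq 2K^{11} \]
translates, using the lower bound \(|H|/|A| \geq K^{-10}\); the remaining case \(|H| {\gt} |A|\) is handled as in the proof of Theorem 7.3.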
Of course, by replacing Theorem 7.3 with Theorem 8.11 we may also improve constants in downstream theorems in a straightforward manner.