
13 Further improvement to exponent

13.1 Kullback–Leibler divergence

In the definitions below, \(G\) is a set.

Definition 13.1 Kullback–Leibler divergence

If \(X,Y\) are two \(G\)-valued random variables, the Kullback–Leibler divergence is defined as

\[ D_{KL}(X\Vert Y) := \sum _x \mathbf{P}(X=x) \log \frac{\mathbf{P}(X=x)}{\mathbf{P}(Y=x)}. \]
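
For instance, if \(G=\{ 0,1\} \), \(X\) is uniform, and \(\mathbf{P}(Y=0)=3/4\), \(\mathbf{P}(Y=1)=1/4\), then

\[ D_{KL}(X\Vert Y) = \tfrac {1}{2}\log \frac{1/2}{3/4}+\tfrac {1}{2}\log \frac{1/2}{1/4} = \tfrac {1}{2}\log \tfrac {4}{3}, \]

which differs from \(D_{KL}(Y\Vert X) = \tfrac {3}{4}\log \tfrac {3}{2}+\tfrac {1}{4}\log \tfrac {1}{2}\); the divergence is not symmetric in its arguments.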
Lemma 13.2 Kullback–Leibler divergence of copy

If \(X'\) is a copy of \(X\), and \(Y'\) is a copy of \(Y\), then \(D_{KL}(X'\Vert Y') = D_{KL}(X\Vert Y)\).

Proof

Clear from definition.

Lemma 13.3 Gibbs inequality

\(D_{KL}(X\Vert Y) \geq 0\).

Proof

Apply Lemma 1.4 to the definition.

Lemma 13.4 Converse Gibbs inequality

If \(D_{KL}(X\Vert Y) = 0\), then \(Y\) is a copy of \(X\).

Proof

Apply Lemma 1.5.

Lemma 13.5 Convexity of Kullback–Leibler

If \(S\) is a finite set, \(\sum _{s \in S} w_s = 1\) for some non-negative \(w_s\), and \({\bf P}(X=x) = \sum _{s\in S} w_s {\bf P}(X_s=x)\), \({\bf P}(Y=x) = \sum _{s\in S} w_s {\bf P}(Y_s=x)\) for all \(x\), then

\[ D_{KL}(X\Vert Y) \le \sum _{s\in S} w_s D_{KL}(X_s\Vert Y_s). \]
Proof

For each \(x\), replace \(\log \frac{\mathbf{P}(X_s=x)}{\mathbf{P}(Y_s=x)}\) in the definition with \(\log \frac{w_s\mathbf{P}(X_s=x)}{w_s\mathbf{P}(Y_s=x)}\) for each \(s\), and apply Lemma 1.4.
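
To spell out the computation (assuming, as the proof suggests, that Lemma 1.4 is a log-sum-type inequality \(\sum _s a_s\log \frac{a_s}{b_s}\ge (\sum _s a_s)\log \frac{\sum _s a_s}{\sum _s b_s}\)): for each fixed \(x\), taking \(a_s=w_s\mathbf{P}(X_s=x)\) and \(b_s=w_s\mathbf{P}(Y_s=x)\) gives

\[ \mathbf{P}(X=x)\log \frac{\mathbf{P}(X=x)}{\mathbf{P}(Y=x)} \le \sum _{s\in S} w_s\mathbf{P}(X_s=x)\log \frac{w_s\mathbf{P}(X_s=x)}{w_s\mathbf{P}(Y_s=x)} = \sum _{s\in S} w_s\mathbf{P}(X_s=x)\log \frac{\mathbf{P}(X_s=x)}{\mathbf{P}(Y_s=x)}, \]

and summing over \(x\) yields the claim.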

Lemma 13.6 Kullback–Leibler and injections

If \(f:G \to H\) is an injection, then \(D_{KL}(f(X)\Vert f(Y)) = D_{KL}(X\Vert Y)\).

Proof

Clear from definition.

Now let \(G\) be an additive group.

Lemma 13.7 Kullback–Leibler and sums

If \(X, Y, Z\) are independent \(G\)-valued random variables, then

\[ D_{KL}(X+Z\Vert Y+Z) \leq D_{KL}(X\Vert Y). \]
Proof

For each \(z\), \(D_{KL}(X+z\Vert Y+z)=D_{KL}(X\Vert Y)\) by Lemma 13.6. Then apply Lemma 13.5 with \(w_z=\mathbf{P}(Z=z)\).
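
Concretely, independence gives the mixture decompositions \(\mathbf{P}(X+Z=x) = \sum _z \mathbf{P}(Z=z)\mathbf{P}(X+z=x)\) and \(\mathbf{P}(Y+Z=x) = \sum _z \mathbf{P}(Z=z)\mathbf{P}(Y+z=x)\), so Lemma 13.5 applies with \(w_z=\mathbf{P}(Z=z)\) and yields

\[ D_{KL}(X+Z\Vert Y+Z) \le \sum _z \mathbf{P}(Z=z) D_{KL}(X+z\Vert Y+z) = D_{KL}(X\Vert Y). \]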

Definition 13.8 Conditional Kullback–Leibler divergence

If \(X,Y,Z\) are random variables, with \(X,Z\) defined on the same sample space, we define

\[ D_{KL}(X|Z \Vert Y) := \sum _z \mathbf{P}(Z=z) D_{KL}( (X|Z=z) \Vert Y). \]
Lemma 13.9 Kullback–Leibler and conditioning

If \(X, Y\) are independent \(G\)-valued random variables, and \(Z\) is another random variable defined on the same sample space as \(X\), then

\[ D_{KL}((X|Z)\Vert Y) = D_{KL}(X\Vert Y) + \mathbb {H}[X] - \mathbb {H}[X|Z]. \]
Proof

Compare the terms corresponding to each \(x\in G\) on both sides.
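
In detail, expanding Definition 13.8 and splitting the logarithm,

\begin{align*} D_{KL}((X|Z)\Vert Y) & = \sum _z \mathbf{P}(Z=z)\sum _x \mathbf{P}(X=x|Z=z)\log \frac{\mathbf{P}(X=x|Z=z)}{\mathbf{P}(Y=x)}\\ & = \sum _z \mathbf{P}(Z=z)\sum _x \mathbf{P}(X=x|Z=z)\Big(\log \frac{\mathbf{P}(X=x)}{\mathbf{P}(Y=x)} + \log \frac{\mathbf{P}(X=x|Z=z)}{\mathbf{P}(X=x)}\Big)\\ & = D_{KL}(X\Vert Y) + \mathbb {H}[X] - \mathbb {H}[X|Z]. \end{align*}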

Lemma 13.10 Conditional Gibbs inequality

\(D_{KL}((X|W)\Vert Y) \geq 0\).

Proof

Clear from Definition 13.8 and Lemma 13.3.

13.2 Rho functionals

Let \(G\) be an additive group, and let \(A\) be a non-empty subset of \(G\).

Definition 13.11 Rho minus

For any \(G\)-valued random variable \(X\), we define \(\rho ^-(X)\) to be the infimum of \(D_{KL}(X \Vert U_A + T)\), where \(U_A\) is uniform on \(A\) and \(T\) ranges over \(G\)-valued random variables independent of \(U_A\).

Definition 13.12 Rho plus

For any \(G\)-valued random variable \(X\), we define \(\rho ^+(X) := \rho ^-(X) + \mathbb {H}(X) - \mathbb {H}(U_A)\).

Lemma 13.13 Rho minus non-negative

We have \(\rho ^-(X) \geq 0\).

Proof

Clear from Lemma 13.10.

Lemma 13.14 Rho minus of subgroup

If \(H\) is a finite subgroup of \(G\), then \(\rho ^-(U_H) = \log |A| - \log \max _t |A \cap (H+t)|\).

Proof

For every \(G\)-valued random variable \(T\) that is independent of \(U_A\),

\[ D_{KL}(U_H \Vert U_A+T) = \sum _{h\in H} \frac{1}{|H|}\log \frac{1/|H|}{\mathbf{P}[U_A+T=h]}\ge -\log (\mathbf{P}[U_A+T\in H]), \]

by Lemma 1.4. Then observe that

\[ -\log (\mathbf{P}[U_A+T\in H])=-\log (\mathbf{P}[U_A\in H-T])\ge -\log (\max _{t\in G} \mathbf{P}[U_A\in H+t]). \]

This proves \(\ge \).

To get the equality, let \(t^*:=\arg \max _t |A \cap (H+t)|\) and observe that

\[ \rho ^-(U_H)\le D_{KL}(U_H \Vert U_A+(U_H-t^*))= \log |A| - \log \max _t|A \cap (H+t)|. \]
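
Here, taking \(T=U_H'-t^*\) for a copy \(U_H'\) of \(U_H\) independent of \(U_A\), one has for every \(h\in H\)

\[ \mathbf{P}[U_A+U_H'-t^*=h] = \sum _{h'\in H}\frac{1}{|H|}\mathbf{P}[U_A=h-h'+t^*] = \frac{1}{|H|}\cdot \frac{|A\cap (H+t^*)|}{|A|}, \]

so each of the \(|H|\) terms of the divergence equals \(\frac{1}{|H|}\log \frac{|A|}{|A\cap (H+t^*)|}\), giving the displayed value.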
Corollary 13.15 Rho plus of subgroup

If \(H\) is a finite subgroup of \(G\), then \(\rho ^+(U_H) = \log |H| - \log \max _t |A \cap (H+t)|\).

Proof

Straightforward by definition and Lemma 13.14.

Definition 13.16 Rho functional

We define \(\rho (X) := (\rho ^+(X) + \rho ^-(X))/2\).

Lemma 13.17

We have \(\rho (U_A) = 0\).

Proof

Taking \(T=0\) gives \(\rho ^-(U_A)\le D_{KL}(U_A\Vert U_A)=0\), so \(\rho ^-(U_A)=0\) by Lemma 13.13. By Definition 13.12 also \(\rho ^+(U_A)=\rho ^-(U_A)=0\), and hence \(\rho (U_A)=0\).

Lemma 13.18 Rho of subgroup

If \(H\) is a finite subgroup of \(G\), and \(\rho (U_H) \leq r\), then there exists \(t\) such that \(|A \cap (H+t)| \geq 2^{-r} \sqrt{|A||H|}\), and \(|H|/|A|\in [2^{-2r},2^{2r}]\).

Proof

The first claim is a direct corollary of Lemma 13.14 and Corollary 13.15. To see the second claim, observe that Lemma 13.13 and Corollary 13.15 imply \(\rho ^-(U_H),\rho ^+(U_H)\ge 0\). Therefore

\[ |\mathbb {H}(U_A)-\mathbb {H}(U_H)|=|\rho ^+(U_H)-\rho ^-(U_H)|\le \rho ^-(U_H)+\rho ^+(U_H)= 2\rho (U_H)\le 2r, \]

and since \(\mathbb {H}(U_A)=\log |A|\) and \(\mathbb {H}(U_H)=\log |H|\), this implies the second claim.

Lemma 13.19 Rho invariant

For any \(s \in G\), \(\rho (X+s) = \rho (X)\).

Proof

Observe that by Lemma 13.6,

\[ \inf _T D_{KL}(X\Vert U_A+T)=\inf _T D_{KL}(X+s\Vert U_A+T+s)=\inf _{T'} D_{KL}(X+s\Vert U_A+T'), \]

so \(\rho ^-(X+s)=\rho ^-(X)\). Since \(\mathbb {H}[X+s]=\mathbb {H}[X]\), also \(\rho ^+(X+s)=\rho ^+(X)\), and hence \(\rho (X+s)=\rho (X)\).
Lemma 13.20 Rho continuous

\(\rho (X)\) depends continuously on the distribution of \(X\).

Proof

Clear from definition.

Lemma 13.21 Rho and sums

If \(X,Y\) are independent, one has

\[ \rho ^-(X+Y) \leq \rho ^-(X) \]
\[ \rho ^+(X+Y) \leq \rho ^+(X) + \mathbb {H}[X+Y] - \mathbb {H}[X] \]

and

\[ \rho (X+Y) \leq \rho (X) + \frac{1}{2}( \mathbb {H}[X+Y] - \mathbb {H}[X] ). \]
Proof

The first inequality follows from Lemma 13.7. The second and third inequalities are direct corollaries of the first.
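
Explicitly, the second inequality follows from the first via Definition 13.12:

\[ \rho ^+(X+Y) = \rho ^-(X+Y) + \mathbb {H}[X+Y] - \mathbb {H}[U_A] \le \rho ^-(X) + \mathbb {H}[X+Y] - \mathbb {H}[U_A] = \rho ^+(X) + \mathbb {H}[X+Y] - \mathbb {H}[X], \]

and averaging the first two inequalities gives the third.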

Definition 13.22 Conditional Rho functional

We define \(\rho (X|Y) := \sum _y {\bf P}(Y=y) \rho (X|Y=y)\).

Lemma 13.23

For any \(s\in G\), \(\rho (X+s|Y)=\rho (X|Y)\).

Proof

Direct corollary of Lemma 13.19.

Lemma 13.24

If \(f\) is injective, then \(\rho (X|f(Y))=\rho (X|Y)\).

Proof

Clear from the definition.

Lemma 13.25 Rho and conditioning

If \(X,Z\) are defined on the same space, one has

\[ \rho ^-(X|Z) \leq \rho ^-(X) + \mathbb {H}[X] - \mathbb {H}[X|Z] \]
\[ \rho ^+(X|Z) \leq \rho ^+(X) \]

and

\[ \rho (X|Z) \leq \rho (X) + \frac{1}{2}( \mathbb {H}[X] - \mathbb {H}[X|Z] ). \]
Proof

The first inequality follows from Lemma 13.9. The second and third inequalities are direct corollaries of the first.
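
Explicitly, if \(\rho ^{\pm }(X|Z)\) are defined by averaging over the values of \(Z\) as in Definition 13.22, then \(\rho ^+(X|Z) = \rho ^-(X|Z) + \mathbb {H}[X|Z] - \mathbb {H}[U_A]\), so the first inequality gives

\[ \rho ^+(X|Z) \le \big(\rho ^-(X) + \mathbb {H}[X] - \mathbb {H}[X|Z]\big) + \mathbb {H}[X|Z] - \mathbb {H}[U_A] = \rho ^+(X), \]

and averaging the first two inequalities gives the third.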

The following lemmas hold for \(G=\mathbb {F}_2^n\).

Lemma 13.26 Rho and sums, symmetrized

If \(X,Y\) are independent, then

\[ \rho (X+Y) \leq \frac{1}{2}(\rho (X)+\rho (Y) + d[X;Y]). \]
Proof

Apply Lemma 13.21 for \((X,Y)\) and \((Y,X)\) and take their average.
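
Spelled out: Lemma 13.21 applied to \((X,Y)\) and \((Y,X)\) gives \(\rho (X+Y)\le \rho (X)+\frac{1}{2}(\mathbb {H}[X+Y]-\mathbb {H}[X])\) and \(\rho (X+Y)\le \rho (Y)+\frac{1}{2}(\mathbb {H}[X+Y]-\mathbb {H}[Y])\). Averaging, and using that \(d[X;Y]=\mathbb {H}[X+Y]-\frac{1}{2}\mathbb {H}[X]-\frac{1}{2}\mathbb {H}[Y]\) for independent \(X,Y\) over \(\mathbb {F}_2^n\),

\[ \rho (X+Y)\le \frac{1}{2}(\rho (X)+\rho (Y)) + \frac{1}{2}\Big(\mathbb {H}[X+Y]-\frac{\mathbb {H}[X]+\mathbb {H}[Y]}{2}\Big) = \frac{1}{2}(\rho (X)+\rho (Y)+d[X;Y]). \]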

Lemma 13.27 Rho and conditioning, symmetrized

If \(X,Y\) are independent, then

\[ \rho (X | X+Y) \leq \frac{1}{2}(\rho (X)+\rho (Y) + d[X;Y]). \]
Proof

First apply Lemma 13.25 to get \(\rho (X|X+Y)\le \rho (X) + \frac{1}{2}(\mathbb {H}[X+Y]-\mathbb {H}[Y])\) and \(\rho (Y|X+Y)\le \rho (Y)+\frac{1}{2}(\mathbb {H}[X+Y]-\mathbb {H}[X])\). Then note that \(\rho (Y|X+Y)=\rho (X|X+Y)\): conditionally on each value \(X+Y=s\) one has \(Y=X+s\), so Lemma 13.19 applies. Finally, take the average of the two inequalities.
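
As before, averaging the two bounds and using \(d[X;Y]=\mathbb {H}[X+Y]-\frac{1}{2}\mathbb {H}[X]-\frac{1}{2}\mathbb {H}[Y]\) for independent \(X,Y\) over \(\mathbb {F}_2^n\) gives

\[ \rho (X|X+Y)\le \frac{1}{2}(\rho (X)+\rho (Y)) + \frac{1}{2}\Big(\mathbb {H}[X+Y]-\frac{\mathbb {H}[X]+\mathbb {H}[Y]}{2}\Big) = \frac{1}{2}(\rho (X)+\rho (Y)+d[X;Y]). \]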

13.3 Studying a minimizer

Fix \(\eta {\lt} 1/8\). Throughout this section, \(G=\mathbb {F}_2^n\).

Definition 13.28

Given \(G\)-valued random variables \(X,Y\), define

\[ \phi [X;Y] := d[X;Y] + \eta (\rho (X) + \rho (Y)) \]

and define a \(\phi \)-minimizer to be a pair of random variables \(X,Y\) which minimizes \(\phi [X;Y]\).

Lemma 13.29 \(\phi \)-minimizers exist

There exists a \(\phi \)-minimizer.

Proof

Clear from compactness.

Let \((X_1, X_2)\) be a \(\phi \)-minimizer, and let \(\tilde X_1, \tilde X_2\) be copies of \(X_1,X_2\) respectively, such that \(X_1, X_2, \tilde X_1, \tilde X_2\) are jointly independent. As in the original proof, we define

\[ I_1 := \mathbb {I}[X_1+X_2 : \tilde X_1 + X_2 \mid X_1+X_2+\tilde X_1+\tilde X_2], \qquad I_2 := \mathbb {I}[X_1+X_2 : X_1 + \tilde X_1 \mid X_1+X_2+\tilde X_1+\tilde X_2]. \]

First we need the \(\phi \)-minimizer variants of Lemma 6.12 and Lemma 6.16.

Lemma 13.30

\(I_1\le 2\eta d[X_1;X_2]\)

Proof

Similar to Lemma 6.12: get upper bounds for \(d[X_1;X_2]\) by \(\phi [X_1;X_2]\le \phi [X_1+X_2;\tilde X_1+\tilde X_2]\) and \(\phi [X_1;X_2]\le \phi [X_1|X_1+X_2;\tilde X_2|\tilde X_1+\tilde X_2]\), and then apply Lemma 6.8 to get an upper bound for \(I_1\).

Lemma 13.31

\(d[X_1;X_1]+d[X_2;X_2]= 2d[X_1;X_2]+(I_2-I_1)\).

Proof

Compare Lemma 6.8 with the identity obtained from applying Corollary 5.3 on \((X_1,\tilde X_1, X_2, \tilde X_2)\).

Lemma 13.32

\(I_2\le 2\eta d[X_1;X_2] + \frac{\eta }{1-\eta }(2\eta d[X_1;X_2]-I_1)\).

Proof

First of all, by \(\phi [X_1;X_2]\le \phi [X_1+\tilde X_1;X_2+\tilde X_2]\), \(\phi [X_1;X_2]\le \phi [X_1|X_1+\tilde X_1;X_2|X_2+\tilde X_2]\), and the fibring identity obtained by applying Corollary 5.3 on \((X_1,X_2,\tilde X_1,\tilde X_2)\), we have \(I_2\le \eta (d[X_1;X_1]+d[X_2;X_2])\). Then apply Lemma 13.31 to get \(I_2\le 2\eta d[X_1;X_2] +\eta (I_2-I_1)\), and rearrange.
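
The final rearrangement is elementary: from \(I_2\le 2\eta d[X_1;X_2] +\eta (I_2-I_1)\) one gets

\[ (1-\eta )I_2 \le 2\eta d[X_1;X_2] - \eta I_1 = (1-\eta )\cdot 2\eta d[X_1;X_2] + \eta (2\eta d[X_1;X_2]-I_1), \]

and dividing by \(1-\eta >0\) gives the stated bound.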

Next we need some inequalities for the endgame.

Lemma 13.33

If \(G\)-valued random variables \(T_1,T_2,T_3\) satisfy \(T_1+T_2+T_3=0\), then

\[ d[X_1;X_2]\le 3\mathbb {I}[T_1:T_2] + (2\mathbb {H}[T_3]-\mathbb {H}[T_1]-\mathbb {H}[T_2])+ \eta (\rho (T_1|T_3)+\rho (T_2|T_3)-\rho (X_1)-\rho (X_2)). \]
Proof

Conditionally on each value \(T_3=t\), we have \(d[X_1;X_2]\le d[T_1|T_3=t;T_2|T_3=t]+\eta (\rho (T_1|T_3=t)+\rho (T_2|T_3=t)-\rho (X_1)-\rho (X_2))\) by Definition 13.28, since \((X_1,X_2)\) is a \(\phi \)-minimizer. Take the weighted average with weights \(\mathbf{P}(T_3=t)\), and then apply Lemma 3.23 to bound the resulting right-hand side.

Lemma 13.34

If \(G\)-valued random variables \(T_1,T_2,T_3\) satisfy \(T_1+T_2+T_3=0\), then

\[ d[X_1;X_2] \leq \sum _{1 \leq i{\lt}j \leq 3} \mathbb {I}[T_i:T_j] + \frac{\eta }{3} \sum _{1 \leq i{\lt}j \leq 3} (\rho (T_i|T_j) + \rho (T_j|T_i) -\rho (X_1)-\rho (X_2)) \]
Proof

Take the average of Lemma 13.33 over all \(6\) permutations of \(T_1,T_2,T_3\).
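
In this average, the entropy corrections \(2\mathbb {H}[T_{\sigma (3)}]-\mathbb {H}[T_{\sigma (1)}]-\mathbb {H}[T_{\sigma (2)}]\) cancel (each \(\mathbb {H}[T_m]\) appears with total weight \(\frac{2\cdot 2-4}{6}=0\)), each \(\mathbb {I}[T_i:T_j]\) appears with total weight \(3\cdot \frac{2}{6}=1\), and each \(\rho (T_i|T_j)\) appears with total weight \(\frac{2}{6}=\frac{1}{3}\), which is exactly the stated inequality.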

Lemma 13.35

For independent random variables \(Y_1,Y_2,Y_3,Y_4\) over \(G\), define \(S:=Y_1+Y_2+Y_3+Y_4\), \(T_1:=Y_1+Y_2\), \(T_2:=Y_1+Y_3\). Then

\[ \rho (T_1|T_2,S)+\rho (T_2|T_1,S) - \frac{1}{2}\sum _{i} \rho (Y_i)\le \frac{1}{2}(d[Y_1;Y_2]+d[Y_3;Y_4]+d[Y_1;Y_3]+d[Y_2;Y_4]). \]
Proof

Let \(T_1':=Y_3+Y_4\), \(T_2':=Y_2+Y_4\). First note that

\begin{align*} \rho (T_1|T_2,S) & \le \rho (T_1|S) + \frac{1}{2}\mathbb {I}[T_1:T_2\mid S] \textrm{ (by Lemma 13.25)}\\ & \le \frac{1}{2}(\rho (T_1)+\rho (T_1'))+\frac{1}{2}(d[T_1;T_1']+\mathbb {I}[T_1:T_2\mid S]) \textrm{ (by Lemma 13.27)}\\ & \le \frac{1}{4} \sum _{i} \rho (Y_i) +\frac{1}{4}(d[Y_1;Y_2]+d[Y_3;Y_4]) + \frac{1}{2}(d[T_1;T_1']+\mathbb {I}[T_1:T_2\mid S]) \textrm{ (by Lemma 13.26)}. \end{align*}

On the other hand, observe that

\begin{align*} \rho (T_1|T_2,S) & =\rho (Y_1+Y_2|T_2,T_2') \textrm{ (by Lemma 13.24)}\\ & \le \frac{1}{2}(\rho (Y_1|T_2)+\rho (Y_2|T_2'))+\frac{1}{2}(d[Y_1|T_2;Y_2|T_2']) \textrm{ (by Lemma 13.26)}\\ & \le \frac{1}{4} \sum _{i} \rho (Y_i) +\frac{1}{4}(d[Y_1;Y_3]+d[Y_2;Y_4]) + \frac{1}{2}(d[Y_1|T_2;Y_2|T_2']) \textrm{ (by Lemma 13.27)}. \end{align*}

By replacing \((Y_1,Y_2,Y_3,Y_4)\) with \((Y_1,Y_3,Y_2,Y_4)\) in the above inequalities, one has

\[ \rho (T_2|T_1,S) \le \frac{1}{4} \sum _{i} \rho (Y_i) +\frac{1}{4}(d[Y_1;Y_3]+d[Y_2;Y_4]) + \frac{1}{2}(d[T_2;T_2']+\mathbb {I}[T_1:T_2\mid S]) \]

and

\[ \rho (T_2|T_1,S) \le \frac{1}{4} \sum _{i} \rho (Y_i) +\frac{1}{4}(d[Y_1;Y_2]+d[Y_3;Y_4]) + \frac{1}{2}(d[Y_1|T_1;Y_3|T_1']). \]

Finally, take the sum of all four inequalities, apply Corollary 5.3 on \((Y_1,Y_2,Y_3,Y_4)\) and \((Y_1,Y_3,Y_2,Y_4)\) to rewrite the sum of the last terms of the four inequalities, and divide the result by \(2\).

Lemma 13.36

For independent random variables \(Y_1,Y_2,Y_3,Y_4\) over \(G\), define \(T_1:=Y_1+Y_2,T_2:=Y_1+Y_3,T_3:=Y_2+Y_3\) and \(S:=Y_1+Y_2+Y_3+Y_4\). Then

\[ \sum _{1 \leq i{\lt}j \leq 3} (\rho (T_i|T_j,S) + \rho (T_j|T_i,S) - \frac{1}{2}\sum _{m} \rho (Y_m))\le \sum _{1\leq i {\lt} j \leq 4}d[Y_i;Y_j]. \]
Proof

Apply Lemma 13.35 on \((Y_i,Y_j,Y_k,Y_4)\) for \((i,j,k)=(1,2,3),(2,3,1),(3,1,2)\), and take the sum.
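
Indeed, the three applications give the pairs \((T_1,T_2)\), \((T_3,T_1)\) and \((T_2,T_3)\) on the left (each with one copy of \(-\frac{1}{2}\sum _m \rho (Y_m)\)), while the right-hand sides sum to

\[ \tfrac {1}{2}(d[Y_1;Y_2]+d[Y_3;Y_4]+d[Y_1;Y_3]+d[Y_2;Y_4]) + \tfrac {1}{2}(d[Y_2;Y_3]+d[Y_1;Y_4]+d[Y_1;Y_2]+d[Y_3;Y_4]) + \tfrac {1}{2}(d[Y_3;Y_1]+d[Y_2;Y_4]+d[Y_3;Y_2]+d[Y_1;Y_4]) = \sum _{1\le i{\lt}j\le 4} d[Y_i;Y_j], \]

since each distance appears exactly twice.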

Proposition 13.37

If \(X_1,X_2\) is a \(\phi \)-minimizer, then \(d[X_1;X_2] = 0\).

Proof

Consider \(T_1:=X_1+X_2\), \(T_2:=X_1+\tilde X_1\), \(T_3:=\tilde X_1 + X_2\), and \(S:=X_1+X_2+\tilde X_1 + \tilde X_2\), and note that \(T_1+T_2+T_3=0\). First apply Lemma 13.34 to \((T_1,T_2,T_3)\), conditionally on \(S\), to get

\begin{align} \label{eq:further-bsg} d[X_1;X_2] & \leq \sum _{1 \leq i{\lt}j \leq 3} \mathbb {I}[T_i:T_j\mid S] + \frac{\eta }{3} \sum _{1 \leq i{\lt}j \leq 3} (\rho (T_i|T_j,S) + \rho (T_j|T_i,S) -\rho (X_1)-\rho (X_2))\nonumber \\ & = (I_1+2I_2) + \frac{\eta }{3} \sum _{1 \leq i{\lt}j \leq 3} (\rho (T_i|T_j,S) + \rho (T_j|T_i,S) -\rho (X_1)-\rho (X_2)). \end{align}

Then apply Lemma 13.36 on \((X_1,X_2,\tilde X_1,\tilde X_2)\) and get

\[ \sum _{1 \leq i{\lt}j \leq 3} (\rho (T_i|T_j,S) + \rho (T_j|T_i,S) - \rho (X_1)-\rho (X_2))\le (4d[X_1;X_2]+d[X_1;X_1]+d[X_2;X_2])= 6 d[X_1;X_2]+(I_2-I_1) \]

by Lemma 13.31. Plugging this into (1), we get

\[ d[X_1;X_2] \le (I_1+2I_2)+2\eta d[X_1;X_2]+\frac{\eta }{3}(I_2-I_1). \]

Substituting the bound for \(I_2\) from Lemma 13.32, and writing \(I_1 = 2\eta d[X_1;X_2]-(2\eta d[X_1;X_2]-I_1)\), we can conclude that

\[ d[X_1;X_2] \le 8\eta d[X_1;X_2]+\frac{10\eta -3}{3-3\eta } (2\eta d[X_1;X_2]-I_1). \]

Finally, \(2\eta d[X_1;X_2]-I_1\ge 0\) by Lemma 13.30, while \(\eta {\lt} 1/8 {\lt} 3/10\) makes the coefficient \(\frac{10\eta -3}{3-3\eta }\) negative, so the second term is \(\le 0\) and thus \(d[X_1;X_2] \le 8\eta d[X_1;X_2]\). By the choice \(\eta {\lt}1/8\) and the non-negativity of \(d\) we have \(d[X_1;X_2]=0\).

Proposition 13.38

For any \(G\)-valued random variables \(Y_1,Y_2\), there exists a subgroup \(H\) of \(G\) such that

\[ 2\rho (U_H) \leq \rho (Y_1) + \rho (Y_2) + 8 d[Y_1;Y_2]. \]
Proof

Let \(X_1,X_2\) be a \(\phi \)-minimizer. By Proposition 13.37, \(d[X_1;X_2]=0\), which by Definition 13.28 implies \(\rho (X_1)+\rho (X_2)\le \rho (Y_1) + \rho (Y_2) + \frac{1}{\eta } d[Y_1;Y_2]\) for every \(\eta {\lt}1/8\); taking the limit as \(\eta \to 1/8\) gives \(\rho (X_1)+\rho (X_2)\le \rho (Y_1) + \rho (Y_2) + 8 d[Y_1;Y_2]\). By Lemma 3.18 and Lemma 3.15 we have \(d[X_1;X_1]=d[X_2;X_2]=0\), and by Lemma 4.4 there are subgroups \(H_1:=\mathrm{Sym}[X_1]\) and \(H_2:=\mathrm{Sym}[X_2]\) such that \(X_1=U_{H_1}+x_1\) and \(X_2=U_{H_2}+x_2\) for some \(x_1,x_2\in G\). By Lemma 13.19 we then get \(\rho (U_{H_1})+\rho (U_{H_2})\le \rho (Y_1) + \rho (Y_2) + 8 d[Y_1;Y_2]\), and thus the claim holds for at least one of \(H=H_1\) or \(H=H_2\).

Corollary 13.39

If \(|A+A| \leq K|A|\), then there exist a subgroup \(H\) of \(G\) and \(t\in G\) such that \(|A \cap (H+t)| \geq K^{-4} \sqrt{|A||H|}\), and \(|H|/|A|\in [K^{-8},K^8]\).

Proof

Apply Proposition 13.38 to \(U_A,U_A\) to get a subgroup \(H\) such that \(2\rho (U_H)\le 2\rho (U_A)+8d[U_A;U_A]\). Recall that \(d[U_A;U_A]\le \log K\) as proved in Lemma 7.2, and \(\rho (U_A)=0\) by Lemma 13.17. Therefore \(\rho (U_H)\le 4\log K\), and the claim follows from Lemma 13.18 with \(r=4\log K\).
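
Explicitly, reading the logarithms to base \(2\) (the normalization under which the \(2^{-r}\) bound of Lemma 13.18 matches the \(K^{-4}\) here), Lemma 13.18 with \(r=4\log K\) gives

\[ |A \cap (H+t)| \ge 2^{-4\log K}\sqrt{|A||H|} = K^{-4}\sqrt{|A||H|}, \qquad |H|/|A| \in [2^{-8\log K},2^{8\log K}] = [K^{-8},K^{8}]. \]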

Theorem 13.40 PFR with \(C=9\)

If \(A \subset {\bf F}_2^n\) is finite non-empty with \(|A+A| \leq K|A|\), then there exists a subgroup \(H\) of \({\bf F}_2^n\) with \(|H| \leq |A|\) such that \(A\) can be covered by at most \(2K^9\) translates of \(H\).

Proof

Apply Corollary 13.39 and Lemma 7.1 to obtain a variant of Lemma 7.2; the rest of the argument is the same as in Theorem 7.3.