13 Further improvement to exponent
13.1 Kullback–Leibler divergence
In the definitions below, \(G\) is a set.
If \(X,Y\) are two \(G\)-valued random variables, the Kullback–Leibler divergence is defined as
\[ D_{KL}(X\Vert Y) := \sum _{x} \mathbf{P}(X=x) \log \frac{\mathbf{P}(X=x)}{\mathbf{P}(Y=x)}. \]
If \(X'\) is a copy of \(X\), and \(Y'\) is a copy of \(Y\), then \(D_{KL}(X'\Vert Y') = D_{KL}(X\Vert Y)\).
Clear from definition.
\(D_{KL}(X\Vert Y) \geq 0\).
Apply Lemma 1.4 to the definition.
If \(D_{KL}(X\Vert Y) = 0\), then \(Y\) is a copy of \(X\).
Apply Lemma 1.5.
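The definition and the two lemmas above can be sanity-checked numerically. The sketch below is illustrative only (not part of the blueprint or its formalization); the helper `kl` and the toy distributions are ours, with finite distributions represented as dictionaries.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for finite distributions.

    p, q: dicts mapping outcomes to probabilities; requires q[x] > 0
    wherever p[x] > 0 (otherwise the divergence is +infinity).
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.25, 1: 0.25, 2: 0.5}

assert kl(p, q) >= 0    # non-negativity
assert kl(p, p) == 0    # the divergence vanishes for identical laws
```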
If \(S\) is a finite set, \(\sum _{s \in S} w_s = 1\) for some non-negative \(w_s\), and \({\bf P}(X=x) = \sum _{s\in S} w_s {\bf P}(X_s=x)\), \({\bf P}(Y=x) = \sum _{s\in S} w_s {\bf P}(Y_s=x)\) for all \(x\), then
\[ D_{KL}(X\Vert Y) \leq \sum _{s\in S} w_s D_{KL}(X_s\Vert Y_s). \]
For each \(x\), replace \(\log \frac{\mathbf{P}(X_s=x)}{\mathbf{P}(Y_s=x)}\) in the definition with \(\log \frac{w_s\mathbf{P}(X_s=x)}{w_s\mathbf{P}(Y_s=x)}\) for each \(s\), and apply Lemma 1.4.
If \(f:G \to H\) is an injection, then \(D_{KL}(f(X)\Vert f(Y)) = D_{KL}(X\Vert Y)\).
Clear from definition.
Now let \(G\) be an additive group.
If \(X, Y, Z\) are independent \(G\)-valued random variables, then
\[ D_{KL}(X+Z\Vert Y+Z) \leq D_{KL}(X\Vert Y). \]
For each \(z\), \(D_{KL}(X+z\Vert Y+z)=D_{KL}(X\Vert Y)\) by Lemma 13.6. Then apply Lemma 13.5 with \(w_z=\mathbf{P}(Z=z)\).
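The inequality \(D_{KL}(X+Z\Vert Y+Z)\le D_{KL}(X\Vert Y)\) can be checked numerically on a small group. The sketch below is illustrative only; it works in \(\mathbb{F}_2^2\), encoded as integers \(0,\dots ,3\) with addition implemented as bitwise XOR, and the distributions are arbitrary choices of ours.

```python
import math

def kl(p, q):
    # D_KL(p || q) for finite distributions given as dicts of probabilities
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def convolve(p, r):
    # law of the sum of independent samples from p and r in F_2^2 (XOR)
    out = {x: 0.0 for x in range(4)}
    for x, px in p.items():
        for z, rz in r.items():
            out[x ^ z] += px * rz
    return out

X = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
Y = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}   # uniform
Z = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}

# adding independent noise Z can only decrease the divergence
assert kl(convolve(X, Z), convolve(Y, Z)) <= kl(X, Y) + 1e-12
```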
If \(X,Y,Z\) are random variables, with \(X,Z\) defined on the same sample space, we define
\[ D_{KL}((X|Z)\Vert Y) := \sum _z \mathbf{P}(Z=z) D_{KL}((X|Z=z)\Vert Y). \]
If \(X, Y\) are independent \(G\)-valued random variables, and \(Z\) is another random variable defined on the same sample space as \(X\), then
\[ D_{KL}((X|Z)\Vert Y) = D_{KL}(X\Vert Y) + \mathbb {H}[X] - \mathbb {H}[X|Z]. \]
Compare the terms corresponding to each \(x\in G\) on both sides.
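The identity \(D_{KL}((X|Z)\Vert Y) = D_{KL}(X\Vert Y) + \mathbb {H}[X] - \mathbb {H}[X|Z]\) can be verified on a toy joint distribution. The code below is an illustrative sketch of ours, not part of the blueprint; the joint law and \(Y\) are arbitrary.

```python
import math

def kl(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def entropy(p):
    return -sum(px * math.log(px) for px in p.values() if px > 0)

# a joint law for (X, Z) on {0,1} x {'a','b'}, and an independent Y on {0,1}
joint = {(0, 'a'): 0.3, (1, 'a'): 0.1, (0, 'b'): 0.2, (1, 'b'): 0.4}
Y = {0: 0.6, 1: 0.4}

pZ = {z: joint[(0, z)] + joint[(1, z)] for z in ('a', 'b')}
pX = {x: joint[(x, 'a')] + joint[(x, 'b')] for x in (0, 1)}
cond = {z: {x: joint[(x, z)] / pZ[z] for x in (0, 1)} for z in ('a', 'b')}

lhs = sum(pZ[z] * kl(cond[z], Y) for z in pZ)      # D_KL((X|Z) || Y)
HXZ = sum(pZ[z] * entropy(cond[z]) for z in pZ)    # H[X|Z]
rhs = kl(pX, Y) + entropy(pX) - HXZ
assert abs(lhs - rhs) < 1e-12
```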
\(D_{KL}((X|W)\Vert Y) \geq 0\).
Immediate from Definition 13.8, since each term in the defining average is non-negative by Lemma 13.3.
13.2 Rho functionals
Let \(G\) be an additive group, and let \(A\) be a non-empty subset of \(G\).
For any \(G\)-valued random variable \(X\), we define \(\rho ^-(X)\) to be the infimum of \(D_{KL}(X \Vert U_A + T)\), where \(U_A\) is uniform on \(A\) and \(T\) ranges over \(G\)-valued random variables independent of \(U_A\).
For any \(G\)-valued random variable \(X\), we define \(\rho ^+(X) := \rho ^-(X) + \mathbb {H}(X) - \mathbb {H}(U_A)\).
We have \(\rho ^-(X) \geq 0\).
Clear from Lemma 13.10.
If \(H\) is a finite subgroup of \(G\), then \(\rho ^-(U_H) = \log |A| - \log \max _t |A \cap (H+t)|\).
For every \(G\)-valued random variable \(T\) that is independent of \(U_A\),
\[ D_{KL}(U_H \Vert U_A + T) = \sum _{x \in H} \frac{1}{|H|} \log \frac{1/|H|}{\mathbf{P}(U_A+T=x)} \geq \log \frac{1/|H|}{\frac{1}{|H|}\sum _{x\in H} \mathbf{P}(U_A+T=x)} \]
by Lemma 1.4. Then observe that
\[ \sum _{x\in H} \mathbf{P}(U_A+T=x) = \sum _t \mathbf{P}(T=t) \frac{|A \cap (H-t)|}{|A|} \leq \frac{\max _t |A \cap (H+t)|}{|A|}. \]
This proves \(\ge \).
To get the equality, let \(t^*:=\arg \max _t |A \cap (H+t)|\), take \(T := U_H - t^*\) independent of \(U_A\), and observe that \(\mathbf{P}(U_A+T=x) = |A \cap (H+t^*)|/(|A||H|)\) for every \(x \in H\), so that
\[ D_{KL}(U_H \Vert U_A + T) = \log |A| - \log |A \cap (H+t^*)|. \]
If \(H\) is a finite subgroup of \(G\), then \(\rho ^+(U_H) = \log |H| - \log \max _t |A \cap (H+t)|\).
Straightforward by definition and Lemma 13.14.
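Both the closed form and the optimal witness \(T = U_H - t^*\) from the proof of Lemma 13.14 can be checked on a small example. The sketch below is illustrative only: it works in \(\mathbb{F}_2^3\) (integers \(0,\dots ,7\), addition as bitwise XOR), with our own choices of \(A\) and \(H\), and confirms that this \(T\) attains \(\log |A| - \log \max _t |A\cap (H+t)|\).

```python
import math

G = range(8)                       # F_2^3 as integers 0..7, addition = XOR
A = {0, 1, 2, 4}                   # the ambient set A of the rho-functionals
H = {0, 1}                         # a subgroup of G

# closed form: rho^-(U_H) = log|A| - log max_t |A ∩ (H+t)|
best = max(len(A & {h ^ t for h in H}) for t in G)
rho_minus = math.log(len(A)) - math.log(best)

# witness from the proof: T = U_H - t* (note -t* = t* in F_2^n); then
# P(U_A + T = x) = |A ∩ (H + t*)| / (|A| |H|) for every x in H
tstar = max(G, key=lambda t: len(A & {h ^ t for h in H}))
law = {}
for a in A:                        # law of U_A + U_H + tstar
    for h in H:
        x = a ^ h ^ tstar
        law[x] = law.get(x, 0.0) + 1.0 / (len(A) * len(H))
d = sum((1 / len(H)) * math.log((1 / len(H)) / law[x]) for x in H)
assert abs(d - rho_minus) < 1e-12  # the witness attains the closed form
```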
We define \(\rho (X) := (\rho ^+(X) + \rho ^-(X))/2\).
We have \(\rho (U_A) = 0\).
\(\rho ^-(U_A)\le 0\) by the choice \(T=0\). The claim then follows from Lemma 13.13.
If \(H\) is a finite subgroup of \(G\), and \(\rho (U_H) \leq r\), then there exists \(t\) such that \(|A \cap (H+t)| \geq e^{-r} \sqrt{|A||H|}\), and \(|H|/|A|\in [e^{-2r},e^{2r}]\).
The first claim is a direct corollary of Lemma 13.14 and Corollary 13.15. To see the second claim, observe that Lemma 13.13 and Corollary 13.15 imply \(\rho ^-(U_H),\rho ^+(U_H)\ge 0\). Therefore
\[ \Bigl|\log \frac{|H|}{|A|}\Bigr| = \bigl|\rho ^+(U_H)-\rho ^-(U_H)\bigr| \leq \rho ^+(U_H)+\rho ^-(U_H) = 2\rho (U_H) \leq 2r, \]
which implies the second claim.
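Using the closed forms of Lemma 13.14 and Corollary 13.15, the conclusions of this lemma can be checked concretely. The sketch below is illustrative only (our own small \(\mathbb{F}_2^3\) example, addition as XOR); in this example the intersection bound happens to be attained with equality.

```python
import math

G = range(8)                          # F_2^3, addition = XOR
A = {0, 1, 2, 4}
H = {0, 1, 2, 3}                      # the subgroup spanned by e_1, e_2

best = max(len(A & {h ^ t for h in H}) for t in G)
rho_minus = math.log(len(A)) - math.log(best)   # Lemma 13.14
rho_plus = math.log(len(H)) - math.log(best)    # Corollary 13.15
r = (rho_plus + rho_minus) / 2                  # rho(U_H)

# the two conclusions of the lemma, with this value of r:
assert best >= math.exp(-r) * math.sqrt(len(A) * len(H)) - 1e-12
assert math.exp(-2 * r) - 1e-12 <= len(H) / len(A) <= math.exp(2 * r) + 1e-12
```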
For any \(s \in G\), \(\rho (X+s) = \rho (X)\).
Observe that by Lemma 13.6, \(D_{KL}(X+s\Vert U_A+T+s) = D_{KL}(X\Vert U_A+T)\) for every \(T\) independent of \(U_A\). As \(T\) ranges over all such random variables, so does \(T+s\), hence \(\rho ^-(X+s)=\rho ^-(X)\); since \(\mathbb {H}(X+s)=\mathbb {H}(X)\), also \(\rho ^+(X+s)=\rho ^+(X)\).
\(\rho (X)\) depends continuously on the distribution of \(X\).
Clear from definition.
If \(X,Y\) are independent, one has
\[ \rho ^-(X+Y) \leq \rho ^-(X), \qquad \rho ^+(X+Y) \leq \rho ^+(X) + \mathbb {H}[X+Y] - \mathbb {H}[X], \]
and
\[ \rho (X+Y) \leq \rho (X) + \frac{1}{2}\bigl(\mathbb {H}[X+Y] - \mathbb {H}[X]\bigr). \]
The first inequality follows from Lemma 13.7. The second and third inequalities are direct corollaries of the first.
We define \(\rho (X|Y) := \sum _y {\bf P}(Y=y) \rho (X|Y=y)\).
For any \(s\in G\), \(\rho (X+s|Y)=\rho (X|Y)\).
Direct corollary of Lemma 13.19.
If \(f\) is injective, then \(\rho (X|f(Y))=\rho (X|Y)\).
Clear from the definition.
If \(X,Z\) are defined on the same space, one has
\[ \rho ^-(X|Z) \leq \rho ^-(X) + \mathbb {H}[X] - \mathbb {H}[X|Z], \qquad \rho ^+(X|Z) \leq \rho ^+(X), \]
and
\[ \rho (X|Z) \leq \rho (X) + \frac{1}{2}\bigl(\mathbb {H}[X] - \mathbb {H}[X|Z]\bigr). \]
The first inequality follows from Lemma 13.9. The second and third inequalities are direct corollaries of the first.
The following lemmas hold for \(G=\mathbb {F}_2^n\).
If \(X,Y\) are independent, then
\[ \rho (X+Y) \leq \frac{\rho (X)+\rho (Y)}{2} + \frac{1}{2} d[X;Y]. \]
Apply Lemma 13.21 for \((X,Y)\) and \((Y,X)\) and take their average.
If \(X,Y\) are independent, then
\[ \rho (X|X+Y) \leq \frac{\rho (X)+\rho (Y)}{2} + \frac{1}{2} d[X;Y]. \]
First apply Lemma 13.25 to get \(\rho (X|X+Y)\le \rho (X) + \frac{1}{2}(\mathbb {H}[X+Y]-\mathbb {H}[Y])\), and \(\rho (Y|X+Y)\le \rho (Y)+\frac{1}{2}(\mathbb {H}[X+Y]-\mathbb {H}[X])\). Then apply Lemma 13.19 to get \(\rho (Y|X+Y)=\rho (X|X+Y)\) and take the average of the two inequalities.
13.3 Studying a minimizer
Fix \(\eta {\lt} 1/8\). Throughout this section, \(G=\mathbb {F}_2^n\).
Given \(G\)-valued random variables \(X,Y\), define
\[ \phi [X;Y] := d[X;Y] + \eta \bigl(\rho (X) + \rho (Y)\bigr), \]
and define a \(\phi \)-minimizer to be a pair of random variables \(X,Y\) which minimizes \(\phi [X;Y]\).
There exists a \(\phi \)-minimizer.
Clear from compactness.
Let \((X_1, X_2)\) be a \(\phi \)-minimizer, and \(\tilde X_1, \tilde X_2\) be independent copies of \(X_1,X_2\) respectively. Similar to the original proof we define
\[ S := X_1+X_2+\tilde X_1+\tilde X_2, \qquad I_1 := \mathbb {I}[X_1+X_2 : \tilde X_1+X_2 \mid S], \qquad I_2 := \mathbb {I}[X_1+X_2 : X_1+\tilde X_1 \mid S]. \]
First we need the \(\phi \)-minimizer variants of Lemma 6.12 and Lemma 6.16.
\(I_1\le 2\eta d[X_1;X_2]\)
Similar to Lemma 6.12: get upper bounds for \(d[X_1;X_2]\) by \(\phi [X_1;X_2]\le \phi [X_1+X_2;\tilde X_1+\tilde X_2]\) and \(\phi [X_1;X_2]\le \phi [X_1|X_1+X_2;\tilde X_2|\tilde X_1+\tilde X_2]\), and then apply Lemma 6.8 to get an upper bound for \(I_1\).
\(d[X_1;X_1]+d[X_2;X_2]= 2d[X_1;X_2]+(I_2-I_1)\).
Compare Lemma 6.8 with the identity obtained by applying Corollary 5.3 to \((X_1,\tilde X_1, X_2, \tilde X_2)\).
\(I_2\le 2\eta d[X_1;X_2] + \frac{\eta }{1-\eta }(2\eta d[X_1;X_2]-I_1)\).
First of all, by \(\phi [X_1;X_2]\le \phi [X_1+\tilde X_1;X_2+\tilde X_2]\), \(\phi [X_1;X_2]\le \phi [X_1|X_1+\tilde X_1;X_2|X_2+\tilde X_2]\), and the fibring identity obtained by applying Corollary 5.3 to \((X_1,X_2,\tilde X_1,\tilde X_2)\), we have \(I_2\le \eta (d[X_1;X_1]+d[X_2;X_2])\). Then apply Lemma 13.31 to get \(I_2\le 2\eta d[X_1;X_2] +\eta (I_2-I_1)\), and rearrange.
Next we need some inequalities for the endgame.
If \(G\)-valued random variables \(T_1,T_2,T_3\) satisfy \(T_1+T_2+T_3=0\), then
\[ d[X_1;X_2] \leq d[T_1;T_2] + \frac{1}{2}\bigl(\mathbb {I}[T_1:T_3] + \mathbb {I}[T_2:T_3]\bigr) + \eta \bigl(\rho (T_1|T_3)+\rho (T_2|T_3) - \rho (X_1)-\rho (X_2)\bigr). \]
Conditioned on each event \(T_3=t\), we have \(d[X_1;X_2]\le d[T_1|T_3=t;T_2|T_3=t]+\eta (\rho (T_1|T_3=t)+\rho (T_2|T_3=t)-\rho (X_1)-\rho (X_2))\) by the \(\phi \)-minimality of \((X_1,X_2)\) (Definition 13.28). Then take the weighted average with weights \(\mathbf{P}(T_3=t)\) and apply Lemma 3.23 to bound the RHS.
If \(G\)-valued random variables \(T_1,T_2,T_3\) satisfy \(T_1+T_2+T_3=0\), then
\[ d[X_1;X_2] \leq \frac{1}{3}\sum _{1\le i<j\le 3} \bigl(d[T_i;T_j] + \mathbb {I}[T_i:T_j]\bigr) + \frac{\eta }{3}\sum _{i\ne j} \rho (T_i|T_j) - \eta \bigl(\rho (X_1)+\rho (X_2)\bigr). \]
Take the average of Lemma 13.33 over all \(6\) permutations of \(T_1,T_2,T_3\).
For independent random variables \(Y_1,Y_2,Y_3,Y_4\) over \(G\), define \(S:=Y_1+Y_2+Y_3+Y_4\), \(T_1:=Y_1+Y_2\), \(T_2:=Y_1+Y_3\). Then
Let \(T_1':=Y_3+Y_4\), \(T_2':=Y_2+Y_4\). First note that
by Lemma 13.25, Lemma 13.27, Lemma 13.26 respectively. On the other hand, observe that
by Lemma 13.24, Lemma 13.26, Lemma 13.27 respectively. By replacing \((Y_1,Y_2,Y_3,Y_4)\) with \((Y_1,Y_3,Y_2,Y_4)\) in the above inequalities, one has
and
Finally, take the sum of all four inequalities, apply Corollary 5.3 to \((Y_1,Y_2,Y_3,Y_4)\) and \((Y_1,Y_3,Y_2,Y_4)\) to rewrite the sum of the last terms in the four inequalities, and divide the result by \(2\).
For independent random variables \(Y_1,Y_2,Y_3,Y_4\) over \(G\), define \(T_1:=Y_1+Y_2,T_2:=Y_1+Y_3,T_3:=Y_2+Y_3\) and \(S:=Y_1+Y_2+Y_3+Y_4\). Then
Apply Lemma 13.35 to \((Y_i,Y_j,Y_k,Y_4)\) for \((i,j,k)=(1,2,3),(2,3,1),(1,3,2)\), and take the sum.
If \(X_1,X_2\) is a \(\phi \)-minimizer, then \(d[X_1;X_2] = 0\).
Consider \(T_1:=X_1+X_2,T_2:=X_1+\tilde X_1, T_3:=\tilde X_1 + X_2\), and \(S=X_1+X_2+\tilde X_1 + \tilde X_2\). Note that \(T_1+T_2+T_3=0\). First apply Lemma 13.34 to \((T_1,T_2,T_3)\) conditioned on \(S\) to get
Then apply Lemma 13.36 to \((X_1,X_2,\tilde X_1,\tilde X_2)\) and get
by Lemma 13.31. Plugging the inequality above into (1), we get
By Lemma 13.32 we can conclude that
Finally by Lemma 13.30 and \(\eta {\lt}1\) we get that the second term is \(\le 0\), and thus \(d[X_1;X_2] \le 8\eta d[X_1;X_2]\). By the choice \(\eta {\lt}1/8\) and the non-negativity of \(d\) we have \(d[X_1;X_2]=0\).
For any random variables \(Y_1,Y_2\), there exists a subgroup \(H\) such that
\[ 2\rho (U_H) \leq \rho (Y_1) + \rho (Y_2) + 8 d[Y_1;Y_2]. \]
Let \(X_1,X_2\) be a \(\phi \)-minimizer. By Proposition 13.37, \(d[X_1;X_2]=0\), which by Definition 13.28 implies \(\rho (X_1)+\rho (X_2)\le \rho (Y_1) + \rho (Y_2) + \frac{1}{\eta } d[Y_1;Y_2]\) for every \(\eta {\lt}1/8\). Taking the limit as \(\eta \to 1/8\) gives \(\rho (X_1)+\rho (X_2)\le \rho (Y_1) + \rho (Y_2) + 8 d[Y_1;Y_2]\). By Lemma 3.18 and Lemma 3.15 we have \(d[X_1;X_1]=d[X_2;X_2]=0\), and by Lemma 4.4 there are subgroups \(H_1:=\mathrm{Sym}[X_1]\) and \(H_2:=\mathrm{Sym}[X_2]\) such that \(X_1=U_{H_1}+x_1\) and \(X_2=U_{H_2}+x_2\) for some \(x_1,x_2\in G\). By Lemma 13.19 we get \(\rho (U_{H_1})+\rho (U_{H_2})\le \rho (Y_1) + \rho (Y_2) + 8 d[Y_1;Y_2]\), and thus the claim holds for \(H=H_1\) or \(H=H_2\).
If \(|A+A| \leq K|A|\), then there exists a subgroup \(H\) and \(t\in G\) such that \(|A \cap (H+t)| \geq K^{-4} \sqrt{|A||H|}\), and \(|H|/|A|\in [K^{-8},K^8]\).
Apply Proposition 13.38 to \(U_A,U_A\) to get a subgroup \(H\) such that \(2\rho (U_H)\le 2\rho (U_A)+8d[U_A;U_A]\). Recall that \(d[U_A;U_A]\le \log K\) as proved in Lemma 7.2, and \(\rho (U_A)=0\) by Lemma 13.17. Therefore \(\rho (U_H)\le 4\log K\). The claim then follows from Lemma 13.18.
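For small \(n\) the corollary can be verified by brute force. The sketch below is illustrative only (not part of the formalization): it enumerates all subgroups of \(\mathbb{F}_2^3\) (addition as bitwise XOR) and checks that some \(H\) and \(t\) satisfy both conclusions for a sample set \(A\) with doubling constant \(K=7/4\).

```python
import math
from itertools import combinations

n = 3
G = list(range(1 << n))               # F_2^3, addition = XOR

def span(gens):
    # F_2-linear span of the generators
    S = {0}
    for g in gens:
        S |= {s ^ g for s in S}
    return frozenset(S)

subgroups = {span(gens) for k in range(n + 1) for gens in combinations(G, k)}

A = {0, 1, 2, 4}
sumset = {a ^ b for a in A for b in A}
K = len(sumset) / len(A)              # doubling constant, here 7/4

ok = any(
    max(len(A & {h ^ t for h in H}) for t in G)
        >= K ** -4 * math.sqrt(len(A) * len(H)) - 1e-12
    and K ** -8 <= len(H) / len(A) <= K ** 8
    for H in subgroups
)
assert ok                             # some subgroup satisfies both conclusions
```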
If \(|A+A| \leq K|A|\), then there exist a subgroup \(H\) and a subset \(c\) of \(G\) with \(A \subseteq c + H\), such that \(|c| \leq K^{5} |A|^{1/2}/|H|^{1/2}\) and \(|H|/|A|\in [K^{-8},K^8]\).
Apply Corollary 13.39 and Lemma 7.1 to get the result, as in the proof of Lemma 7.2.
If \(A \subset {\bf F}_2^n\) is finite non-empty with \(|A+A| \leq K|A|\), then there exists a subgroup \(H\) of \({\bf F}_2^n\) with \(|H| \leq |A|\) such that \(A\) can be covered by at most \(2K^9\) translates of \(H\).
Given Corollary 13.40, the proof is the same as that of Theorem 7.3.
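The covering statement can likewise be checked by brute force on a small example. The sketch below is illustrative only (our own example in \(\mathbb{F}_2^3\), addition as bitwise XOR): it greedily covers \(A\) by translates of a candidate subgroup and verifies the \(2K^9\) bound together with \(|H|\le |A|\).

```python
import math
from itertools import combinations

n = 3
G = list(range(1 << n))               # F_2^3, addition = XOR

def span(gens):
    # F_2-linear span of the generators
    S = {0}
    for g in gens:
        S |= {s ^ g for s in S}
    return frozenset(S)

def greedy_cover(A, H):
    # number of translates of H needed to cover A (greedy: one translate
    # per still-uncovered point; 0 in H guarantees progress)
    rest, count = set(A), 0
    while rest:
        t = next(iter(rest))
        rest -= {h ^ t for h in H}
        count += 1
    return count

A = {0, 1, 2, 4}
K = len({a ^ b for a in A for b in A}) / len(A)

subgroups = {span(g) for k in range(n + 1) for g in combinations(G, k)}
ok = any(
    len(H) <= len(A) and greedy_cover(A, H) <= 2 * K ** 9
    for H in subgroups
)
assert ok                             # A is covered as the theorem predicts
```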