13 Further improvement to exponent

13.1 Kullback–Leibler divergence

In the definitions below, $G$ is a set.

Definition 13.1 Kullback–Leibler divergence

✓

If $X, Y$ are two $G$ -valued random variables, the Kullback–Leibler divergence is defined as

D_{K L} (X ‖ Y) := \sum_{x} P (X = x) \log \frac{P (X = x)}{P (Y = x)} .

Lemma 13.2 Kullback–Leibler divergence of copy

✓

If $X^{'}$ is a copy of $X$ , and $Y^{'}$ is a copy of $Y$ , then $D_{K L} (X^{'} ‖ Y^{'}) = D_{K L} (X ‖ Y)$ .

Proof ▶

Lemma 13.3 Gibbs inequality

✓

$D_{K L} (X ‖ Y) \geq 0$ .

Proof ▶

Lemma 13.4 Converse Gibbs inequality

✓

If $D_{K L} (X ‖ Y) = 0$ , then $Y$ is a copy of $X$ .

Proof ▶

Lemma 13.5 Convexity of Kullback–Leibler

✓

If $S$ is a finite set, $\sum_{s \in S} w_{s} = 1$ for some non-negative $w_{s}$ , and $P (X = x) = \sum_{s \in S} w_{s} P (X_{s} = x)$ , $P (Y = x) = \sum_{s \in S} w_{s} P (Y_{s} = x)$ for all $x$ , then

D_{K L} (X ‖ Y) \leq \sum_{s \in S} w_{s} D_{K L} (X_{s} ‖ Y_{s}) .

Proof ▶

For each $x$ , replace $\log \frac{P (X_{s} = x)}{P (Y_{s} = x)}$ in the definition with $\log \frac{w_{s} P (X_{s} = x)}{w_{s} P (Y_{s} = x)}$ for each $s$ , and apply Lemma 1.4.

Lemma 13.6 Kullback–Leibler and injections

✓

If $f : G \to H$ is an injection, then $D_{K L} (f (X) ‖ f (Y)) = D_{K L} (X ‖ Y)$ .

Proof ▶

Now let $G$ be an additive group.

Lemma 13.7 Kullback–Leibler and sums

✓

If $X, Y, Z$ are independent $G$ -valued random variables, then

D_{K L} (X + Z ‖ Y + Z) \leq D_{K L} (X ‖ Y) .

Proof ▶

For each $z$ , $D_{K L} (X + z ‖ Y + z) = D_{K L} (X ‖ Y)$ by Lemma 13.6. Then apply Lemma 13.5 with $w_{z} = P (Z = z)$ .

Definition 13.8 Conditional Kullback–Leibler divergence

✓

If $X, Y, Z$ are random variables, with $X, Z$ defined on the same sample space, we define

D_{K L} (X | Z ‖ Y) := \sum_{z} P (Z = z) D_{K L} ((X | Z = z) ‖ Y) .

Lemma 13.9 Kullback–Leibler and conditioning

✓

If $X, Y$ are independent $G$ -valued random variables, and $Z$ is another random variable defined on the same sample space as $X$ , then

D_{K L} ((X | Z) ‖ Y) = D_{K L} (X ‖ Y) + H [X] - H [X | Z] .

Proof ▶

Compare the terms correspond to each $x \in G$ on both sides.

Lemma 13.10 Conditional Gibbs inequality

✓

$D_{K L} ((X | W) ‖ Y) \geq 0$ .

Proof ▶

13.2 Rho functionals

Let $G$ be an additive group, and let $A$ be a non-empty subset of $G$ .

Definition 13.11 Rho minus

✓

For any $G$ -valued random variable $X$ , we define $ρ^{-} (X)$ to be the infimum of $D_{K L} (X ‖ U_{A} + T)$ , where $U_{A}$ is uniform on $A$ and $T$ ranges over $G$ -valued random variables independent of $U_{A}$ .

Definition 13.12 Rho plus

✓

For any $G$ -valued random variable $X$ , we define $ρ^{+} (X) := ρ^{-} (X) + H (X) - H (U_{A})$ .

Lemma 13.13 Rho minus non-negative

✓

We have $ρ^{-} (X) \geq 0$ .

Proof ▶

Lemma 13.14 Rho minus of subgroup

✓

If $H$ is a finite subgroup of $G$ , then $ρ^{-} (U_{H}) = \log | A | - \log max_{t} | A \cap (H + t) |$ .

Proof ▶

For every $G$ -valued random variable $T$ that is independent of $Y$ ,

D_{K L} (U_{H} ‖ U_{A} + T) = \sum_{h \in H} \frac{1}{| H |} \log \frac{1 / | H |}{P [U_{A} + T = h]} \geq - \log (P [U_{A} + T \in H]),

by Lemma 1.4. Then observe that

- \log (P [U_{A} + T \in H]) = - \log (P [U_{A} \in H - T]) \geq - \log (max_{t \in G} P [U_{A} \in H + t]) .

This proves $\geq$ .

To get the equality, let $t^{*} := \arg max_{t} | A \cap (H + t) |$ and observe that

ρ^{-} (U_{H}) \leq D_{K L} (U_{H} ‖ U_{A} + (U_{H} - t^{*})) = \log | A | - \log max_{t} | A \cap (H + t) | .

Corollary 13.15 Rho plus of subgroup

✓

If $H$ is a finite subgroup of $G$ , then $ρ^{+} (U_{H}) = \log | H | - \log max_{t} | A \cap (H + t) |$ .

Proof ▶

Definition 13.16 Rho functional

✓

We define $ρ (X) := (ρ^{+} (X) + ρ^{-} (X)) / 2$ .

Lemma 13.17

✓

We have $ρ (U_{A}) = 0$ .

Proof ▶

$ρ^{-} (U_{A}) \leq 0$ by the choice $T = 0$ . The claim then follows from Lemma 13.13.

Lemma 13.18 Rho of subgroup

✓

If $H$ is a finite subgroup of $G$ , and $ρ (U_{H}) \leq r$ , then there exists $t$ such that $| A \cap (H + t) | \geq e^{- r} \sqrt{| A | | H |}$ , and $| H | / | A | \in [e^{- 2 r}, e^{2 r}]$ .

Proof ▶

The first claim is a direct corollary of Lemma 13.14 and Corollary 13.15. To see the second claim, observe that Lemma 13.13 and Corollary 13.15 imply $ρ^{-} (U_{H}), ρ^{+} (U_{H}) \geq 0$ . Therefore

| H (U_{A}) - H (U_{H}) | = | ρ^{+} (U_{H}) - ρ^{-} (U_{H}) | \leq ρ^{-} (U_{H}) + ρ^{+} (U_{H}) = 2 ρ (U_{H}) \leq 2 r,

which implies the second claim.

Lemma 13.19 Rho invariant

✓

For any $s \in G$ , $ρ (X + s) = ρ (X)$ .

Proof ▶

Observe that by Lemma 13.6,

inf_{T} D_{K L} (X ‖ U_{A} + T) = inf_{T} D_{K L} (X + s ‖ U_{A} + T + s) = inf_{T^{'}} D_{K L} (X + s ‖ U_{A} + T^{'}) .

Lemma 13.20 Rho continuous

✓

$ρ (X)$ depends continuously on the distribution of $X$ .

Proof ▶

Lemma 13.21 Rho and sums

✓

If $X, Y$ are independent, one has

ρ^{-} (X + Y) \leq ρ^{-} (X)

ρ^{+} (X + Y) \leq ρ^{+} (X) + H [X + Y] - H [X]

and

ρ (X + Y) \leq ρ (X) + \frac{1}{2} (H [X + Y] - H [X]) .

Proof ▶

Definition 13.22 Conditional Rho functional

✓

We define $ρ (X | Y) := \sum_{y} P (Y = y) ρ (X | Y = y)$ .

Lemma 13.23 Conditional rho and translation

✓

For any $s \in G$ , $ρ (X + s | Y) = ρ (X | Y)$ .

Proof ▶

Lemma 13.24 Conditional rho and relabeling

✓

If $f$ is injective, then $ρ (X | f (Y)) = ρ (X | Y)$ .

Proof ▶

Lemma 13.25 Rho and conditioning

✓

If $X, Z$ are defined on the same space, one has

ρ^{-} (X | Z) \leq ρ^{-} (X) + H [X] - H [X | Z]

ρ^{+} (X | Z) \leq ρ^{+} (X)

and

ρ (X | Z) \leq ρ (X) + \frac{1}{2} (H [X] - H [X | Z]) .

Proof ▶

The following lemmas hold for $G = F_{2}^{n}$ .

Lemma 13.26 Rho and sums, symmetrized

✓

If $X, Y$ are independent, then

ρ (X + Y) \leq \frac{1}{2} (ρ (X) + ρ (Y) + d [X; Y]) .

Proof ▶

Apply Lemma 13.21 for $(X, Y)$ and $(Y, X)$ and take their average.

Lemma 13.27 Rho and conditioning, symmetrized

✓

If $X, Y$ are independent, then

ρ (X | X + Y) \leq \frac{1}{2} (ρ (X) + ρ (Y) + d [X; Y]) .

Proof ▶

First apply Lemma 13.25 to get $ρ (X | X + Y) \leq ρ (X) + \frac{1}{2} (H [X + Y] - H [Y])$ , and $ρ (Y | X + Y) \leq ρ (Y) + \frac{1}{2} (H [X + Y] - H [X])$ . Then apply Lemma 13.19 to get $ρ (Y | X + Y) = ρ (X | X + Y)$ and take the average of the two inequalities.

13.3 Studying a minimizer

Set $η < 1 / 8$ . In this section, consider $G = F_{2}^{n}$ .

Definition 13.28

✓

Given $G$ -valued random variables $X, Y$ , define

ϕ [X; Y] := d [X; Y] + η (ρ (X) + ρ (Y))

and define a $ϕ$ -minimizer to be a pair of random variables $X, Y$ which minimizes $ϕ [X; Y]$ .

Lemma 13.29

ϕ

-minimizers exist

✓

There exists a $ϕ$ -minimizer.

Proof ▶

Let $(X_{1}, X_{2})$ be a $ϕ$ -minimizer, and ${\tilde{X}}_{1}, {\tilde{X}}_{2}$ be independent copies of $X_{1}, X_{2}$ respectively. Similar to the original proof we define

I_{1} := I [X_{1} + X_{2} : {\tilde{X}}_{1} + X_{2} | X_{1} + X_{2} + {\tilde{X}}_{1} + {\tilde{X}}_{2}], I_{2} := I [X_{1} + X_{2} : X_{1} + {\tilde{X}}_{1} | X_{1} + X_{2} + {\tilde{X}}_{1} + {\tilde{X}}_{2}] .

First we need the $ϕ$ -minimizer variants of Lemma 6.12 and Lemma 6.16.

Lemma 13.30

✓

$I_{1} \leq 2 η d [X_{1}; X_{2}]$

Proof ▶

Similar to Lemma 6.12: get upper bounds for $d [X_{1}; X_{2}]$ by $ϕ [X_{1}; X_{2}] \leq ϕ [X_{1} + X_{2}; {\tilde{X}}_{1} + {\tilde{X}}_{2}]$ and $ϕ [X_{1}; X_{2}] \leq ϕ [X_{1} | X_{1} + X_{2}; {\tilde{X}}_{2} | {\tilde{X}}_{1} + {\tilde{X}}_{2}]$ , and then apply Lemma 6.8 to get an upper bound for $I_{1}$ .

Lemma 13.31

✓

$d [X_{1}; X_{1}] + d [X_{2}; X_{2}] = 2 d [X_{1}; X_{2}] + (I_{2} - I_{1})$ .

Proof ▶

Compare Lemma 6.8 with the identity obtained from applying Corollary 5.3 on $(X_{1}, {\tilde{X}}_{1}, X_{2}, {\tilde{X}}_{2})$ .

Lemma 13.32

✓

$I_{2} \leq 2 η d [X_{1}; X_{2}] + \frac{η}{1 - η} (2 η d [X_{1}; X_{2}] - I_{1})$ .

Proof ▶

First of all, by $ϕ [X_{1}; X_{2}] \leq ϕ [X_{1} + {\tilde{X}}_{1}; X_{2} + {\tilde{X}}_{2}]$ , $ϕ [X_{1}; X_{2}] \leq ϕ [X_{1} | X_{1} + {\tilde{X}}_{1}; X_{2} | X_{2} + {\tilde{X}}_{2}]$ , and the fibring identity obtained by applying Corollary 5.3 on $(X_{1}, X_{2}, {\tilde{X}}_{1}, {\tilde{X}}_{2})$ , we have $I_{2} \leq η (d [X_{1}; X_{1}] + d [X_{2}; X_{2}])$ . Then apply Lemma 13.31 to get $I_{2} \leq 2 η d [X_{1}; X_{2}] + η (I_{2} - I_{1})$ , and rearrange.

Next we need some inequalities for the endgame.

Lemma 13.33

✓

If $G$ -valued random variables $T_{1}, T_{2}, T_{3}$ satisfy $T_{1} + T_{2} + T_{3} = 0$ , then

d [X_{1}; X_{2}] \leq 3 I [T_{1} : T_{2}] + (2 H [T_{3}] - H [T_{1}] - H [T_{2}]) + η (ρ (T_{1} | T_{3}) + ρ (T_{2} | T_{3}) - ρ (X_{1}) - ρ (X_{2})) .

Proof ▶

Conditioned on every $T_{3} = t$ , $d [X_{1}; X_{2}] \leq d [T_{1} | T_{3} = t; T_{2} | T_{3} = t] + η (ρ (T_{1} | T_{3} = t) + ρ (T_{2} | T_{3} = t) - ρ (X_{1}) - ρ (X_{2}))$ by Definition 13.28. Then take the weighted average with weight $P (T_{3} = t)$ and then apply Lemma 3.23 to bound the RHS.

Lemma 13.34

✓

If $G$ -valued random variables $T_{1}, T_{2}, T_{3}$ satisfy $T_{1} + T_{2} + T_{3} = 0$ , then

d [X_{1}; X_{2}] \leq \sum_{1 \leq i < j \leq 3} I [T_{i} : T_{j}] + \frac{η}{3} \sum_{1 \leq i < j \leq 3} (ρ (T_{i} | T_{j}) + ρ (T_{j} | T_{i}) - ρ (X_{1}) - ρ (X_{2}))

Proof ▶

Take the average of Lemma 13.33 over all $6$ permutations of $T_{1}, T_{2}, T_{3}$ .

Lemma 13.35

✓

For independent random variables $Y_{1}, Y_{2}, Y_{3}, Y_{4}$ over $G$ , define $S := Y_{1} + Y_{2} + Y_{3} + Y_{4}$ , $T_{1} := Y_{1} + Y_{2}$ , $T_{2} := Y_{1} + Y_{3}$ . Then

ρ (T_{1} | T_{2}, S) + ρ (T_{2} | T_{1}, S) - \frac{1}{2} \sum_{i} ρ (Y_{i}) \leq \frac{1}{2} (d [Y_{1}; Y_{2}] + d [Y_{3}; Y_{4}] + d [Y_{1}; Y_{3}] + d [Y_{2}; Y_{4}]) .

Proof ▶

Let $T_{1}^{'} := Y_{3} + Y_{4}$ , $T_{2}^{'} := Y_{2} + Y_{4}$ . First note that

\begin{aligned} ρ (T_{1} | T_{2}, S) & \leq ρ (T_{1} | S) + \frac{1}{2} I (T_{1} : T_{2} ∣ S) \\ \leq \frac{1}{2} (ρ (T_{1}) + ρ (T_{1}^{'})) + \frac{1}{2} (d [T_{1}; T_{1}^{'}] + I (T_{1} : T_{2} ∣ S)) \\ \leq \frac{1}{4} \sum_{i} ρ (Y_{i}) + \frac{1}{4} (d [Y_{1}; Y_{2}] + d [Y_{3}; Y_{4}]) + \frac{1}{2} (d [T_{1}; T_{1}^{'}] + I (T_{1} : T_{2} ∣ S)) . \end{aligned}

by Lemma 13.25, Lemma 13.27, Lemma 13.26 respectively. On the other hand, observe that

\begin{aligned} ρ (T_{1} | T_{2}, S) & = ρ (Y_{1} + Y_{2} | T_{2}, T_{2}^{'}) \\ \leq \frac{1}{2} (ρ (Y_{1} | T_{2}) + ρ (Y_{2} | T_{2}^{'})) + \frac{1}{2} (d [Y_{1} | T_{2}; Y_{2} | T_{2}^{'}]) \\ \leq \frac{1}{4} \sum_{i} ρ (Y_{i}) + \frac{1}{4} (d [Y_{1}; Y_{3}] + d [Y_{2}; Y_{4}]) + \frac{1}{2} (d [Y_{1} | T_{2}; Y_{2} | T_{2}^{'}]) . \end{aligned}

by Lemma 13.24, Lemma 13.26, Lemma 13.27 respectively. By replacing $(Y_{1}, Y_{2}, Y_{3}, Y_{4})$ with $(Y_{1}, Y_{3}, Y_{2}, Y_{4})$ in the above inequalities, one has

ρ (T_{2} | T_{1}, S) \leq \frac{1}{4} \sum_{i} ρ (Y_{i}) + \frac{1}{4} (d [Y_{1}; Y_{3}] + d [Y_{2}; Y_{4}]) + \frac{1}{2} (d [T_{2}; T_{2}^{'}] + I (T_{1} : T_{2} ∣ S))

and

ρ (T_{2} | T_{1}, S) \leq \frac{1}{4} \sum_{i} ρ (Y_{i}) + \frac{1}{4} (d [Y_{1}; Y_{2}] + d [Y_{3}; Y_{4}]) + \frac{1}{2} (d [Y_{1} | T_{1}; Y_{3} | T_{1}^{'}]) .

Finally, take the sum of all four inequalities, apply Corollary 5.3 on $(Y_{1}, Y_{2}, Y_{3}, Y_{4})$ and $(Y_{1}, Y_{3}, Y_{2}, Y_{4})$ to rewrite the sum of last terms in the four inequalities, and divide the result by $2$ .

Lemma 13.36

✓

For independent random variables $Y_{1}, Y_{2}, Y_{3}, Y_{4}$ over $G$ , define $T_{1} := Y_{1} + Y_{2}, T_{2} := Y_{1} + Y_{3}, T_{3} := Y_{2} + Y_{3}$ and $S := Y_{1} + Y_{2} + Y_{3} + Y_{4}$ . Then

\sum_{1 \leq i < j \leq 3} (ρ (T_{i} | T_{j}, S) + ρ (T_{j} | T_{i}, S) - \frac{1}{2} \sum_{i} ρ (Y_{i})) \leq \sum_{1 \leq i < j \leq 4} d [Y_{i}; Y_{j}]

Proof ▶

Apply Lemma 13.35 on $(Y_{i}, Y_{j}, Y_{k}, Y_{4})$ for $(i, j, k) = (1, 2, 3), (2, 3, 1), (1, 3, 2)$ , and take the sum.

Proposition 13.37

✓

If $X_{1}, X_{2}$ is a $ϕ$ -minimizer, then $d [X_{1}; X_{2}] = 0$ .

Proof ▶

Consider $T_{1} := X_{1} + X_{2}, T_{2} := X_{1} + {\tilde{X}}_{1}, T_{3} := {\tilde{X}}_{1} + X_{2}$ , and $S = X_{1} + X_{2} + {\tilde{X}}_{1} + {\tilde{X}}_{2}$ . Note that $T_{1} + T_{2} + T_{3} = 0$ . First apply Lemma 13.34 on $(T_{1}, T_{2}, T_{3})$ when conditioned on $S$ to get

\begin{aligned} d [X_{1}; X_{2}] & \leq \sum_{1 \leq i < j \leq 3} I [T_{i} : T_{j} ∣ S] + \frac{η}{3} \sum_{1 \leq i < j \leq 3} (ρ (T_{i} | T_{j}, S) + ρ (T_{j} | T_{i}, S) - ρ (X_{1}) - ρ (X_{2})) \\ = (I_{1} + 2 I_{2}) + \frac{η}{3} \sum_{1 \leq i < j \leq 3} (ρ (T_{i} | T_{j}, S) + ρ (T_{j} | T_{i}, S) - ρ (X_{1}) - ρ (X_{2})) . \end{aligned}

Then apply Lemma 13.36 on $(X_{1}, X_{2}, {\tilde{X}}_{1}, {\tilde{X}}_{2})$ and get

\sum_{1 \leq i < j \leq 3} (ρ (T_{i} | T_{j}, S) + ρ (T_{j} | T_{i}, S) - ρ (X_{1}) - ρ (X_{2})) \leq (4 d [X_{1}; X_{2}] + d [X_{1}; X_{2}] + d [X_{2}; X_{2}]) = 6 d [X_{1}; X_{2}] + (I_{2} - I_{1})

by Lemma 13.31. Plug in the inequality above to (1), we get

d [X_{1}; X_{2}] \leq (I_{1} + 2 I_{2}) + 2 η d [X_{1}; X_{2}] + \frac{η}{3} (I_{2} - I_{1}) .

By Lemma 13.32 we can conclude that

d [X_{1}; X_{2}] \leq 8 η d [X_{1}; X_{2}] - \frac{3 - 10 η}{3 - 3 η} (2 η d [X_{1}; X_{2}] - I_{1}) .

Finally by Lemma 13.30 and $η < 1$ we get that the second term is $\leq 0$ , and thus $d [X_{1}; X_{2}] \leq 8 η d [X_{1}; X_{2}]$ . By the choice $η < 1 / 8$ and the non-negativity of $d$ we have $d [X_{1}; X_{2}] = 0$ .

Proposition 13.38

✓

For any random variables $Y_{1}, Y_{2}$ , there exist a subgroup $H$ such that

2 ρ (U_{H}) \leq ρ (Y_{1}) + ρ (Y_{2}) + 8 d [Y_{1}; Y_{2}] .

Proof ▶

Let $X_{1}, X_{2}$ be a $ϕ$ -minimizer. By Proposition 13.37 $d [X_{1}; X_{2}] = 0$ , which by Definition 13.28 implies $ρ (X_{1}) + ρ (X_{2}) \leq ρ (Y_{1}) + ρ (Y_{2}) + \frac{1}{η} d [Y_{1}; Y_{2}]$ for every $η < 1 / 8$ . Take the limit at $η = 1 / 8$ to get $ρ (X_{1}) + ρ (X_{2}) \leq ρ (Y_{1}) + ρ (Y_{2}) + 8 d [Y_{1}; Y_{2}]$ . By Lemma 3.18 and Lemma 3.15 we have $d [X_{1}; X_{1}] = d [X_{2}; X_{2}] = 0$ , and by Lemma 4.4 there are $H_{1} := Sym [X_{1}], H_{2} := Sym [X_{2}]$ such that $X_{1} = U_{H_{1}} + x_{1}$ and $X_{2} = U_{H_{2}} + x_{2}$ for some $x_{2}$ . By Lemma 13.19 we get $ρ (U_{H_{1}}) + ρ (U_{H_{2}}) \leq ρ (Y_{1}) + ρ (Y_{2}) + 8 d [Y_{1}; Y_{2}]$ , and thus the claim holds for $H = H_{1}$ or $H = H_{2}$ .

Corollary 13.39

✓

If $| A + A | \leq K | A |$ , then there exists a subgroup $H$ and $t \in G$ such that $| A \cap (H + t) | \geq K^{- 4} \sqrt{| A | | H |}$ , and $| H | / | A | \in [K^{- 8}, K^{8}]$ .

Proof ▶

Apply Proposition 13.38 on $U_{A}, U_{A}$ to get a subspace such that $2 ρ (U_{H}) \leq 2 ρ (U_{A}) + 8 d [U_{A}; U_{A}]$ . Recall that $d [U_{A}; U_{A}] \leq \log K$ as proved in Lemma 7.2, and $ρ (U_{A}) = 0$ by Lemma 13.17. Therefore $ρ (U_{H}) \leq 4 \log (K)$ . The claim then follows from Lemma 13.18.

Corollary 13.40

✓

If $| A + A | \leq K | A |$ , then there exist a subgroup $H$ and a subset $c$ of $G$ with $A \subseteq c + H$ , such that $| c | \leq K^{5} | A |^{1 / 2} / | H |^{1 / 2}$ and $| H | / | A | \in [K^{- 8}, K^{8}]$ .

Proof ▶

Theorem 13.41 PFR with

C = 9

✓

If $A \subset F_{2}^{n}$ is finite non-empty with $| A + A | \leq K | A |$ , then there exists a subgroup $H$ of $F_{2}^{n}$ with $| H | \leq | A |$ such that $A$ can be covered by at most $2 K^{9}$ translates of $H$ .

Proof ▶