
12 The m-torsion case

12.1 Data processing inequality

Lemma 12.1 Data processing for a single variable

Let $X$ be a random variable. Then for any function $f$ on the range of $X$, one has $H[f(X)] \leq H[X]$.

Proof
Lemma 12.2 One-sided unconditional data processing inequality

Let $X, Y$ be random variables. For any function $f$ on the range of $X$, we have $I[f(X) : Y] \leq I[X : Y]$.

Proof
Lemma 12.3 Unconditional data processing inequality

Let $X, Y$ be random variables. For any functions $f, g$ on the ranges of $X, Y$ respectively, we have $I[f(X) : g(Y)] \leq I[X : Y]$.

Proof
Lemma 12.4 Data processing inequality

Let $X, Y, Z$ be random variables. For any functions $f, g$ on the ranges of $X, Y$ respectively, we have $I[f(X) : g(Y) \mid Z] \leq I[X : Y \mid Z]$.

Proof
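As a sanity check, consider two extreme cases: if $f$ and $g$ are injective on the relevant ranges, then $f(X)$ and $g(Y)$ determine $X$ and $Y$ respectively, and all of the above inequalities hold with equality; if instead $f$ is constant, then $H[f(X)] = 0$ and $I[f(X) : Y] = I[f(X) : g(Y) \mid Z] = 0$, so the inequalities can be far from sharp.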

12.2 More Ruzsa distance estimates

Let G be an additive group.

Lemma 12.5 Flipping a sign

If X,Y are G-valued, then

$$ d[X; -Y] \leq 3 d[X; Y]. $$
Proof
Lemma 12.6 Kaimanovich–Vershik–Madiman inequality

If $n \geq 0$ and $X, Y_1, \dots, Y_n$ are jointly independent $G$-valued random variables, then

$$ H\Big[X + \sum_{i=1}^n Y_i\Big] - H[X] \leq \sum_{i=1}^n \Big( H[X + Y_i] - H[X] \Big). $$
Proof
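For $n = 2$, this is (a rearrangement of) the classical Kaimanovich–Vershik submodularity inequality

$$ H[Y_1 + X + Y_2] + H[X] \leq H[Y_1 + X] + H[X + Y_2] $$

for jointly independent $X, Y_1, Y_2$; the general case then follows by applying this bound repeatedly, treating the partial sum $Y_1 + \cdots + Y_k$ as a single summand.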
Lemma 12.7 Kaimanovich–Vershik–Madiman inequality, II

If $n \geq 1$ and $X, Y_1, \dots, Y_n$ are jointly independent $G$-valued random variables, then

$$ d\Big[X; \sum_{i=1}^n Y_i\Big] \leq 2 \sum_{i=1}^n d[X; Y_i]. $$
Proof
Lemma 12.8 Kaimanovich–Vershik–Madiman inequality, III

If $n \geq 1$ and $X, Y_1, \dots, Y_n$ are jointly independent $G$-valued random variables, then

$$ d\Big[X; \sum_{i=1}^n Y_i\Big] \leq d[X; Y_1] + \frac{1}{2} \Big( H\Big[\sum_{i=1}^n Y_i\Big] - H[Y_1] \Big). $$
Proof
Lemma 12.9 Comparing sums

Let $(X_i)_{1 \leq i \leq m}$ and $(Y_j)_{1 \leq j \leq l}$ be tuples of jointly independent random variables (so the $X$'s and $Y$'s are also independent of each other), and let $f : \{1, \dots, l\} \to \{1, \dots, m\}$ be a function. Then

$$ H\Big[\sum_{j=1}^l Y_j\Big] \leq H\Big[\sum_{i=1}^m X_i\Big] + \sum_{j=1}^l \Big( H[Y_j - X_{f(j)}] - H[X_{f(j)}] \Big). $$
Proof
Lemma 12.10 Sums of dilates I

Let $X, Y, X'$ be independent $G$-valued random variables, with $X'$ a copy of $X$, and let $a$ be an integer. Then

$$ H[X - (a+1) Y] \leq H[X - aY] + H[X - Y - X'] - H[X] $$

and

$$ H[X - (a-1) Y] \leq H[X - aY] + H[X - Y - X'] - H[X]. $$
Proof
Lemma 12.11 Sums of dilates II

Let X,Y be independent G-valued random variables, and let a be an integer. Then

$$ H[X - aY] - H[X] \leq 4 |a| \, d[X; Y]. $$
Proof

We remark that in the paper [GGMT2024] the variant estimate

$$ H[X - aY] - H[X] \leq (4 + 10 \log_2 |a|) \, d[X; Y] $$

is also proven by a similar method. This variant is superior for $|a| \geq 9$ (or $|a| = 7$); but we will not need this estimate here.

12.3 Multidistance

We continue to let G be an abelian group.

Definition 12.12 Multidistance

Let $m$ be a positive integer, and let $X_{[m]} = (X_i)_{1 \leq i \leq m}$ be an $m$-tuple of $G$-valued random variables $X_i$. Then we define

$$ D[X_{[m]}] := H\Big[\sum_{i=1}^m \tilde X_i\Big] - \frac{1}{m} \sum_{i=1}^m H[\tilde X_i], $$

where the $\tilde X_i$ are independent copies of the $X_i$.
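For example, if each $X_i$ is uniformly distributed on a finite subgroup $H_i$ of $G$, then $\sum_{i=1}^m \tilde X_i$ is uniformly distributed on $H_1 + \cdots + H_m$, and hence

$$ D[X_{[m]}] = \log |H_1 + \cdots + H_m| - \frac{1}{m} \sum_{i=1}^m \log |H_i|, $$

which vanishes precisely when all the $H_i$ are equal (compare Proposition 12.21 below).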

Lemma 12.13 Multidistance of copy

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ and $Y_{[m]} = (Y_i)_{1 \leq i \leq m}$ are such that $X_i$ and $Y_i$ have the same distribution for each $i$, then $D[X_{[m]}] = D[Y_{[m]}]$.

Proof
Lemma 12.14 Multidistance of independent variables

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ are jointly independent, then $D[X_{[m]}] = H\big[\sum_{i=1}^m X_i\big] - \frac{1}{m} \sum_{i=1}^m H[X_i]$.

Proof
Lemma 12.15 Nonnegativity

For any such tuple, we have $D[X_{[m]}] \geq 0$.

Proof
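Indeed, since the $\tilde X_i$ are jointly independent, $H[\sum_{i=1}^m \tilde X_i] \geq H[\tilde X_j]$ for each $j$ (adding an independent summand cannot decrease entropy), and averaging over $j$ gives the claim.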
Lemma 12.16 Relabeling

If $\phi : \{1, \dots, m\} \to \{1, \dots, m\}$ is a bijection, then $D[X_{[m]}] = D[(X_{\phi(j)})_{1 \leq j \leq m}]$.

Proof
Lemma 12.17 Multidistance and Ruzsa distance, I

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of $G$-valued random variables. Then

$$ \sum_{1 \leq j, k \leq m : j \neq k} d[X_j; X_k] \leq m(m-1)\, D[X_{[m]}]. $$
Proof
Lemma 12.18 Multidistance and Ruzsa distance, II

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of $G$-valued random variables. Then

$$ \sum_{j=1}^m d[X_j; X_j] \leq 2m\, D[X_{[m]}]. $$
Proof
Lemma 12.19 Multidistance and Ruzsa distance, III

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of $G$-valued random variables. If the $X_i$ all have the same distribution, then $D[X_{[m]}] \leq m\, d[X_i; X_i]$ for any $1 \leq i \leq m$.

Proof
Lemma 12.20 Multidistance and Ruzsa distance, IV

Let $m \geq 2$, and let $X_{[m]}$ be a tuple of independent $G$-valued random variables. Let $W := \sum_{i=1}^m X_i$. Then

$$ d[W; W] \leq 2 D[X_{[m]}]. $$
Proof
Proposition 12.21 Vanishing

If $D[X_{[m]}] = 0$, then for each $1 \leq i \leq m$ there is a finite subgroup $H_i \leq G$ such that $d[X_i; U_{H_i}] = 0$.

Proof

With more effort one can show that Hi is independent of i, but we will not need to do so here.

12.4 The tau functional

Fix $m \geq 2$, and a reference $G$-valued random variable $X_0$.

Definition 12.22 η

We set $\eta := \frac{1}{32 m^3}$.
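With this choice, $2/\eta = 64 m^3$, which is the constant appearing in Theorem 12.47 below, and $4 m^2 \eta = \frac{1}{8m}$, so the quantity $t := 4 m^2 \eta k$ appearing in Proposition 12.40 is simply $k/(8m)$.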

Definition 12.23 τ-functional

If $(X_i)_{1 \leq i \leq m}$ is a tuple, we define its $\tau$-functional

$$ \tau[(X_i)_{1 \leq i \leq m}] := D[(X_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X_i; X_0]. $$
Definition 12.24 τ-minimizer

A $\tau$-minimizer is a tuple $(X_i)_{1 \leq i \leq m}$ that minimizes the $\tau$-functional among all tuples of $G$-valued random variables.

Proposition 12.25 Existence of τ-minimizer

If G is finite, then a τ-minimizer exists.

Proof
Proposition 12.26 Minimizer close to reference variables

If $(X_i)_{1 \leq i \leq m}$ is a $\tau$-minimizer, then $\sum_{i=1}^m d[X_i; X_0] \leq \frac{2m}{\eta} d[X_0; X_0]$.

Proof
Lemma 12.27 Lower bound on multidistance

If $(X_i)_{1 \leq i \leq m}$ is a $\tau$-minimizer, and $k := D[(X_i)_{1 \leq i \leq m}]$, then for any other tuple $(X'_i)_{1 \leq i \leq m}$ of $G$-valued random variables, one has

$$ k \leq D[(X'_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X_i; X'_i]. $$
Proof
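This is essentially a rearrangement of the minimizing property: $\tau$-minimality gives $k + \eta \sum_{i=1}^m d[X_i; X_0] \leq D[(X'_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X'_i; X_0]$, and the Ruzsa triangle inequality $d[X'_i; X_0] \leq d[X'_i; X_i] + d[X_i; X_0]$ then eliminates the reference terms.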
Definition 12.28 Conditional multidistance

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ and $Y_{[m]} = (Y_i)_{1 \leq i \leq m}$ are tuples of random variables, with the $X_i$ being $G$-valued (but the $Y_i$ need not be), then we define

$$ D[X_{[m]} \mid Y_{[m]}] = \sum_{(y_i)_{1 \leq i \leq m}} \Big( \prod_{1 \leq i \leq m} p_{Y_i}(y_i) \Big)\, D\big[ (X_i \mid Y_i = y_i)_{1 \leq i \leq m} \big] \tag{2} $$

where each $y_i$ ranges over the support of $p_{Y_i}$ for $1 \leq i \leq m$.

Lemma 12.29 Alternate form of conditional multidistance

If the pairs $(X_i, Y_i)$ are jointly independent of one another, then

$$ D[X_{[m]} \mid Y_{[m]}] = H\Big[\sum_{i=1}^m X_i \,\Big|\, (Y_j)_{1 \leq j \leq m}\Big] - \frac{1}{m} \sum_{i=1}^m H[X_i \mid Y_i]. \tag{3} $$

Proof
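In outline: by the joint independence of the pairs, the event $(Y_j)_{1 \leq j \leq m} = (y_j)$ has probability $\prod_j p_{Y_j}(y_j)$, and conditionally on it the $X_i$ remain independent with the distributions of $(X_i \mid Y_i = y_i)$; by Lemma 12.14 the corresponding term of (2) equals $H[\sum_i X_i \mid (Y_j) = (y_j)] - \frac{1}{m} \sum_i H[X_i \mid Y_i = y_i]$, and averaging over $(y_j)$ yields (3).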
Lemma 12.30 Conditional multidistance nonnegative

If $X_{[m]} = (X_i)_{1 \leq i \leq m}$ and $Y_{[m]} = (Y_i)_{1 \leq i \leq m}$ are tuples of random variables, then $D[X_{[m]} \mid Y_{[m]}] \geq 0$.

Proof
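This follows by applying Lemma 12.15 to each conditioned tuple $(X_i \mid Y_i = y_i)_{1 \leq i \leq m}$ appearing in (2): the defining average has only nonnegative terms.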
Lemma 12.31 Lower bound on conditional multidistance

If $(X_i)_{1 \leq i \leq m}$ is a $\tau$-minimizer, and $k := D[(X_i)_{1 \leq i \leq m}]$, then for any other tuples $(X'_i)_{1 \leq i \leq m}$ and $(Y_i)_{1 \leq i \leq m}$ with the $X'_i$ $G$-valued, one has

$$ k \leq D[(X'_i)_{1 \leq i \leq m} \mid (Y_i)_{1 \leq i \leq m}] + \eta \sum_{i=1}^m d[X_i; X'_i \mid Y_i]. $$
Proof
Corollary 12.32 Lower bound on conditional multidistance, II

With the notation of the previous lemma, we have

$$ k \leq D[X'_{[m]} \mid Y_{[m]}] + \eta \sum_{i=1}^m d[X_{\sigma(i)}; X'_i \mid Y_i] \tag{4} $$

for any permutation $\sigma : \{1, \dots, m\} \to \{1, \dots, m\}$.

Proof
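Indeed, both $D[(X_i)_{1 \leq i \leq m}]$ and $\sum_{i=1}^m d[X_i; X_0]$ are invariant under permuting the $X_i$ (Lemma 12.16), so the permuted tuple $(X_{\sigma(i)})_{1 \leq i \leq m}$ is again a $\tau$-minimizer with the same value of $k$, and (4) is Lemma 12.31 applied to this tuple.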

12.5 The multidistance chain rule

Lemma 12.33 Multidistance chain rule

Let $\pi : G \to H$ be a homomorphism of abelian groups and let $X_{[m]}$ be a tuple of jointly independent $G$-valued random variables. Then $D[X_{[m]}]$ is equal to

$$ D[X_{[m]} \mid \pi(X_{[m]})] + D[\pi(X_{[m]})] + I\Big[\sum_{i=1}^m X_i : \pi(X_{[m]}) \,\Big|\, \pi\Big(\sum_{i=1}^m X_i\Big)\Big] \tag{5} $$

where $\pi(X_{[m]}) := (\pi(X_i))_{1 \leq i \leq m}$.

Proof
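One way to see this (not necessarily how the formal proof is organized): writing $S := \sum_{i=1}^m X_i$ and noting that $\pi(S) = \sum_i \pi(X_i)$ is determined by $\pi(X_{[m]})$, the chain rule for entropy gives

$$ H[S] = H[S \mid \pi(X_{[m]})] + H[\pi(S)] + I[S : \pi(X_{[m]}) \mid \pi(S)]. $$

Subtracting $\frac{1}{m} \sum_i H[X_i] = \frac{1}{m} \sum_i \big( H[\pi(X_i)] + H[X_i \mid \pi(X_i)] \big)$ and applying Lemma 12.29 with $Y_i := \pi(X_i)$ (together with Lemma 12.14 for the tuple $\pi(X_{[m]})$) regroups the remaining terms as $D[X_{[m]} \mid \pi(X_{[m]})] + D[\pi(X_{[m]})]$.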

We will need to iterate the multidistance chain rule, so it is convenient to observe a conditional version of this rule, as follows.

Lemma 12.34 Conditional multidistance chain rule

Let $\pi : G \to H$ be a homomorphism of abelian groups. Let $X_{[m]}$ be a tuple of $G$-valued random variables, and let $Y_{[m]}$ be another tuple of random variables (not necessarily $G$-valued). Suppose that the pairs $(X_i, Y_i)$ are jointly independent of one another (but $X_i$ need not be independent of $Y_i$). Then

$$ D[X_{[m]} \mid Y_{[m]}] = D[X_{[m]} \mid \pi(X_{[m]}), Y_{[m]}] + D[\pi(X_{[m]}) \mid Y_{[m]}] + I\Big[\sum_{i=1}^m X_i : \pi(X_{[m]}) \,\Big|\, \pi\Big(\sum_{i=1}^m X_i\Big), Y_{[m]}\Big]. $$
Proof

We can iterate the above lemma as follows.

Lemma 12.35

Let m be a positive integer. Suppose one has a sequence

$$ G_m \to G_{m-1} \to \dots \to G_1 \to G_0 = \{0\} \tag{9} $$

of homomorphisms between abelian groups $G_0, \dots, G_m$, and for each $d = 0, \dots, m$, let $\pi_d : G_m \to G_d$ be the homomorphism from $G_m$ to $G_d$ arising from this sequence by composition (so for instance $\pi_m$ is the identity homomorphism and $\pi_0$ is the zero homomorphism). Let $X_{[m]} = (X_i)_{1 \leq i \leq m}$ be a jointly independent tuple of $G_m$-valued random variables. Then

$$ D[X_{[m]}] = \sum_{d=1}^m D[\pi_d(X_{[m]}) \mid \pi_{d-1}(X_{[m]})] + \sum_{d=1}^{m-1} I\Big[\sum_i X_i : \pi_d(X_{[m]}) \,\Big|\, \pi_d\Big(\sum_i X_i\Big), \pi_{d-1}(X_{[m]})\Big]. \tag{10} $$

In particular, by Lemma 2.27,

$$ D[X_{[m]}] \geq \sum_{d=1}^m D[\pi_d(X_{[m]}) \mid \pi_{d-1}(X_{[m]})] + I\Big[\sum_i X_i : \pi_1(X_{[m]}) \,\Big|\, \pi_1\Big(\sum_i X_i\Big)\Big]. $$
Proof

In our application we will need the following special case of the above lemma.

Corollary 12.36

Let $G$ be an abelian group and let $m \geq 2$. Suppose that $X_{i,j}$, $1 \leq i, j \leq m$, are independent $G$-valued random variables. Then

$$ I\Big[\Big(\sum_{i=1}^m X_{i,j}\Big)_{j=1}^m : \Big(\sum_{j=1}^m X_{i,j}\Big)_{i=1}^m \,\Big|\, \sum_{i=1}^m \sum_{j=1}^m X_{i,j}\Big] \leq \sum_{j=1}^{m-1} \Big( D[(X_{i,j})_{i=1}^m] - D\big[(X_{i,j})_{i=1}^m \,\big|\, (X_{i,j} + \dots + X_{i,m})_{i=1}^m\big] \Big) + D[(X_{i,m})_{i=1}^m] - D\Big[\Big(\sum_{j=1}^m X_{i,j}\Big)_{i=1}^m\Big], $$

where all the multidistances here involve the indexing set $\{1, \dots, m\}$.

Proof

12.6 Bounding the mutual information

As before, $G$ is an abelian group, and $m \geq 2$. We let $X_{[m]} = (X_i)_{i=1}^m$ be a $\tau$-minimizer, and write $k := D[X_{[m]}]$.

Proposition 12.37 Bounding mutual information

Suppose that $X_{i,j}$, $1 \leq i, j \leq m$, are jointly independent $G$-valued random variables, such that for each $j = 1, \dots, m$, the random variables $(X_{i,j})_{i=1}^m$ coincide in distribution with some permutation of $X_{[m]}$. Write

$$ I := I\Big[\Big(\sum_{i=1}^m X_{i,j}\Big)_{j=1}^m : \Big(\sum_{j=1}^m X_{i,j}\Big)_{i=1}^m \,\Big|\, \sum_{i=1}^m \sum_{j=1}^m X_{i,j}\Big]. $$

Then

$$ I \leq 4 m^2 \eta k. \tag{14} $$

Proof

12.7 Endgame

Now let $m \geq 2$, let $G$ be an $m$-torsion abelian group, and let $(X_i)_{1 \leq i \leq m}$ be a $\tau$-minimizer.

Definition 12.38 Additional random variables

By a slight abuse of notation, we identify $\mathbb{Z}/m\mathbb{Z}$ and $\{1, \dots, m\}$ in the obvious way, and let $Y_{i,j}$ be an independent copy of $X_i$ for $i, j \in \mathbb{Z}/m\mathbb{Z}$ (with the $Y_{i,j}$ jointly independent). Then also define:

$$ W := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} Y_{i,j} $$

and

$$ Z_1 := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} i\, Y_{i,j}, \qquad Z_2 := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} j\, Y_{i,j}, \qquad Z_3 := \sum_{i,j \in \mathbb{Z}/m\mathbb{Z}} (-i-j)\, Y_{i,j}. $$

The sum $-i-j$ is computed in $\mathbb{Z}/m\mathbb{Z}$. Note that, because we are assuming $G$ is $m$-torsion, it is well-defined to multiply elements of $G$ by elements of $\mathbb{Z}/m\mathbb{Z}$. We will also define, for $i, j, r \in \mathbb{Z}/m\mathbb{Z}$, the variables

$$ P_i := \sum_{j \in \mathbb{Z}/m\mathbb{Z}} Y_{i,j}, \qquad Q_j := \sum_{i \in \mathbb{Z}/m\mathbb{Z}} Y_{i,j}, \qquad R_r := \sum_{\substack{i, j \in \mathbb{Z}/m\mathbb{Z} \\ i + j = r}} Y_{i,j}. \tag{21} $$
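For orientation, these variables are related by the identities $W = \sum_i P_i = \sum_j Q_j = \sum_r R_r$, $Z_1 = \sum_i i\, P_i$, $Z_2 = \sum_j j\, Q_j$, and $Z_3 = -\sum_r r\, R_r$, all following directly from the definitions.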

Lemma 12.39 Zero-sum

We have

$$ Z_1 + Z_2 + Z_3 = 0. \tag{22} $$

Proof
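Indeed, the coefficient of each $Y_{i,j}$ in $Z_1 + Z_2 + Z_3$ is $i + j + (-i-j) = 0$ in $\mathbb{Z}/m\mathbb{Z}$, and multiplication by elements of $\mathbb{Z}/m\mathbb{Z}$ is well-defined on $G$ because $G$ is $m$-torsion.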
Proposition 12.40 Mutual information bound

We have

$$ I[Z_1 : Z_2 \mid W],\ I[Z_2 : Z_3 \mid W],\ I[Z_1 : Z_3 \mid W] \leq t, $$

where

$$ t := 4 m^2 \eta k. \tag{23} $$

Proof
Lemma 12.41 Entropy of W

We have $H[W] \leq (2m - 1) k + \frac{1}{m} \sum_{i=1}^m H[X_i]$.

Proof
Lemma 12.42 Entropy of Z2

We have $H[Z_2] \leq (8m^2 - 16m + 1) k + \frac{1}{m} \sum_{i=1}^m H[X_i]$.

Proof
Lemma 12.43 Mutual information bound

We have $I[W : Z_2] \leq 2(m - 1) k$.

Proof
Lemma 12.44 Distance bound

We have $\sum_{i=1}^m d[X_i; Z_2 \mid W] \leq 8(m^3 - m^2) k$.

Proof
Lemma 12.45 Application of BSG

Let $G$ be an abelian group, let $(T_1, T_2, T_3)$ be a $G^3$-valued random variable such that $T_1 + T_2 + T_3 = 0$ holds identically, and write

$$ \delta := I[T_1 : T_2] + I[T_1 : T_3] + I[T_2 : T_3]. $$

Let $Y_1, \dots, Y_n$ be some further $G$-valued random variables and let $\alpha > 0$ be a constant. Then there exists a random variable $U$ such that

$$ d[U; U] + \alpha \sum_{i=1}^n d[Y_i; U] \leq \Big(2 + \frac{\alpha n}{2}\Big) \delta + \alpha \sum_{i=1}^n d[Y_i; T_2]. \tag{27} $$

Proof
Proposition 12.46 Vanishing entropy

We have k=0.

Proof
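Granted this, Theorem 12.47 below follows along the following lines (the formal deduction may be organized differently): taking $X_0 := X$ and a $\tau$-minimizer $(X_i)_{1 \leq i \leq m}$, Proposition 12.46 gives $D[X_{[m]}] = 0$, so by Proposition 12.21 each $X_i$ satisfies $d[X_i; U_{H_i}] = 0$ for some finite subgroup $H_i$. Choosing $i$ with $d[X_i; X_0]$ minimal, Proposition 12.26 gives $d[X_i; X_0] \leq \frac{2}{\eta} d[X_0; X_0] = 64 m^3\, d[X_0; X_0]$, and the Ruzsa triangle inequality then yields $d[X_0; U_{H_i}] \leq d[X_0; X_i] + d[X_i; U_{H_i}] \leq 64 m^3\, d[X_0; X_0]$.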

12.8 Wrapping up

Theorem 12.47 Entropy form of PFR

Suppose that $G$ is a finite abelian group of torsion $m$. Suppose that $X$ is a $G$-valued random variable. Then there exists a subgroup $H \leq G$ such that

$$ d[X; U_H] \leq 64 m^3\, d[X; X]. $$
Proof
Lemma 12.48

Suppose that $G$ is a finite abelian group of torsion $m$. If $A \subseteq G$ is non-empty and $|A + A| \leq K |A|$, then $A$ can be covered by at most $K^{(64 m^3 + 2)/2}\, |A|^{1/2} / |H|^{1/2}$ translates of a subgroup $H$ of $G$ with

$$ |H| / |A| \in [K^{-64 m^3}, K^{64 m^3}]. \tag{33} $$

Proof
Theorem 12.49 PFR

Suppose that $G$ is a finite abelian group of torsion $m$. If $A \subseteq G$ is non-empty and $|A + A| \leq K |A|$, then $A$ can be covered by at most $m K^{96 m^3 + 2}$ translates of a subgroup $H$ of $G$ with $|H| \leq |A|$.

Proof
  1. In fact, by permuting the variables $(Y_{i,j})_{i,j \in \mathbb{Z}/m\mathbb{Z}}$, one can see that the random variables $(W, Z_1, Z_2)$ and $(W, Z_1, Z_3)$ have the same distribution, so this is in some sense identical to (and can be deduced from) the first application.