PFR

2 Shannon entropy inequalities

Random variables in this paper are measurable maps X:Ω→S from a probability space Ω to a measurable space S, and are called S-valued random variables. In many cases we will assume that singletons in S are measurable. Often we will restrict further to the case when S is finite with the discrete σ-algebra, which of course implies that S has measurable singletons.

Definition 2.1 Entropy

If X is an S-valued random variable, the entropy H[X] of X is defined as

H[X] := ∑_{s ∈ S} P[X=s] log(1/P[X=s])

with the convention that 0 log(1/0) = 0.
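
As an illustrative computation: if X takes one value with probability 3/4 and another with probability 1/4, then H[X] = (3/4) log(4/3) + (1/4) log 4 = log 4 - (3/4) log 3, which is strictly smaller than the value log 2 attained by a variable that is uniform on two values.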

Lemma 2.2 Entropy and relabeling
  • If X:Ω→S and Y:Ω→T are random variables, and Y=f(X) for some injection f:S→T, then H[X]=H[Y].

  • If X:Ω→S and Y:Ω→T are random variables, and Y=f(X) and X=g(Y) for some functions f:S→T, g:T→S, then H[X]=H[Y].

Proof
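
For instance, if X takes values in S = {1,2,3} and Y = 2X, then H[Y] = H[X], since x ↦ 2x is injective. By contrast, a non-injective relabeling can lose entropy: if X is uniform on {1,2,3} and Y = min(X,2), then H[Y] = log 3 - (2/3) log 2 < log 3 = H[X].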

Lemma 2.3 Jensen bound

If X is an S-valued random variable, then H[X] ≤ log |S|.

Proof
Definition 2.4 Uniform distribution

If H is a subset of S, an S-valued random variable X is said to be uniformly distributed on H if P[X=s]=1/|H| for s ∈ H and P[X=s]=0 otherwise.

Lemma 2.5 Uniform distributions exist

Given a finite non-empty subset H of a set S, there exists a random variable X (on some probability space) that is uniformly distributed on H.

Proof
Lemma 2.6 Entropy of uniform random variable

If X is an S-valued random variable, then H[X]=log|S| if and only if X is uniformly distributed on S.

Proof
Lemma 2.7 Entropy of uniform random variable, II

If X is uniformly distributed on H, then H[X]=log|H|.

Proof
Lemma 2.8 Bounded entropy implies concentration

If X is an S-valued random variable, then there exists s ∈ S such that P[X=s] ≥ exp(-H[X]).

Proof
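
For instance, if H[X] ≤ log 2, then some value s must satisfy P[X=s] ≥ exp(-H[X]) ≥ 1/2: a variable with entropy at most log 2 places probability at least 1/2 on a single value.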

We use X,Y to denote the pair ω ↦ (X(ω),Y(ω)).

Lemma 2.9 Commutativity and associativity of joint entropy

If X:Ω→S, Y:Ω→T, and Z:Ω→U are random variables, then H[X,Y]=H[Y,X] and H[X,(Y,Z)]=H[(X,Y),Z].

Proof
Definition 2.10 Conditioned event

If X:Ω→S is an S-valued random variable and E is an event in Ω, then the conditioned event (X|E) is defined to be the same random variable as X, but now the ambient probability measure has been conditioned to E.

Note that E may have zero measure; in that case, the ambient probability measure should be replaced with the zero measure. (In our formalization we achieve this by working with arbitrary measures, normalizing them to be probability measures when possible and using the zero measure otherwise. Conditioning is also formalized using existing Mathlib definitions.)

Definition 2.11 Conditional entropy

If X:Ω→S and Y:Ω→T are random variables, the conditional entropy H[X|Y] is defined as

H[X|Y] := ∑_{y ∈ T} P[Y=y] H[(X|Y=y)].
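
As an illustrative computation, suppose the pair (X,Y) is uniformly distributed on the three points (0,0), (0,1), (1,0). Then P[Y=0] = 2/3 and P[Y=1] = 1/3; conditioned on Y=0 the variable X is uniform on {0,1}, while conditioned on Y=1 it is constant, so H[X|Y] = (2/3) log 2 + (1/3)·0 = (2/3) log 2.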
Lemma 2.12 Conditional entropy and relabeling

If X:Ω→S, Y:Ω→T, and Z:Ω→U are random variables, and Y=f(X,Z) almost surely for some map f:S×U→T such that f(·,z) is injective for each fixed z ∈ U, then H[X|Z]=H[Y|Z].

Similarly, if g:T→U is injective, then H[X|g(Y)]=H[X|Y].

Proof
Lemma 2.13 Chain rule

If X:Ω→S and Y:Ω→T are random variables, then

H[X,Y]=H[Y]+H[X|Y].
Proof
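In the example following Definition 2.11, one can check the chain rule directly: H[X,Y] = log 3, while H[Y] = log 3 - (2/3) log 2 and H[X|Y] = (2/3) log 2, and these indeed sum to log 3.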
Lemma 2.14 Conditional chain rule

If X:Ω→S, Y:Ω→T, Z:Ω→U are random variables, then

H[X,Y|Z]=H[Y|Z]+H[X|Y,Z].
Proof
Definition 2.15 Mutual information

If X:Ω→S, Y:Ω→T are random variables, then

I[X:Y] := H[X] + H[Y] - H[X,Y].
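Continuing the example following Definition 2.11, H[X] = H[Y] = log 3 - (2/3) log 2 and H[X,Y] = log 3, so I[X:Y] = log 3 - (4/3) log 2, which is strictly positive; this reflects the fact that X and Y are not independent there, since the outcome (1,1) never occurs.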
Lemma 2.16 Alternative formulae for mutual information

With notation as above, we have

I[X:Y] = I[Y:X]
I[X:Y] = H[X] - H[X|Y]
I[X:Y] = H[Y] - H[Y|X]
Proof
Lemma 2.17 Nonnegativity of mutual information

We have I[X:Y] ≥ 0.

Proof
Corollary 2.18 Subadditivity

With notation as above, we have H[X,Y] ≤ H[X] + H[Y].

Proof
Corollary 2.19 Conditioning reduces entropy

With notation as above, we have H[X|Y] ≤ H[X].

Proof
Corollary 2.20 Submodularity

With three random variables X,Y,Z, one has H[X|Y,Z] ≤ H[X|Z].

Proof
Corollary 2.21 Alternate form of submodularity

With three random variables X,Y,Z, one has

H[X,Y,Z] + H[Z] ≤ H[X,Z] + H[Y,Z].
Proof
Definition 2.22 Independent random variables

Two random variables X:Ω→S and Y:Ω→T are independent if the law of (X,Y) is the product of the law of X and the law of Y. Similarly for more than two variables.
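
For example, if X and Y record two fair coin flips performed independently, the law of (X,Y) assigns probability 1/4 = (1/2)·(1/2) to each of the four outcomes, which is exactly the product of the marginal laws; accordingly H[X,Y] = log 4 = H[X] + H[Y], in line with Corollary 2.24 below.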

Lemma 2.23 Vanishing of mutual information

If X,Y are random variables, then I[X:Y]=0 if and only if X,Y are independent.

Proof
Corollary 2.24 Additivity of entropy

If X,Y are random variables, then H[X,Y]=H[X]+H[Y] if and only if X,Y are independent.

Proof
Definition 2.25 Conditional mutual information

If X,Y,Z are random variables, with Z U-valued, then

I[X:Y|Z] := ∑_{z ∈ U} P[Z=z] I[(X|Z=z) : (Y|Z=z)].
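For example, if Z is a fair coin flip and X = Y = Z, then for each value z the conditioned variables (X|Z=z) and (Y|Z=z) are constant, so every term in the sum vanishes and I[X:Y|Z] = 0, even though I[X:Y] = log 2.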
Lemma 2.26 Alternate formula for conditional mutual information

We have

I[X:Y|Z] = H[X|Z] + H[Y|Z] - H[X,Y|Z]

and

I[X:Y|Z] = H[X|Z] - H[X|Y,Z].
Proof
Lemma 2.27 Nonnegativity of conditional mutual information

If X,Y,Z are random variables, then I[X:Y|Z] ≥ 0.

Proof
Definition 2.28 Conditionally independent random variables

Two random variables X:Ω→S and Y:Ω→T are conditionally independent relative to another random variable Z:Ω→U if P[X=s ∧ Y=t|Z=u] = P[X=s|Z=u] P[Y=t|Z=u] for all s ∈ S, t ∈ T, u ∈ U. (We won’t need conditional independence for more variables than this.)
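
For example, if X and Y are independent fair coin flips with values in {0,1} and Z = X + Y, then X and Y are independent but not conditionally independent over Z: P[X=1 ∧ Y=1|Z=1] = 0, whereas P[X=1|Z=1] P[Y=1|Z=1] = 1/4.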

Lemma 2.29 Vanishing conditional mutual information

If X,Y,Z are random variables, then I[X:Y|Z]=0 iff X,Y are conditionally independent over Z.

Proof
Corollary 2.30 Entropy of conditionally independent variables

If X,Y are conditionally independent over Z, then

H[X,Y,Z] = H[X,Z] + H[Y,Z] - H[Z].
Proof
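
As a quick sanity check, take X = Y = Z to be a single fair coin flip; these are conditionally independent over Z, and both sides of the identity equal log 2, since H[X,Y,Z] = log 2 and H[X,Z] + H[Y,Z] - H[Z] = log 2 + log 2 - log 2.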