2 Shannon entropy inequalities
Random variables in this paper are measurable maps $X: \Omega \to S$ from a probability space $\Omega$ to a measurable space $S$, and are called $S$-valued random variables. In many cases we will assume that singletons in $S$ are measurable. Often we will restrict further to the case when $S$ is finite with the discrete $\sigma$-algebra, which of course implies that $S$ has measurable singletons.
Definition
2.1
Entropy
If $X$ is an $S$-valued random variable, the entropy of $X$ is defined as
$$\mathbf{H}[X] := \sum_{s \in S} \mathbf{P}[X = s] \log \frac{1}{\mathbf{P}[X = s]},$$
with the convention that $0 \log \frac{1}{0} = 0$.
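For example, if $X$ is uniformly distributed on a finite set $H \subseteq S$, then each of the $|H|$ nonzero terms equals $\frac{1}{|H|} \log |H|$, so
$$\mathbf{H}[X] = \log |H|;$$
in particular a fair coin flip has entropy $\log 2$, while a deterministic random variable has entropy $0$.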
Lemma
2.2
Entropy and relabeling
If $X: \Omega \to S$ and $Y: \Omega \to T$ are random variables, and $Y = f(X)$ for some injection $f: S \to T$, then $\mathbf{H}[X] = \mathbf{H}[Y]$.
If $X$ and $Y$ are random variables, and $Y = f(X)$ and $X = g(Y)$ for some functions $f: S \to T$, $g: T \to S$, then $\mathbf{H}[X] = \mathbf{H}[Y]$.
Proof
Expand out both entropies and rearrange.
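Concretely, injectivity of $f$ gives $\mathbf{P}[Y = f(s)] = \mathbf{P}[X = s]$ for every $s \in S$, while $\mathbf{P}[Y = t] = 0$ for $t$ outside the image of $f$, so
$$\mathbf{H}[Y] = \sum_{t \in T} \mathbf{P}[Y = t] \log \frac{1}{\mathbf{P}[Y = t]} = \sum_{s \in S} \mathbf{P}[X = s] \log \frac{1}{\mathbf{P}[X = s]} = \mathbf{H}[X],$$
using the convention that zero-probability terms contribute nothing.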
Lemma
2.3
Jensen bound
If $X$ is an $S$-valued random variable, then $\mathbf{H}[X] \le \log |S|$.
Proof
This is a direct consequence of Lemma 1.2.
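Spelled out, writing $p_s := \mathbf{P}[X = s]$, concavity of the logarithm gives
$$\mathbf{H}[X] = \sum_{s : p_s > 0} p_s \log \frac{1}{p_s} \le \log \Big( \sum_{s : p_s > 0} p_s \cdot \frac{1}{p_s} \Big) = \log |\{ s \in S : p_s > 0 \}| \le \log |S|.$$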
Lemma
2.5
Uniform distributions exist
Given a finite non-empty subset $H$ of a set $S$, there exists a random variable $X$ (on some probability space) that is uniformly distributed on $H$.
Proof
Direct computation in one direction. Converse direction needs Lemma 1.3.
Lemma
2.8
Bounded entropy implies concentration
If $X$ is an $S$-valued random variable, then there exists $s \in S$ such that $\mathbf{P}[X = s] \ge \exp(-\mathbf{H}[X])$.
Proof
We have
$$\mathbf{H}[X] = \sum_{s \in S} \mathbf{P}[X = s] \log \frac{1}{\mathbf{P}[X = s]} \ge \log \frac{1}{\max_{s \in S} \mathbf{P}[X = s]},$$
and the claim follows.
We use $\langle X, Y \rangle$ to denote the pair $\omega \mapsto (X(\omega), Y(\omega))$, and abbreviate its entropy as $\mathbf{H}[X, Y] := \mathbf{H}[\langle X, Y \rangle]$.
Lemma
2.9
Commutativity and associativity of joint entropy
If $X$, $Y$, and $Z$ are random variables, then $\mathbf{H}[X, Y] = \mathbf{H}[Y, X]$ and $\mathbf{H}[X, \langle Y, Z \rangle] = \mathbf{H}[\langle X, Y \rangle, Z]$, so that expressions such as $\mathbf{H}[X, Y, Z]$ are unambiguous.
Proof
Set up an injection from $S \times T$ to $T \times S$ (for instance $(s, t) \mapsto (t, s)$) and use Lemma 2.2 for the first claim. Similarly for the second claim.
Definition
2.10
Conditioned event
If $X$ is an $S$-valued random variable and $E$ is an event in $\Omega$, then the conditioned event $(X | E)$ is defined to be the same random variable as $X$, but now the ambient probability measure has been conditioned to $E$.
Note: it may happen that $E$ has zero measure, in which case the ambient probability measure should be replaced with the zero measure. (In our formalization we achieve this by working with arbitrary measures, and normalizing them to be probability measures if possible, else using the zero measure. Conditioning is also formalized using existing Mathlib definitions.)
Definition
2.11
Conditional entropy
If $X: \Omega \to S$ and $Y: \Omega \to T$ are random variables, the conditional entropy $\mathbf{H}[X | Y]$ is defined as
$$\mathbf{H}[X | Y] := \sum_{t \in T} \mathbf{P}[Y = t] \, \mathbf{H}[(X | Y = t)],$$
where $(X | Y = t)$ is the conditioned random variable of Definition 2.10 (conditioning on the event $\{Y = t\}$).
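Unpacking Definition 2.1, this can be written as the double sum
$$\mathbf{H}[X | Y] = \sum_{t \in T} \mathbf{P}[Y = t] \sum_{s \in S} \mathbf{P}[X = s | Y = t] \log \frac{1}{\mathbf{P}[X = s | Y = t]},$$
with the terms where $\mathbf{P}[Y = t] = 0$ contributing nothing (by the zero-measure convention in Definition 2.10).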
Lemma
2.12
Conditional entropy and relabeling
If $X: \Omega \to S$, $Y: \Omega \to T$, and $Z: \Omega \to U$ are random variables, and $Y = f(X, Z)$ almost surely for some map $f: S \times U \to T$ that is injective for each fixed value of $Z$, then $\mathbf{H}[Y | Z] = \mathbf{H}[X | Z]$.
Similarly, if $g: U \to V$ is injective, then $\mathbf{H}[X | g(Z)] = \mathbf{H}[X | Z]$.
Lemma
2.13
Chain rule
If $X$ and $Y$ are random variables, then
$$\mathbf{H}[X, Y] = \mathbf{H}[Y] + \mathbf{H}[X | Y].$$
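This can be seen by expanding the joint entropy: since $\mathbf{P}[X = s \wedge Y = t] = \mathbf{P}[Y = t] \, \mathbf{P}[X = s | Y = t]$, one has
$$\log \frac{1}{\mathbf{P}[X = s \wedge Y = t]} = \log \frac{1}{\mathbf{P}[Y = t]} + \log \frac{1}{\mathbf{P}[X = s | Y = t]},$$
and summing this against $\mathbf{P}[X = s \wedge Y = t]$ over all $(s, t)$ gives $\mathbf{H}[X, Y] = \mathbf{H}[Y] + \mathbf{H}[X | Y]$.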
Lemma
2.14
Conditional chain rule
If $X$, $Y$, $Z$ are random variables, then
$$\mathbf{H}[X, Y | Z] = \mathbf{H}[Y | Z] + \mathbf{H}[X | Y, Z].$$
Proof
For each $z \in U$, we can apply Lemma 2.13 to the random variables $(X | Z = z)$ and $(Y | Z = z)$ to obtain
$$\mathbf{H}[X, Y | Z = z] = \mathbf{H}[Y | Z = z] + \mathbf{H}[(X | Z = z) \,|\, (Y | Z = z)].$$
Now multiply by $\mathbf{P}[Z = z]$ and sum. Some helper lemmas may be needed to get to the form above. This sort of "averaging over conditioning" argument to get conditional entropy inequalities from unconditional ones is commonly used in this paper.
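The summation step uses identities such as
$$\sum_{z \in U} \mathbf{P}[Z = z] \, \mathbf{H}[X, Y | Z = z] = \mathbf{H}[X, Y | Z], \qquad \sum_{z \in U} \mathbf{P}[Z = z] \, \mathbf{H}[(X | Z = z) \,|\, (Y | Z = z)] = \mathbf{H}[X | Y, Z],$$
the second of which follows by expanding both sides via Definition 2.11 and the identity $\mathbf{P}[Y = t | Z = z] \, \mathbf{P}[Z = z] = \mathbf{P}[Y = t \wedge Z = z]$.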
Lemma
2.16
Alternative formulae for mutual information
With notation as above (writing $\mathbf{I}[X : Y] := \mathbf{H}[X] + \mathbf{H}[Y] - \mathbf{H}[X, Y]$ for the mutual information), we have
$$\mathbf{I}[X : Y] = \mathbf{I}[Y : X] = \mathbf{H}[X] - \mathbf{H}[X | Y] = \mathbf{H}[Y] - \mathbf{H}[Y | X].$$
Proof
Immediate from Lemmas 2.9, 2.13.
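In detail, the chain rule (Lemma 2.13) gives $\mathbf{H}[X, Y] = \mathbf{H}[Y] + \mathbf{H}[X | Y]$, so
$$\mathbf{I}[X : Y] = \mathbf{H}[X] + \mathbf{H}[Y] - \mathbf{H}[X, Y] = \mathbf{H}[X] - \mathbf{H}[X | Y],$$
and the remaining identities follow from the symmetry $\mathbf{H}[X, Y] = \mathbf{H}[Y, X]$ of Lemma 2.9.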
Lemma
2.17
Nonnegativity of mutual information
With notation as above, we have $\mathbf{I}[X : Y] \ge 0$.
Corollary
2.18
Subadditivity
With notation as above, we have $\mathbf{H}[X, Y] \le \mathbf{H}[X] + \mathbf{H}[Y]$.
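Indeed, combining Lemma 2.17 with the definition of mutual information,
$$0 \le \mathbf{I}[X : Y] = \mathbf{H}[X] + \mathbf{H}[Y] - \mathbf{H}[X, Y].$$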
Corollary
2.19
Conditioning reduces entropy
With notation as above, we have $\mathbf{H}[X | Y] \le \mathbf{H}[X]$.
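Indeed, by Lemmas 2.16 and 2.17,
$$\mathbf{H}[X] - \mathbf{H}[X | Y] = \mathbf{I}[X : Y] \ge 0.$$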
Corollary
2.20
Submodularity
With three random variables $X, Y, Z$, one has $\mathbf{H}[X | Y, Z] \le \mathbf{H}[X | Z]$.
Proof
Apply the “averaging over conditioning” argument to Corollary 2.19.
Corollary
2.21
Alternate form of submodularity
With three random variables $X, Y, Z$, one has
$$\mathbf{H}[X, Y, Z] + \mathbf{H}[Z] \le \mathbf{H}[X, Z] + \mathbf{H}[Y, Z].$$
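This is equivalent to Corollary 2.20 via the chain rule: by Lemma 2.13 and Corollary 2.20,
$$\mathbf{H}[X, Y, Z] = \mathbf{H}[Y, Z] + \mathbf{H}[X | Y, Z] \le \mathbf{H}[Y, Z] + \mathbf{H}[X | Z] = \mathbf{H}[Y, Z] + \mathbf{H}[X, Z] - \mathbf{H}[Z].$$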
Definition
2.22
Independent random variables
Two random variables $X: \Omega \to S$ and $Y: \Omega \to T$ are independent if the law of $\langle X, Y \rangle$ is the product of the law of $X$ and the law of $Y$. Similarly for more than two variables.
Lemma
2.23
Vanishing of mutual information
If $X, Y$ are random variables, then $\mathbf{I}[X : Y] = 0$ if and only if $X, Y$ are independent.
Proof
An application of the equality case of Jensen’s inequality, Lemma 1.3.
Corollary
2.24
Additivity of entropy
If $X, Y$ are random variables, then $\mathbf{H}[X, Y] = \mathbf{H}[X] + \mathbf{H}[Y]$ if and only if $X, Y$ are independent.
Definition
2.25
Conditional mutual information
If $X, Y, Z$ are random variables, with $Z$ being $U$-valued, then
$$\mathbf{I}[X : Y | Z] := \sum_{u \in U} \mathbf{P}[Z = u] \, \mathbf{I}[(X | Z = u) : (Y | Z = u)].$$
Lemma
2.26
Alternate formula for conditional mutual information
With notation as above, we have
$$\mathbf{I}[X : Y | Z] = \mathbf{H}[X | Z] + \mathbf{H}[Y | Z] - \mathbf{H}[X, Y | Z].$$
Lemma
2.27
Nonnegativity of conditional mutual information
If $X, Y, Z$ are random variables, then $\mathbf{I}[X : Y | Z] \ge 0$.
Definition
2.28
Conditionally independent random variables
Two random variables $X: \Omega \to S$ and $Y: \Omega \to T$ are conditionally independent relative to another random variable $Z: \Omega \to U$ if
$$\mathbf{P}[X = s \wedge Y = t | Z = u] = \mathbf{P}[X = s | Z = u] \, \mathbf{P}[Y = t | Z = u]$$
for all $s \in S$, $t \in T$, $u \in U$. (We won't need conditional independence for more variables than this.)
Lemma
2.29
Vanishing conditional mutual information
If $X, Y, Z$ are random variables, then $\mathbf{I}[X : Y | Z] = 0$ iff $X, Y$ are conditionally independent over $Z$.
Corollary
2.30
Entropy of conditionally independent variables
If $X, Y$ are conditionally independent over $Z$, then
$$\mathbf{H}[X, Y, Z] = \mathbf{H}[X, Z] + \mathbf{H}[Y, Z] - \mathbf{H}[Z].$$
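Indeed, Lemma 2.29 gives $\mathbf{I}[X : Y | Z] = 0$, so by Lemma 2.26 and the chain rule (Lemma 2.13),
$$0 = \mathbf{H}[X | Z] + \mathbf{H}[Y | Z] - \mathbf{H}[X, Y | Z] = \mathbf{H}[X, Z] + \mathbf{H}[Y, Z] - \mathbf{H}[X, Y, Z] - \mathbf{H}[Z],$$
which rearranges to the stated identity.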