2 Shannon entropy inequalities
Random variables in this paper are measurable maps \(X : \Omega \to S\) from a probability space \(\Omega \) to a measurable space \(S\); we call these \(S\)-valued random variables. In many cases we will assume that singletons in \(S\) are measurable. Often we will restrict further to the case where \(S\) is finite with the discrete \(\sigma \)-algebra, which of course implies that \(S\) has measurable singletons.
If \(X\) is an \(S\)-valued random variable, the entropy \(\mathbb {H}[X]\) of \(X\) is defined as
\[ \mathbb {H}[X] := \sum _{s \in S} \mathbb {P}[X = s] \log \frac{1}{\mathbb {P}[X = s]}, \]
with the convention that \(0 \log \frac{1}{0} = 0\).
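For instance, if \(X\) takes each of two values with probability \(1/2\), then \(\mathbb {H}[X] = \frac{1}{2}\log 2 + \frac{1}{2}\log 2 = \log 2\), while an almost surely constant random variable has entropy \(0\).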
If \(X: \Omega \to S\) and \(Y: \Omega \to T\) are random variables, and \(Y = f(X)\) for some injection \(f: S \to T\), then \(\mathbb {H}[X] = \mathbb {H}[Y]\).
If \(X: \Omega \to S\) and \(Y: \Omega \to T\) are random variables, and \(Y = f(X)\) and \(X = g(Y)\) for some functions \(f: S \to T\), \(g: T \to S\), then \(\mathbb {H}[X] = \mathbb {H}[Y]\).
Expand out both entropies and rearrange.
If \(X\) is an \(S\)-valued random variable, then \(\mathbb {H}[X] \leq \log |S|\).
This is a direct consequence of Lemma 1.2.
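To indicate the estimate involved (a sketch, assuming Lemma 1.2 is the Jensen-type concavity bound for the logarithm from the previous section):
\[ \mathbb {H}[X] = \sum _{s :\, \mathbb {P}[X=s]>0} \mathbb {P}[X=s] \log \frac{1}{\mathbb {P}[X=s]} \leq \log \sum _{s :\, \mathbb {P}[X=s]>0} \mathbb {P}[X=s] \cdot \frac{1}{\mathbb {P}[X=s]} \leq \log |S|. \]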
If \(H\) is a subset of \(S\), an \(S\)-valued random variable \(X\) is said to be uniformly distributed on \(H\) if \(\mathbb {P}[X = s] = 1/|H|\) for \(s \in H\) and \(\mathbb {P}[X=s] = 0\) otherwise.
Given a finite non-empty subset \(H\) of a set \(S\), there exists a random variable \(X\) (on some probability space) that is uniformly distributed on \(H\).
Direct construction.
If \(X\) is an \(S\)-valued random variable, then \(\mathbb {H}[X] = \log |S|\) if and only if \(X\) is uniformly distributed on \(S\).
Direct computation in one direction. Converse direction needs Lemma 1.3.
If \(X\) is uniformly distributed on \(H\), then \(\mathbb {H}[X] = \log |H|\).
Direct computation.
If \(X\) is an \(S\)-valued random variable, then there exists \(s \in S\) such that \(\mathbb {P}[X=s] \geq \exp (-\mathbb {H}[X])\).
We have
\[ \mathbb {H}[X] = \sum _{s \in S} \mathbb {P}[X=s] \log \frac{1}{\mathbb {P}[X=s]} \geq \sum _{s \in S} \mathbb {P}[X=s] \log \frac{1}{\max _{s' \in S} \mathbb {P}[X=s']} = \log \frac{1}{\max _{s' \in S} \mathbb {P}[X=s']}, \]
and the claim follows by taking \(s\) to attain the maximum.
We use \(X,Y\) to denote the pair \(\omega \mapsto (X(\omega ),Y(\omega ))\).
If \(X: \Omega \to S\), \(Y: \Omega \to T\), and \(Z: \Omega \to U\) are random variables, then \(\mathbb {H}[X, Y] = \mathbb {H}[Y, X]\) and \(\mathbb {H}[X, (Y,Z)] = \mathbb {H}[(X,Y), Z]\).
Set up an injection from \((X,Y)\) to \((Y,X)\) and use Lemma 2.2 for the first claim. Similarly for the second claim.
If \(X: \Omega \to S\) is an \(S\)-valued random variable and \(E\) is an event in \(\Omega \), then the conditioned random variable \((X|E)\) is defined to be the same random variable as \(X\), but now the ambient probability measure has been conditioned to \(E\).
Note that \(E\) may have zero measure, in which case the ambient probability measure should be replaced by the zero measure. (In our formalization we achieve this by working with arbitrary measures, normalizing them to probability measures when possible and using the zero measure otherwise. Conditioning is also formalized using existing Mathlib definitions.)
If \(X: \Omega \to S\) and \(Y: \Omega \to T\) are random variables, the conditional entropy \(\mathbb {H}[X|Y]\) is defined as
\[ \mathbb {H}[X|Y] := \sum _{t \in T} \mathbb {P}[Y = t] \, \mathbb {H}[(X|Y=t)]. \]
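Unfolding the definition of entropy, this can be written more explicitly as
\[ \mathbb {H}[X|Y] = \sum _{t \in T} \mathbb {P}[Y=t] \sum _{s \in S} \mathbb {P}[X=s|Y=t] \log \frac{1}{\mathbb {P}[X=s|Y=t]}, \]
where conditioning on a null event is handled by the zero-measure convention above.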
If \(X: \Omega \to S\), \(Y: \Omega \to T\), and \(Z: \Omega \to U\) are random variables, and \(Y = f(X,Z)\) almost surely for some map \(f: S \times U \to T\) that is injective for each fixed \(z \in U\), then \(\mathbb {H}[X|Z] = \mathbb {H}[Y|Z]\).
Similarly, if \(g: T \to U\) is injective, then \(\mathbb {H}[X|g(Y)] = \mathbb {H}[X|Y]\).
For the first part, use Definition 2.11 and then Lemma 2.2. The second part is a direct computation.
If \(X: \Omega \to S\) and \(Y: \Omega \to T\) are random variables, then
\[ \mathbb {H}[X, Y] = \mathbb {H}[Y] + \mathbb {H}[X|Y]. \]
Direct computation.
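To indicate the computation (a sketch; terms with zero probability are handled by the convention \(0 \log \frac{1}{0} = 0\)): writing \(\mathbb {P}[X=s, Y=t] = \mathbb {P}[Y=t]\,\mathbb {P}[X=s|Y=t]\) and splitting the logarithm,
\[ \mathbb {H}[X,Y] = \sum _{s,t} \mathbb {P}[X=s,Y=t] \log \frac{1}{\mathbb {P}[X=s,Y=t]} = \sum _{t} \mathbb {P}[Y=t] \log \frac{1}{\mathbb {P}[Y=t]} + \sum _{t} \mathbb {P}[Y=t] \sum _{s} \mathbb {P}[X=s|Y=t] \log \frac{1}{\mathbb {P}[X=s|Y=t]} = \mathbb {H}[Y] + \mathbb {H}[X|Y]. \]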
If \(X: \Omega \to S\), \(Y: \Omega \to T\), \(Z: \Omega \to U\) are random variables, then
\[ \mathbb {H}[X, Y | Z] = \mathbb {H}[Y|Z] + \mathbb {H}[X|Y, Z]. \]
For each \(z \in U\), we can apply Lemma 2.13 to the random variables \((X|Z=z)\) and \((Y|Z=z)\) to obtain
\[ \mathbb {H}[X, Y | Z=z] = \mathbb {H}[Y|Z=z] + \mathbb {H}[(X|Z=z)\,|\,(Y|Z=z)]. \]
Now multiply by \(\mathbb {P}[Z=z]\) and sum. Some helper lemmas may be needed to get to the form above. This sort of “average over conditioning” argument to get conditional entropy inequalities from unconditional ones is commonly used in this paper.
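In the present instance the averaged identity reads
\[ \sum _{z \in U} \mathbb {P}[Z=z]\,\mathbb {H}[X,Y|Z=z] = \sum _{z \in U} \mathbb {P}[Z=z]\,\mathbb {H}[Y|Z=z] + \sum _{z \in U} \mathbb {P}[Z=z]\,\mathbb {H}[(X|Z=z)\,|\,(Y|Z=z)], \]
and the helper lemmas identify the three sums with \(\mathbb {H}[X,Y|Z]\), \(\mathbb {H}[Y|Z]\) and \(\mathbb {H}[X|Y,Z]\) respectively.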
If \(X: \Omega \to S\), \(Y: \Omega \to T\) are random variables, then the mutual information \(\mathbb {I}[X:Y]\) is defined by
\[ \mathbb {I}[X:Y] := \mathbb {H}[X] + \mathbb {H}[Y] - \mathbb {H}[X, Y]. \]
With notation as above, we have
\[ \mathbb {I}[X:Y] = \mathbb {H}[X] - \mathbb {H}[X|Y] = \mathbb {H}[Y] - \mathbb {H}[Y|X]. \]
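For instance, the first of these identities follows by combining the definition of \(\mathbb {I}[X:Y]\) with the chain rule (Lemma 2.13):
\[ \mathbb {I}[X:Y] = \mathbb {H}[X] + \mathbb {H}[Y] - \mathbb {H}[X,Y] = \mathbb {H}[X] - \big (\mathbb {H}[X,Y] - \mathbb {H}[Y]\big ) = \mathbb {H}[X] - \mathbb {H}[X|Y]; \]
the second follows symmetrically.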
We have \(\mathbb {I}[X:Y] \geq 0\).
An application of Lemma 1.2 and Lemma 2.16.
With notation as above, we have \(\mathbb {H}[X,Y] \leq \mathbb {H}[X] + \mathbb {H}[Y]\).
Use Lemma 2.17.
With notation as above, we have \(\mathbb {H}[X|Y] \leq \mathbb {H}[X]\).
Combine Lemma 2.17 with Lemma 2.16.
With three random variables \(X,Y,Z\), one has \(\mathbb {H}[X|Y,Z] \leq \mathbb {H}[X|Z]\).
Apply the “averaging over conditioning” argument to Corollary 2.19.
With three random variables \(X,Y,Z\), one has
\[ \mathbb {H}[X, Y, Z] + \mathbb {H}[Z] \leq \mathbb {H}[X, Z] + \mathbb {H}[Y, Z]. \]
Apply Corollary 2.20 and Lemma 2.13.
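Concretely, the two cited results combine as
\[ \mathbb {H}[X,Y,Z] = \mathbb {H}[Y,Z] + \mathbb {H}[X|Y,Z] \leq \mathbb {H}[Y,Z] + \mathbb {H}[X|Z] = \mathbb {H}[Y,Z] + \mathbb {H}[X,Z] - \mathbb {H}[Z], \]
using Lemma 2.13 (together with the associativity of the pair) for the two equalities and Corollary 2.20 for the inequality.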
Two random variables \(X: \Omega \to S\) and \(Y: \Omega \to T\) are independent if the law of \((X,Y)\) is the product of the law of \(X\) and the law of \(Y\). Similarly for more than two variables.
If \(X,Y\) are random variables, then \(\mathbb {I}[X:Y] = 0\) if and only if \(X,Y\) are independent.
An application of the equality case of Jensen’s inequality, Lemma 1.3.
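For orientation, expanding the three entropies in the definition of \(\mathbb {I}[X:Y]\) (and using \(\sum _t \mathbb {P}[X=s,Y=t] = \mathbb {P}[X=s]\) together with its analogue for \(Y\)) gives the equivalent expression
\[ \mathbb {I}[X:Y] = \sum _{s \in S,\, t \in T} \mathbb {P}[X=s, Y=t] \log \frac{\mathbb {P}[X=s, Y=t]}{\mathbb {P}[X=s]\,\mathbb {P}[Y=t]}, \]
with summands interpreted as \(0\) when \(\mathbb {P}[X=s,Y=t] = 0\); this is a form to which the equality case of Jensen's inequality can be applied.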
If \(X,Y\) are random variables, then \(\mathbb {H}[X,Y] = \mathbb {H}[X] + \mathbb {H}[Y]\) if and only if \(X,Y\) are independent.
Direct from Lemma 2.23.
If \(X,Y,Z\) are random variables, with \(Z\) \(U\)-valued, then the conditional mutual information \(\mathbb {I}[X:Y|Z]\) is defined as
\[ \mathbb {I}[X:Y|Z] := \sum _{z \in U} \mathbb {P}[Z=z]\, \mathbb {I}[(X|Z=z) : (Y|Z=z)]. \]
We have
\[ \mathbb {I}[X:Y|Z] = \mathbb {H}[X|Z] + \mathbb {H}[Y|Z] - \mathbb {H}[X,Y|Z] \]
and
\[ \mathbb {I}[X:Y|Z] = \mathbb {H}[X|Z] - \mathbb {H}[X|Y,Z]. \]
Routine computation.
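For instance, the first of these can be obtained by expanding each \(\mathbb {I}[(X|Z=z):(Y|Z=z)]\) via the definition of mutual information and averaging over \(z\):
\[ \mathbb {I}[X:Y|Z] = \sum _{z \in U} \mathbb {P}[Z=z]\Big (\mathbb {H}[X|Z=z] + \mathbb {H}[Y|Z=z] - \mathbb {H}[X,Y|Z=z]\Big ) = \mathbb {H}[X|Z] + \mathbb {H}[Y|Z] - \mathbb {H}[X,Y|Z]; \]
the second then follows from the conditional chain rule.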
If \(X,Y,Z\) are random variables, then \(\mathbb {I}[X:Y|Z] \ge 0\).
Use Definition 2.25 and Corollary 2.20.
Two random variables \(X: \Omega \to S\) and \(Y: \Omega \to T\) are conditionally independent relative to another random variable \(Z: \Omega \to U\) if \(\mathbb {P}[X = s \wedge Y = t| Z=u] = \mathbb {P}[X=s|Z=u] \mathbb {P}[Y=t|Z=u]\) for all \(s \in S, t \in T, u \in U\). (We won’t need conditional independence for more variables than this.)
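For instance (a toy example): if \(X = Y = Z\), then for every \(u\) with \(\mathbb {P}[Z=u] > 0\) both \(\mathbb {P}[X=s \wedge Y=t|Z=u]\) and \(\mathbb {P}[X=s|Z=u]\,\mathbb {P}[Y=t|Z=u]\) equal \(1\) when \(s=t=u\) and \(0\) otherwise (and both vanish when \(\mathbb {P}[Z=u]=0\), by the zero-measure convention above), so \(X\) and \(Y\) are conditionally independent relative to \(Z\) despite being identical.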
If \(X,Y,Z\) are random variables, then \(\mathbb {I}[X:Y|Z] = 0\) iff \(X,Y\) are conditionally independent over \(Z\).
Immediate from Lemma 2.23 and Definition 2.28.
If \(X, Y\) are conditionally independent over \(Z\), then
\[ \mathbb {H}[X, Y | Z] = \mathbb {H}[X|Z] + \mathbb {H}[Y|Z]. \]
Immediate from Lemma 2.29 and Lemma 2.26.
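Spelled out: conditional independence gives \(\mathbb {I}[X:Y|Z] = 0\) by Lemma 2.29, and substituting this into the first formula of Lemma 2.26 rearranges to \(\mathbb {H}[X,Y|Z] = \mathbb {H}[X|Z] + \mathbb {H}[Y|Z]\).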