Entropy and conditional entropy #
Main definitions #
entropy : entropy of a random variable, defined as measureEntropy (volume.map X)
condEntropy : conditional entropy of a random variable X w.r.t. another one Y
mutualInfo : mutual information of two random variables
Main statements #
chain_rule : $H[⟨X, Y⟩] = H[Y] + H[X | Y]$
entropy_cond_le_entropy : $H[X | Y] ≤ H[X]$. (Chain rule another way.)
entropy_triple_add_entropy_le : $H[X, Y, Z] + H[Z] ≤ H[X, Z] + H[Y, Z]$. (Submodularity of entropy.)
Notations #
H[X] = entropy X
H[X | Y ← y] = Hm[(ℙ[|Y ← y]).map X]
H[X | Y] = condEntropy X Y, such that H[X | Y] = (volume.map Y)[fun y ↦ H[X | Y ← y]]
I[X : Y] = mutualInfo X Y
All notations have variants where we can specify the measure (which is otherwise supposed to be volume). For example H[X ; μ] and I[X : Y ; μ] instead of H[X] and I[X : Y] respectively.
Entropy of a random variable with values in a finite measurable space.
Equations
- H[X ; μ] = Hm[MeasureTheory.Measure.map X μ]
Instances For
Entropy of a random variable with values in a finite measurable space.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Entropy of a random variable with values in a finite measurable space.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Entropy of a random variable with values in a finite measurable space.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Entropy of a random variable with values in a finite measurable space.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Entropy of a random variable agrees with entropy of its distribution.
Entropy of a random variable is also the kernel entropy of the distribution over a Dirac mass.
Any variable on a zero measure space has zero entropy.
Two variables that agree almost everywhere have the same entropy.
Entropy is always non-negative.
Two variables that have the same distribution have the same entropy.
Entropy is at most the logarithm of the cardinality of the range.
Entropy is at most the logarithm of the cardinality of a set in which X almost surely takes values.
Entropy is at most the logarithm of the cardinality of a set in which X almost surely takes values.
H[X] = ∑ₛ P[X=s] log (1 / P[X=s]).
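As a quick sanity check (an illustration, not a lemma in this file): for X uniform on two values, i.e. a fair coin,
$$H[X] = \tfrac{1}{2}\log 2 + \tfrac{1}{2}\log 2 = \log 2.$$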
H[X | Y=y] = ∑_s P[X=s | Y=y] log (1 / P[X=s | Y=y]).
If X, Y are S-valued and T-valued random variables, and Y = f(X) for some injection f : S → T, then H[Y] = H[X]. One can also use entropy_of_comp_eq_of_comp as an alternative if verifying injectivity is fiddly. For the upper bound only, see entropy_comp_le.
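For instance (an illustration only, assuming S is an additive group and c ∈ S is fixed): the map x ↦ x + c is injective, so
$$H[X + c] = H[X].$$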
The entropy of any constant is zero.
If X is uniformly distributed on H, then H[X] = log |H|.
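This is consistent with the summation formula for H[X] above (a quick check, not a separate lemma): for X uniform on H,
$$H[X] = \sum_{h \in H} \frac{1}{|H|} \log |H| = \log |H|.$$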
Variant of IsUniform.entropy_eq where H is a finite Set rather than a Finset.
If X is an S-valued random variable, then H[X] = log |S| if and only if X is uniformly distributed.
If X is an S-valued random variable, then there exists s ∈ S such that P[X = s] ≥ exp(-H[X]). TODO: remove the probability measure hypothesis, which is unnecessary here.
If X is an S-valued random variable, then there exists s ∈ S such that P[X = s] ≥ exp(-H[X]).
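As a sanity check of this bound (an illustration, not part of the file): if X is uniform on S, then H[X] = log |S| and every s ∈ S satisfies
$$P[X = s] = \frac{1}{|S|} = \exp(-\log |S|) = \exp(-H[X]),$$
so the inequality is attained with equality.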
If X is an S-valued random variable of non-positive entropy, then X is almost surely constant.
H[X, f(X)] = H[X].
H[X, Y] = H[Y, X].
H[(X, Y), Z] = H[X, (Y, Z)].
Conditional entropy of a random variable w.r.t. another.
This is the expectation under the law of Y of the entropy of the law of X conditioned on the event Y = y.
Equations
- H[X | Y ; μ] = ∫ (x : T), (fun (y : T) => H[X | Y ← y ; μ]) x ∂MeasureTheory.Measure.map Y μ
Instances For
Conditional entropy of a random variable w.r.t. another.
This is the expectation under the law of Y of the entropy of the law of X conditioned on the event Y = y.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Conditional entropy of a random variable w.r.t. another.
This is the expectation under the law of Y of the entropy of the law of X conditioned on the event Y = y.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Conditional entropy of a random variable is equal to the entropy of its conditional kernel.
Any random variable on a zero measure space has zero conditional entropy.
Conditional entropy is non-negative.
Conditional entropy is at most the logarithm of the cardinality of the range.
H[X|Y] = ∑_y P[Y=y] H[X|Y=y].
H[X|Y] = ∑_y P[Y=y] H[X|Y=y].
H[X|Y] = ∑_y ∑_x P[Y=y] P[X=x|Y=y] log (1 / P[X=x|Y=y]).
H[X|Y] = ∑_y ∑_x P[Y=y] P[X=x|Y=y] log (1 / P[X=x|Y=y]).
Same as previous lemma, but with a sum over a product space rather than a double sum.
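For a concrete illustration (an example, not part of the file): let X = (B₁, B₂) record two independent fair coin flips and Y = B₁. For each value y of Y, the conditional law of X is uniform on the two compatible outcomes, so H[X | Y = y] = log 2 and
$$H[X \mid Y] = \tfrac{1}{2}\log 2 + \tfrac{1}{2}\log 2 = \log 2.$$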
If X : Ω → S, Y : Ω → T are random variables, and f : T × S → U is injective for each fixed t ∈ T, then H[f(Y, X) | Y] = H[X | Y]. Thus for instance H[X-Y|Y] = H[X|Y].
A weaker version of the above lemma in which f is independent of Y.
H[X, Y | Z] = H[Y, X | Z].
One form of the chain rule: H[X, Y] = H[X] + H[Y | X].
Another form of the chain rule: H[X, Y] = H[Y] + H[X | Y].
Another form of the chain rule: H[X | Y] = H[X, Y] - H[Y].
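Combining the first two forms gives the symmetric identity (an observation, not a separate lemma in this file)
$$H[X] + H[Y \mid X] = H[X, Y] = H[Y] + H[X \mid Y].$$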
Two pairs of variables that have the same joint distribution have the same conditional entropy.
If X : Ω → S and Y : Ω → T are random variables, and f : T → U is an injection, then H[X | f(Y)] = H[X | Y].
H[X | f(X)] = H[X] - H[f(X)].
If X : Ω → S, Y : Ω → T, Z : Ω → U are random variables, then H[X, Y | Z] = H[X | Z] + H[Y | X, Z].
H[X, Y | Z] = H[Y | Z] + H[X | Y, Z].
Data-processing inequality for the entropy: H[f(X)] ≤ H[X]. To upgrade this to equality, see entropy_of_comp_eq_of_comp or entropy_comp_of_injective.
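One way to see this inequality (an informal sketch combining lemmas stated in this file): by the chain rule, H[X, f(X)] = H[f(X)] + H[X | f(X)] ≥ H[f(X)], since conditional entropy is non-negative, while H[X, f(X)] = H[X] by the lemma above; hence
$$H[f(X)] \le H[X, f(X)] = H[X].$$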
A Schröder-Bernstein type theorem for entropy: if two random variables are functions of each other, then they have the same entropy. Can be used as a substitute for entropy_comp_of_injective if one doesn't want to establish the injectivity.
The mutual information I[X : Y] of two random variables is defined to be H[X] + H[Y] - H[X, Y].
Instances For
The mutual information I[X : Y] of two random variables is defined to be H[X] + H[Y] - H[X, Y].
Equations
- One or more equations did not get rendered due to their size.
Instances For
The mutual information I[X : Y] of two random variables is defined to be H[X] + H[Y] - H[X, Y].
Equations
- One or more equations did not get rendered due to their size.
Instances For
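As a simple illustration of the definition (not a lemma in this file): taking Y = X and using H[X, X] = H[X],
$$I[X : X] = H[X] + H[X] - H[X, X] = H[X].$$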
Substituting variables for ones with the same distributions doesn't change the mutual information.
The conditional mutual information I[X : Y | Z] is the mutual information of X | Z=z and Y | Z=z, integrated over z.
Equations
- I[X : Y|Z;μ] = ∫ (x : U), (fun (z : U) => H[X | Z ← z ; μ] + H[Y | Z ← z ; μ] - H[⟨X, Y⟩ | Z ← z ; μ]) x ∂MeasureTheory.Measure.map Z μ
Instances For
The conditional mutual information I[X : Y | Z] is the mutual information of X | Z=z and Y | Z=z, integrated over z.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The conditional mutual information I[X : Y | Z] is the mutual information of X | Z=z and Y | Z=z, integrated over z.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Mutual information is non-negative.
Subadditivity of entropy.
I[X : Y] = 0 iff X, Y are independent.
Alias of the reverse direction of ProbabilityTheory.mutualInfo_eq_zero.
I[X : Y] = 0 iff X, Y are independent.
The mutual information with a constant is always zero.
H[X, Y] = H[X] + H[Y] if and only if X, Y are independent.
If X, Y are independent, then H[X, Y] = H[X] + H[Y].
I[X : Y] = I[Y : X].
I[X : Y] = H[X] - H[X | Y].
I[X : Y] = H[Y] - H[Y | X].
H[X] - I[X : Y] = H[X | Y].
H[Y] - I[X : Y] = H[Y | X].
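These identities are immediate from the chain rule; for instance (an informal check, not a proof in this file):
$$I[X : Y] = H[X] + H[Y] - H[X, Y] = H[X] + H[Y] - (H[Y] + H[X \mid Y]) = H[X] - H[X \mid Y].$$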
The conditional mutual information agrees with the information of the conditional kernel.
A variant of condMutualInfo_eq_sum when Z has finite codomain.
Conditional mutual information is non-negative.
I[X : Y | Z] = I[Y : X | Z].
I[X : Y | Z] = H[X | Z] + H[Y | Z] - H[X, Y | Z].
I[X : Y | Z] = H[X | Z] - H[X | Y, Z].
If f(Z, X) is injective for each fixed Z, then I[f(Z, X) : Y | Z] = I[X : Y | Z].
I[X : Y | Z] = 0 iff X, Y are conditionally independent over Z.
If X, Y are conditionally independent over Z, then H[X, Y, Z] = H[X, Z] + H[Y, Z] - H[Z].
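An informal sketch of where this formula comes from, using identities stated above: conditional independence gives I[X : Y | Z] = 0, i.e. H[X, Y | Z] = H[X | Z] + H[Y | Z]; combined with the chain rule H[X, Y, Z] = H[X, Y | Z] + H[Z] and H[X | Z] = H[X, Z] - H[Z], this yields
$$H[X, Y, Z] = (H[X, Z] - H[Z]) + (H[Y, Z] - H[Z]) + H[Z] = H[X, Z] + H[Y, Z] - H[Z].$$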
H[X] - H[X | Y] = I[X : Y].
H[X | Y] ≤ H[X].
H[X | Y, Z] ≤ H[X | Z].
Data-processing inequality for the conditional entropy: H[Y | f(X)] ≥ H[Y | X]. To upgrade this to equality, see condEntropy_of_injective'.
The submodularity inequality: H[X, Y, Z] + H[Z] ≤ H[X, Z] + H[Y, Z].
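An informal derivation from the earlier inequalities: by the chain rule and H[X | Y, Z] ≤ H[X | Z],
$$H[X, Y, Z] + H[Z] = H[X \mid Y, Z] + H[Y, Z] + H[Z] \le H[X \mid Z] + H[Z] + H[Y, Z] = H[X, Z] + H[Y, Z].$$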