Kullback-Leibler divergence #
Definition of Kullback-Leibler divergence and basic facts
If X, Y are two G-valued random variables, the Kullback-Leibler divergence is defined as

KL(X ‖ Y) := ∑ₓ 𝐏(X = x) log (𝐏(X = x) / 𝐏(Y = x)).

Note that this definition only makes sense when X is absolutely continuous with respect to Y, i.e., ∀ x, 𝐏(Y = x) = 0 → 𝐏(X = x) = 0. Otherwise the divergence should be infinite, but since we use real numbers for ease of computation, an infinite value is not a possible choice.
Equations
- KL[X ; μ # Y ; μ'] = ∑' (x : G), ((MeasureTheory.Measure.map X μ) {x}).toReal * Real.log (((MeasureTheory.Measure.map X μ) {x}).toReal / ((MeasureTheory.Measure.map Y μ') {x}).toReal)
Instances For
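To make the formula concrete, here is a minimal standalone Lean sketch mirroring the rendered equation above; the name `klDiv'` and the surrounding variable setup are illustrative assumptions, not the project's actual declarations.

```lean
import Mathlib

open MeasureTheory

variable {Ω Ω' G : Type*} [MeasurableSpace Ω] [MeasurableSpace Ω'] [MeasurableSpace G]

/-- Hypothetical standalone version of the definition above: the sum over `x` of
`P(X = x) * log (P(X = x) / P(Y = x))`, reading each probability off the
pushforward (law) of the corresponding random variable. -/
noncomputable def klDiv' (X : Ω → G) (Y : Ω' → G) (μ : Measure Ω) (μ' : Measure Ω') : ℝ :=
  ∑' (x : G), ((μ.map X) {x}).toReal *
    Real.log (((μ.map X) {x}).toReal / ((μ'.map Y) {x}).toReal)
```

For a concrete numeric instance of the formula: if $X$ is Bernoulli(1/2) and $Y$ is Bernoulli(1/4), then $KL(X \Vert Y) = \tfrac12\log\tfrac{1/2}{1/4} + \tfrac12\log\tfrac{1/2}{3/4} = \tfrac12\log\tfrac43 > 0$.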
If X' is a copy of X, and Y' is a copy of Y, then KL(X' ‖ Y') = KL(X ‖ Y).
KL(X ‖ Y) ≥ 0.
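This is Gibbs' inequality; the standard derivation is one application of Jensen's inequality to the concave function $\log$, using the absolute continuity assumption and writing $p(x) = \mathbf{P}(X=x)$, $q(x) = \mathbf{P}(Y=x)$:
$$-D_{KL}(X\Vert Y) = \sum_{x:\, p(x)>0} p(x)\log\frac{q(x)}{p(x)} \le \log\sum_{x:\, p(x)>0} q(x) \le \log 1 = 0.$$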
KL(X ‖ Y) = 0 if and only if Y is a copy of X.
If $S$ is a finite set, the weights $w_s$ are non-negative, and ${\bf P}(X=x) = \sum_{s\in S} w_s {\bf P}(X_s=x)$, ${\bf P}(Y=x) = \sum_{s\in S} w_s {\bf P}(Y_s=x)$ for all $x$, then $$D_{KL}(X\Vert Y) \le \sum_{s\in S} w_s D_{KL}(X_s\Vert Y_s).$$
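This is the convexity of $D_{KL}$ in the pair of distributions; one way to see it is the log-sum inequality, $$\sum_{s\in S} a_s\log\frac{a_s}{b_s} \ \ge\ \Big(\sum_{s\in S} a_s\Big)\log\frac{\sum_{s\in S} a_s}{\sum_{s\in S} b_s},$$ applied for each fixed $x$ with $a_s = w_s\,\mathbf{P}(X_s=x)$ and $b_s = w_s\,\mathbf{P}(Y_s=x)$, and then summed over $x$.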
If $f:G \to H$ is an injection, then $D_{KL}(f(X)\Vert f(Y)) = D_{KL}(X\Vert Y)$.
The distribution of X + Z is the convolution of the distributions of Z and X when these random variables are independent.
Probably already available somewhere in some form, but I couldn't locate it.
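Concretely, for independent $X$ and $Z$ the claim is the pointwise identity $$\mathbf{P}(X+Z=x) = \sum_z \mathbf{P}(Z=z)\,\mathbf{P}(X=x-z),$$ i.e. the law of $X+Z$ is the convolution of the laws of $X$ and $Z$.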
If $X, Y, Z$ are independent $G$-valued random variables, then $$D_{KL}(X+Z\Vert Y+Z) \leq D_{KL}(X\Vert Y).$$
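A proof sketch from the two facts above: by independence, $X+Z$ is the mixture of the variables $X+z$ with weights $w_z = \mathbf{P}(Z=z)$, and likewise for $Y+Z$; the convexity bound then gives $D_{KL}(X+Z\Vert Y+Z) \le \sum_z w_z\, D_{KL}(X+z\Vert Y+z)$, and each summand equals $D_{KL}(X\Vert Y)$ because translation by $z$ is injective.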
If $X,Y,Z$ are random variables, with $X,Z$ defined on the same sample space, we define $$ D_{KL}(X|Z \Vert Y) := \sum_z \mathbf{P}(Z=z) D_{KL}( (X|Z=z) \Vert Y).$$
Equations
- KL[X | Z ; μ # Y ; μ'] = ∑' (z : S), (μ (Z ⁻¹' {z})).toReal * KL[X ; ProbabilityTheory.cond μ (Z ⁻¹' {z}) # Y ; μ']
Instances For
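As a quick sanity check on the definition: if $Z$ is almost surely equal to a constant $z_0$, the sum collapses to the single term $D_{KL}((X|Z=z_0)\Vert Y) = D_{KL}(X\Vert Y)$, so conditioning on a trivial variable changes nothing.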
If $X, Y$ are $G$-valued random variables, and $Z$ is another random variable defined on the same sample space as $X$, then $$D_{KL}((X|Z)\Vert Y) = D_{KL}(X\Vert Y) + \mathbb{H}[X] - \mathbb{H}[X|Z].$$
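One way to verify the identity is to expand the definition and split the logarithm: $$D_{KL}((X|Z)\Vert Y) = \sum_z \mathbf{P}(Z=z) \sum_x \mathbf{P}(X=x\mid Z=z)\log\frac{\mathbf{P}(X=x\mid Z=z)}{\mathbf{P}(Y=x)}.$$ The $\log \mathbf{P}(X=x\mid Z=z)$ part sums to $-\mathbb{H}[X|Z]$, while the $-\log \mathbf{P}(Y=x)$ part sums, using $\sum_z \mathbf{P}(Z=z)\,\mathbf{P}(X=x\mid Z=z) = \mathbf{P}(X=x)$, to $-\sum_x \mathbf{P}(X=x)\log \mathbf{P}(Y=x) = D_{KL}(X\Vert Y) + \mathbb{H}[X]$.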
KL(X|Z ‖ Y) ≥ 0.