Kullback-Leibler divergence
Definition of Kullback-Leibler divergence and basic facts
If X, Y are two G-valued random variables, the Kullback-Leibler divergence is defined as
KL(X ‖ Y) := ∑ₓ 𝐏(X = x) log (𝐏(X = x) / 𝐏(Y = x)).
Note that this definition only makes sense when X is absolutely continuous with respect to Y,
i.e., ∀ x, 𝐏(Y = x) = 0 → 𝐏(X = x) = 0. Otherwise the divergence should be infinite, but since
we use real numbers for ease of computation, this is not possible.
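As an informal illustration of this definition (not part of the formalization), here is a minimal Python sketch for finitely supported distributions; the helper name kl_div is chosen here only for illustration and is reused by the later sketches.

```python
import math

def kl_div(p, q):
    """KL(X ‖ Y) = sum_x P(X = x) * log(P(X = x) / P(Y = x)) for finitely
    supported distributions given as dicts from outcomes to probabilities.
    Terms with P(X = x) = 0 contribute nothing; if absolute continuity
    fails (some x has P(Y = x) = 0 < P(X = x)) we raise instead of
    returning a junk real value."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            raise ValueError("not absolutely continuous: P(Y=x)=0 < P(X=x)")
        total += px * math.log(px / qx)
    return total

# Example on the three-element set {0, 1, 2}.
p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.4, 1: 0.4, 2: 0.2}
print(kl_div(p, q))   # strictly positive, since p ≠ q
print(kl_div(p, p))   # 0.0
```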
If X' is a copy of X, and Y' is a copy of Y, then KL(X' ‖ Y') = KL(X ‖ Y).
KL(X ‖ Y) ≥ 0.
KL(X ‖ Y) = 0 if and only if Y is a copy of X.
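A quick numerical sanity check of the two preceding facts, reusing kl_div from the sketch above; random_dist is another helper introduced only for these illustrations.

```python
import random

def random_dist(support, rng):
    """A random strictly positive probability distribution on `support`."""
    w = [rng.random() + 1e-9 for _ in support]
    s = sum(w)
    return {x: wi / s for x, wi in zip(support, w)}

rng = random.Random(0)
for _ in range(1000):
    p = random_dist(range(4), rng)
    q = random_dist(range(4), rng)
    assert kl_div(p, q) >= -1e-12      # KL(X ‖ Y) ≥ 0 (up to rounding)
assert kl_div(p, p) == 0.0             # and = 0 when the laws coincide
```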
If $S$ is a finite set, the weights $w_s$ are non-negative, and ${\bf P}(X=x) = \sum_{s\in S} w_s {\bf P}(X_s=x)$, ${\bf P}(Y=x) = \sum_{s\in S} w_s {\bf P}(Y_s=x)$ for all $x$, then $$D_{KL}(X\Vert Y) \le \sum_{s\in S} w_s D_{KL}(X_s\Vert Y_s).$$
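A numerical check of this convexity bound, again reusing kl_div and random_dist; only an illustration, not the formal proof.

```python
rng = random.Random(1)
for _ in range(200):
    w = random_dist(range(3), rng)                    # weights w_s, summing to 1
    Xs = {s: random_dist(range(4), rng) for s in w}
    Ys = {s: random_dist(range(4), rng) for s in w}
    mix_X = {x: sum(w[s] * Xs[s][x] for s in w) for x in range(4)}
    mix_Y = {x: sum(w[s] * Ys[s][x] for s in w) for x in range(4)}
    lhs = kl_div(mix_X, mix_Y)
    rhs = sum(w[s] * kl_div(Xs[s], Ys[s]) for s in w)
    assert lhs <= rhs + 1e-12
```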
If $f:G \to H$ is an injection, then $D_{KL}(f(X)\Vert f(Y)) = D_{KL}(X\Vert Y)$.
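Similarly, a small check that relabelling the support along an injection leaves the divergence unchanged, with the same illustrative helpers.

```python
f = lambda x: ("tag", x * x + 1)       # an arbitrary injection on {0, 1, 2, 3}
rng = random.Random(2)
p = random_dist(range(4), rng)
q = random_dist(range(4), rng)
fp = {f(x): px for x, px in p.items()}
fq = {f(x): qx for x, qx in q.items()}
assert abs(kl_div(fp, fq) - kl_div(p, q)) < 1e-12
```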
The distribution of X + Z is the convolution of the distributions of Z and X when these
random variables are independent.
Probably already available somewhere in some form, but I couldn't locate it.
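For concreteness, a sketch of this convolution identity on the example group G = Z/5Z (chosen here only for illustration): P(X + Z = x) = Σ_a P(X = a) P(Z = x − a) when X and Z are independent. The helper convolve is again introduced only for these sketches.

```python
def convolve(p, q, n=5):
    """Distribution of X + Z on Z/nZ when X ~ p and Z ~ q are independent."""
    out = {x: 0.0 for x in range(n)}
    for a, pa in p.items():
        for b, qb in q.items():
            out[(a + b) % n] += pa * qb
    return out

rng = random.Random(3)
pX = random_dist(range(5), rng)
pZ = random_dist(range(5), rng)
pSum = convolve(pX, pZ)
print(pSum, sum(pSum.values()))        # a probability distribution summing to 1
```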
If $X, Y, Z$ are independent $G$-valued random variables, then $$D_{KL}(X+Z\Vert Y+Z) \leq D_{KL}(X\Vert Y).$$
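A numerical check of this inequality on Z/5Z, combining kl_div, random_dist and convolve from the sketches above.

```python
rng = random.Random(4)
for _ in range(200):
    pX = random_dist(range(5), rng)
    pY = random_dist(range(5), rng)
    pZ = random_dist(range(5), rng)
    assert kl_div(convolve(pX, pZ), convolve(pY, pZ)) <= kl_div(pX, pY) + 1e-12
```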
If $X,Y,Z$ are random variables, with $X,Z$ defined on the same sample space, we define $$ D_{KL}(X|Z \Vert Y) := \sum_z \mathbf{P}(Z=z) D_{KL}( (X|Z=z) \Vert Y).$$
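A sketch of this conditional divergence for finitely supported laws, with the joint law of (X, Z) given as a dict over pairs (x, z) and the law of Y as a dict over x; cond_kl is a hypothetical helper built on kl_div from the first sketch.

```python
def cond_kl(joint_xz, q_y):
    """D_KL(X | Z ‖ Y) = Σ_z P(Z = z) · D_KL((X | Z = z) ‖ Y)."""
    total = 0.0
    for z in {z for (_, z) in joint_xz}:
        pz = sum(p for (_, zz), p in joint_xz.items() if zz == z)
        if pz == 0:
            continue
        cond = {x: p / pz for (x, zz), p in joint_xz.items() if zz == z}
        total += pz * kl_div(cond, q_y)
    return total

joint = {(0, 'a'): 0.2, (1, 'a'): 0.1, (0, 'b'): 0.3, (1, 'b'): 0.4}
Y = {0: 0.5, 1: 0.5}
print(cond_kl(joint, Y))
```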
If $X, Y$ are $G$-valued random variables, and $Z$ is another random variable defined on the same sample space as $X$, then $$D_{KL}((X|Z)\Vert Y) = D_{KL}(X\Vert Y) + \mathbf{H}[X] - \mathbf{H}[X|Z].$$
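A numerical check of this identity, writing $\mathbf{H}$ for the Shannon entropy and reusing kl_div and cond_kl from the earlier sketches.

```python
def entropy(p):
    """Shannon entropy H[X] = -Σ_x P(X = x) log P(X = x) (natural log)."""
    return -sum(px * math.log(px) for px in p.values() if px > 0)

joint = {(0, 'a'): 0.2, (1, 'a'): 0.1, (0, 'b'): 0.3, (1, 'b'): 0.4}
Y = {0: 0.6, 1: 0.4}
pX = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
pZ = {z: sum(p for (_, zz), p in joint.items() if zz == z) for z in ('a', 'b')}
H_X = entropy(pX)
H_X_given_Z = sum(
    pZ[z] * entropy({x: joint[(x, z)] / pZ[z] for x in (0, 1)}) for z in pZ
)
lhs = cond_kl(joint, Y)
rhs = kl_div(pX, Y) + H_X - H_X_given_Z
assert abs(lhs - rhs) < 1e-12
```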
KL(X|Z ‖ Y) ≥ 0.