Documentation

PFR.Kullback

Kullback-Leibler divergence #

Definition of Kullback-Leibler divergence and basic facts

noncomputable def KLDiv {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] [mΩ' : MeasurableSpace Ω'] [hG : MeasurableSpace G] (X : Ω → G) (Y : Ω' → G) (μ : MeasureTheory.Measure Ω := by volume_tac) (μ' : MeasureTheory.Measure Ω' := by volume_tac) :

If X, Y are two G-valued random variables, the Kullback--Leibler divergence is defined as KL(X ‖ Y) := ∑ₓ 𝐏(X = x) log (𝐏(X = x) / 𝐏(Y = x)).

Note that this definition only makes sense when X is absolutely continuous wrt to Y, i.e., ∀ x, 𝐏(Y = x) = 0 → 𝐏(X = x) = 0. Otherwise, the divergence should be infinite, but since we use real numbers for ease of computations, this is not a possible choice.

Equations

KL[X ; μ # Y ; μ'] = ∑' (x : G), ((MeasureTheory.Measure.map X μ) {x}).toReal * Real.log (((MeasureTheory.Measure.map X μ) {x}).toReal / ((MeasureTheory.Measure.map Y μ') {x}).toReal)

def «termKL[_;_#_;_]».«delab_app.KLDiv» :

Lean.PrettyPrinter.Delaborator.Delab

Pretty printer defined by notation3 command.

Equations

One or more equations did not get rendered due to their size.

def «termKL[_;_#_;_]» :

Lean.ParserDescr

If X, Y are two G-valued random variables, the Kullback--Leibler divergence is defined as KL(X ‖ Y) := ∑ₓ 𝐏(X = x) log (𝐏(X = x) / 𝐏(Y = x)).

Note that this definition only makes sense when X is absolutely continuous wrt to Y, i.e., ∀ x, 𝐏(Y = x) = 0 → 𝐏(X = x) = 0. Otherwise, the divergence should be infinite, but since we use real numbers for ease of computations, this is not a possible choice.

Equations

One or more equations did not get rendered due to their size.

def «termKL[_#_]» :

Lean.ParserDescr

If X, Y are two G-valued random variables, the Kullback--Leibler divergence is defined as KL(X ‖ Y) := ∑ₓ 𝐏(X = x) log (𝐏(X = x) / 𝐏(Y = x)).

Note that this definition only makes sense when X is absolutely continuous wrt to Y, i.e., ∀ x, 𝐏(Y = x) = 0 → 𝐏(X = x) = 0. Otherwise, the divergence should be infinite, but since we use real numbers for ease of computations, this is not a possible choice.

Equations

One or more equations did not get rendered due to their size.

def «termKL[_#_]».«delab_app.KLDiv» :

Lean.PrettyPrinter.Delaborator.Delab

Pretty printer defined by notation3 command.

Equations

One or more equations did not get rendered due to their size.

@[simp]

theorem KLDiv_self {Ω : Type u_1} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hG : MeasurableSpace G] {X : Ω → G} :

KL[X ; μ # X ; μ] = 0

theorem ProbabilityTheory.IdentDistrib.KLDiv_eq {Ω : Type u_1} {Ω' : Type u_2} {Ω'' : Type u_3} {Ω''' : Type u_4} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [mΩ'' : MeasurableSpace Ω''] {μ'' : MeasureTheory.Measure Ω''} [mΩ''' : MeasurableSpace Ω'''] {μ''' : MeasureTheory.Measure Ω'''} [hG : MeasurableSpace G] {X : Ω → G} {Y : Ω' → G} (X' : Ω'' → G) (Y' : Ω''' → G) (hX : IdentDistrib X X' μ μ'') (hY : IdentDistrib Y Y' μ' μ''') :

KL[X ; μ # Y ; μ'] = KL[X' ; μ'' # Y' ; μ''']

If X' is a copy of X, and Y' is a copy of Y, then KL(X' ‖ Y') = KL(X ‖ Y).

theorem KLDiv_eq_sum {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] {X : Ω → G} {Y : Ω' → G} [Fintype G] :

KL[X ; μ # Y ; μ'] = ∑ x : G, ((MeasureTheory.Measure.map X μ) {x}).toReal * Real.log (((MeasureTheory.Measure.map X μ) {x}).toReal / ((MeasureTheory.Measure.map Y μ') {x}).toReal)

theorem KLDiv_eq_sum_negMulLog {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] {X : Ω → G} {Y : Ω' → G} [Fintype G] :

KL[X ; μ # Y ; μ'] = ∑ x : G, -((MeasureTheory.Measure.map Y μ') {x}).toReal * (((MeasureTheory.Measure.map X μ) {x}).toReal / ((MeasureTheory.Measure.map Y μ') {x}).toReal).negMulLog

theorem KLDiv_nonneg {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] {X : Ω → G} {Y : Ω' → G} [Fintype G] [MeasurableSingletonClass G] [MeasureTheory.IsZeroOrProbabilityMeasure μ] [MeasureTheory.IsZeroOrProbabilityMeasure μ'] (hX : Measurable X) (hY : Measurable Y) (habs : ∀ (x : G), (MeasureTheory.Measure.map Y μ') {x} = 0 → (MeasureTheory.Measure.map X μ) {x} = 0) :

0 ≤ KL[X ; μ # Y ; μ']

KL(X ‖ Y) ≥ 0.

theorem KLDiv_eq_zero_iff_identDistrib {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] {X : Ω → G} {Y : Ω' → G} [Fintype G] [MeasurableSingletonClass G] [MeasureTheory.IsProbabilityMeasure μ] [MeasureTheory.IsProbabilityMeasure μ'] (hX : Measurable X) (hY : Measurable Y) (habs : ∀ (x : G), (MeasureTheory.Measure.map Y μ') {x} = 0 → (MeasureTheory.Measure.map X μ) {x} = 0) :

KL[X ; μ # Y ; μ'] = 0 ↔ ProbabilityTheory.IdentDistrib X Y μ μ'

KL(X ‖ Y) = 0 if and only if Y is a copy of X.

theorem KLDiv_of_convex {Ω : Type u_1} {Ω' : Type u_2} {Ω'' : Type u_3} {Ω''' : Type u_4} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [mΩ'' : MeasurableSpace Ω''] {μ'' : MeasureTheory.Measure Ω''} [mΩ''' : MeasurableSpace Ω'''] {μ''' : MeasureTheory.Measure Ω'''} [hG : MeasurableSpace G] {X : Ω → G} {Y : Ω' → G} [Fintype G] [MeasureTheory.IsFiniteMeasure μ'''] {ι : Type u_6} {S : Finset ι} {w : ι → ℝ} (hw : ∀ s ∈ S, 0 ≤ w s) (X' : ι → Ω'' → G) (Y' : ι → Ω''' → G) (hconvex : ∀ (x : G), ((MeasureTheory.Measure.map X μ) {x}).toReal = ∑ s ∈ S, w s * ((MeasureTheory.Measure.map (X' s) μ'') {x}).toReal) (hconvex' : ∀ (x : G), ((MeasureTheory.Measure.map Y μ') {x}).toReal = ∑ s ∈ S, w s * ((MeasureTheory.Measure.map (Y' s) μ''') {x}).toReal) (habs : ∀ s ∈ S, ∀ (x : G), (MeasureTheory.Measure.map (Y' s) μ''') {x} = 0 → (MeasureTheory.Measure.map (X' s) μ'') {x} = 0) :

KL[X ; μ # Y ; μ'] ≤ ∑ s ∈ S, w s * KL[X' s ; μ'' # Y' s ; μ''']

If $S$ is a finite set, $w_{s}$ is non-negative, and $P (X = x) = \sum_{s \in S} w_{s} P (X_{s} = x)$ , $P (Y = x) = \sum_{s \in S} w_{s} P (Y_{s} = x)$ for all $x$ , then $D_{K L} (X ‖ Y) \leq \sum_{s \in S} w_{s} D_{K L} (X_{s} ‖ Y_{s}) .$

theorem KLDiv_of_comp_inj {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] {X : Ω → G} {Y : Ω' → G} {H : Type u_6} [MeasurableSpace H] [DiscreteMeasurableSpace G] [MeasurableSingletonClass H] {f : G → H} (hf : Function.Injective f) (hX : Measurable X) (hY : Measurable Y) :

KL[f ∘ X ; μ # f ∘ Y ; μ'] = KL[X ; μ # Y ; μ']

If $f : G \to H$ is an injection, then $D_{K L} (f (X) ‖ f (Y)) = D_{K L} (X ‖ Y)$ .

theorem KLDiv_add_const {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] [AddCommGroup G] [DiscreteMeasurableSpace G] {X : Ω → G} {Y : Ω' → G} (hX : Measurable X) (hY : Measurable Y) (s : G) :

KL[fun (ω : Ω) => X ω + s ; μ # fun (ω : Ω') => Y ω + s ; μ'] = KL[X ; μ # Y ; μ']

theorem ProbabilityTheory.IndepFun.map_add_eq_sum {Ω : Type u_1} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hG : MeasurableSpace G] [Fintype G] [AddCommGroup G] [DiscreteMeasurableSpace G] {X Z : Ω → G} (h_indep : IndepFun X Z μ) (hX : Measurable X) (hZ : Measurable Z) (S : Set G) :

(MeasureTheory.Measure.map (X + Z) μ) S = ∑ s : G, (MeasureTheory.Measure.map Z μ) {s} * (MeasureTheory.Measure.map X μ) ({-s} + S)

The distribution of X + Z is the convolution of the distributions of Z and X when these random variables are independent. Probably already available somewhere in some form, but I couldn't locate it.

theorem ProbabilityTheory.IndepFun.map_add_singleton_eq_sum {Ω : Type u_1} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hG : MeasurableSpace G] [Fintype G] [AddCommGroup G] [DiscreteMeasurableSpace G] {X Z : Ω → G} (h_indep : IndepFun X Z μ) (hX : Measurable X) (hZ : Measurable Z) (x : G) :

(MeasureTheory.Measure.map (X + Z) μ) {x} = ∑ s : G, (MeasureTheory.Measure.map Z μ) {s} * (MeasureTheory.Measure.map X μ) {x - s}

The distribution of X + Z is the convolution of the distributions of Z and X when these random variables are independent. Probably already available somewhere in some form, but I couldn't locate it.

theorem absolutelyContinuous_add_of_indep {Ω : Type u_1} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hG : MeasurableSpace G] [Fintype G] [AddCommGroup G] [DiscreteMeasurableSpace G] {X Y Z : Ω → G} (h_indep : ProbabilityTheory.IndepFun (⟨X, Y⟩) Z μ) (hX : Measurable X) (hY : Measurable Y) (hZ : Measurable Z) (habs : ∀ (x : G), (MeasureTheory.Measure.map Y μ) {x} = 0 → (MeasureTheory.Measure.map X μ) {x} = 0) (x : G) :

(MeasureTheory.Measure.map (Y + Z) μ) {x} = 0 → (MeasureTheory.Measure.map (X + Z) μ) {x} = 0

theorem KLDiv_add_le_KLDiv_of_indep {Ω : Type u_1} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hG : MeasurableSpace G] [Fintype G] [AddCommGroup G] [DiscreteMeasurableSpace G] {X Y Z : Ω → G} [MeasureTheory.IsZeroOrProbabilityMeasure μ] (h_indep : ProbabilityTheory.IndepFun (⟨X, Y⟩) Z μ) (hX : Measurable X) (hY : Measurable Y) (hZ : Measurable Z) (habs : ∀ (x : G), (MeasureTheory.Measure.map Y μ) {x} = 0 → (MeasureTheory.Measure.map X μ) {x} = 0) :

KL[X + Z ; μ # Y + Z ; μ] ≤ KL[X ; μ # Y ; μ]

If $X, Y, Z$ are independent $G$ -valued random variables, then $D_{K L} (X + Z ‖ Y + Z) \leq D_{K L} (X ‖ Y) .$

noncomputable def condKLDiv {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] [mΩ' : MeasurableSpace Ω'] [hG : MeasurableSpace G] {S : Type u_6} (X : Ω → G) (Y : Ω' → G) (Z : Ω → S) (μ : MeasureTheory.Measure Ω := by volume_tac) (μ' : MeasureTheory.Measure Ω' := by volume_tac) :

If $X, Y, Z$ are random variables, with $X, Z$ defined on the same sample space, we define $D_{K L} (X | Z ‖ Y) := \sum_{z} P (Z = z) D_{K L} ((X | Z = z) ‖ Y) .$

Equations

KL[X | Z ; μ # Y ; μ'] = ∑' (z : S), (μ (Z ⁻¹' {z})).toReal * KL[X ; μ[|Z ⁻¹' {z}] # Y ; μ']

def «termKL[_|_;_#_;_]» :

Lean.ParserDescr

If $X, Y, Z$ are random variables, with $X, Z$ defined on the same sample space, we define $D_{K L} (X | Z ‖ Y) := \sum_{z} P (Z = z) D_{K L} ((X | Z = z) ‖ Y) .$

Equations

One or more equations did not get rendered due to their size.

def «termKL[_|_;_#_;_]».«delab_app.condKLDiv» :

Lean.PrettyPrinter.Delaborator.Delab

Pretty printer defined by notation3 command.

Equations

One or more equations did not get rendered due to their size.

def «termKL[_|_#_]».«delab_app.condKLDiv» :

Lean.PrettyPrinter.Delaborator.Delab

Pretty printer defined by notation3 command.

Equations

One or more equations did not get rendered due to their size.

def «termKL[_|_#_]» :

Lean.ParserDescr

If $X, Y, Z$ are random variables, with $X, Z$ defined on the same sample space, we define $D_{K L} (X | Z ‖ Y) := \sum_{z} P (Z = z) D_{K L} ((X | Z = z) ‖ Y) .$

Equations

One or more equations did not get rendered due to their size.

theorem condKLDiv_eq {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] {S : Type u_6} [MeasurableSpace S] [Fintype S] [MeasurableSingletonClass S] [Fintype G] [MeasureTheory.IsZeroOrProbabilityMeasure μ] [MeasureTheory.IsFiniteMeasure μ'] {X : Ω → G} {Y : Ω' → G} {Z : Ω → S} (hX : Measurable X) (hZ : Measurable Z) (habs : ∀ (x : G), (MeasureTheory.Measure.map Y μ') {x} = 0 → (MeasureTheory.Measure.map X μ) {x} = 0) :

KL[X | Z ; μ # Y ; μ'] = KL[X ; μ # Y ; μ'] + H[X ; μ] - H[X | Z ; μ]

If $X, Y$ are $G$ -valued random variables, and $Z$ is another random variable defined on the same sample space as $X$ , then $D_{K L} ((X | Z) ‖ Y) = D_{K L} (X ‖ Y) + \bbH [X] - \bbH [X | Z] .$

theorem condKLDiv_nonneg {Ω : Type u_1} {Ω' : Type u_2} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [mΩ' : MeasurableSpace Ω'] {μ' : MeasureTheory.Measure Ω'} [hG : MeasurableSpace G] {S : Type u_6} [MeasurableSingletonClass G] [Fintype G] {X : Ω → G} {Y : Ω' → G} {Z : Ω → S} [MeasureTheory.IsZeroOrProbabilityMeasure μ'] (hX : Measurable X) (hY : Measurable Y) (habs : ∀ (x : G), (MeasureTheory.Measure.map Y μ') {x} = 0 → (MeasureTheory.Measure.map X μ) {x} = 0) :

0 ≤ KL[X | Z ; μ # Y ; μ']

KL(X|Z ‖ Y) ≥ 0.

theorem tendsto_KLDiv_id_right {Ω : Type u_1} {G : Type u_5} [mΩ : MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hG : MeasurableSpace G] {X : Ω → G} [TopologicalSpace G] [DiscreteTopology G] [Fintype G] [DiscreteMeasurableSpace G] [MeasureTheory.IsFiniteMeasure μ] {α : Type u_6} {l : Filter α} {ν : α → MeasureTheory.ProbabilityMeasure G} {ν' : MeasureTheory.ProbabilityMeasure G} (h : Filter.Tendsto ν l (nhds ν')) (habs : ∀ (x : G), ν' {x} = 0 → (MeasureTheory.Measure.map X μ) {x} = 0) :

Filter.Tendsto (fun (n : α) => KL[X ; μ # id ; ↑(ν n)]) l (nhds KL[X ; μ # id ; ↑ν'])

theorem tendsto_KLDiv_id_left {Ω : Type u_1} {G : Type u_5} [mΩ : MeasurableSpace Ω] [hG : MeasurableSpace G] [TopologicalSpace G] [DiscreteTopology G] [Fintype G] [DiscreteMeasurableSpace G] {Y : Ω → G} {μ : MeasureTheory.Measure Ω} {α : Type u_6} {l : Filter α} {ν : α → MeasureTheory.ProbabilityMeasure G} {ν' : MeasureTheory.ProbabilityMeasure G} (h : Filter.Tendsto ν l (nhds ν')) :

Filter.Tendsto (fun (n : α) => KL[id ; ↑(ν n) # Y ; μ]) l (nhds KL[id ; ↑ν' # Y ; μ])