Read up till Section 2.7.
Finally got a feeling for what mutual information and the KL divergence are. I especially liked the interpretation in terms of encoding a random variable X: the KL divergence is the inefficiency, the additional number of bits needed on average when one uses an encoding that assumes X follows q(x) while the true probability mass function is p(x).
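A tiny numpy sketch of that coding interpretation (toy distributions, numbers made up by me, not from the book):

import numpy as np

# True distribution p and assumed distribution q over a 4-symbol alphabet (toy example)
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

entropy_p = -np.sum(p * np.log2(p))       # optimal average code length H(p) = 1.75 bits
cross_entropy = -np.sum(p * np.log2(q))   # average length when coding as if X ~ q: 2.0 bits
kl = np.sum(p * np.log2(p / q))           # D(p || q) = 0.25 bits, exactly the gap

print(entropy_p, cross_entropy, kl)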
The Venn diagram with H(X | Y), H(Y | X), and I(X;Y) was helpful too.
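Writing out the identities the diagram depicts (standard decompositions; H(X,Y) is the joint entropy):

I(X;Y) = H(X) - H(X | Y) = H(Y) - H(Y | X) = H(X) + H(Y) - H(X,Y)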
— Aug 10, 2025 08:21PM