Learn probabilistic PCA first to better understand Variational Autoencoders

Reconstructions of images sampled from $p(x|z)$ distribution

I prepare a presentation on Variational Autoencoders to my university colleagues. I think what can help them understand the topic better. I noticed that a derivation of the loss function can cause problems as it requires understanding and intuition of probabilistic concepts.

I think that understanding probabilistic PCA first helps to understand Variational Autoencoders faster. Probabilistic PCA is a great and simple algorithm. The big advantage of this algorithm is that all the probability distributions used in the algorithm are Gaussian distributions. So starting from the latent variable distribution $p(z)$ and distribution of observed variables conditioned on latent variable $p(x|z)$, we can easily obtain the distribution of observed variable $p(x)$, and posterior distribution $p(z|x)$. It could be obtained analytically by known equations. What’s more, we can easily sample from that distribution using known libraries like Numpy or Scipy. So to gain the intuition for this algorithm, it is worth to play with the examples. From my experience, learning probabilistic PCA makes learning Variational Autoencoders much easier, because I started to “feel” the equations.

I prepared the short implementation of the probabilistic PCA method. I plan to show the code to listeners on my presentation to make the algorithm easier to understand. I hope that it also helps you. The code is available at my Github:


I plan to prepare longer text on my blog concerning that method.