Bayesian Deep Learning for Cosmology and Time Domain Astrophysics #2
June 20-24, 2022
Justine Zeghal, François Lanusse, Alexandre Boucaud, Eric Aubourg, Benjamin Remy
Context
We want to infer the parameters that generated an observation $x_0$.
$\to$ Bayes' theorem: $p(\theta|x_0) \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$
$ p(x_0|\theta) = \int p(x_0|\theta, z)p(z)dz$ $\to$ intractable
Different ways to do Likelihood-Free Inference:
$\to$ they all require a large number of simulations
What if we add the gradients of the simulator's joint log-probability with respect to the input parameters to the LFI process?
Posterior Estimation Algorithm
Draw N parameters $\theta_i \sim p(\theta)$
$\downarrow$
Draw N simulations $x_i \sim p(x|\theta_i), \; i=1..N$
$\downarrow$
Train a neural density estimator $q_{\phi}(\theta|x)$ on $(x_i,\theta_i)_{i=1..N}$
$\downarrow$
Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|x=x_0)$
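A minimal Python/JAX sketch of this loop, where `sample_prior`, `simulate`, and `NeuralDensityEstimator` are hypothetical placeholders (not the code used in this work):

```python
import jax
import jax.numpy as jnp

# Hypothetical placeholders for illustration only.
def sample_prior(key, n):
    """Draw theta_i ~ p(theta); here a flat prior over a box."""
    return jax.random.uniform(key, (n, 4), minval=-3.0, maxval=3.0)

def simulate(key, theta):
    """Draw x_i ~ p(x | theta_i); here a trivial noisy forward model."""
    return theta + jax.random.normal(key, theta.shape)

class NeuralDensityEstimator:
    """Stand-in for a conditional density estimator q_phi(theta | x)."""
    def fit(self, theta, x): ...
    def log_prob(self, theta, x): ...

key_p, key_s = jax.random.split(jax.random.PRNGKey(0))
N = 10_000
theta = sample_prior(key_p, N)      # 1. draw N parameters from the prior
x = simulate(key_s, theta)          # 2. run the simulator on each parameter
q_phi = NeuralDensityEstimator()
q_phi.fit(theta, x)                 # 3. train q_phi(theta | x) on (x_i, theta_i)
# 4. q_phi(theta | x = x_0) then approximates the posterior p(theta | x_0)
```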
The idea
With only a few simulations, it is hard to approximate the distribution.
$\to$ we need more simulations.
But if we have a few simulations and their gradients, then it is possible to get an idea of the shape of the distribution.
$\to$ We integrate the gradients $\nabla_{\theta} \log p(\theta|x)$ into the process to reduce the number of simulations.
How can we train our neural density estimator with both simulations and the score?
Normalizing Flows (NFs) as Neural Density Estimators
The key idea of NFs is to transform a simple base density $p_z(z)$ through a series of bijective functions $f_i$ to reconstruct a complex target distribution $p_x(x)$.
$z \xrightarrow{\;f_1\;} z' \xrightarrow{\;f_2\;} x, \qquad x \xrightarrow{\;f_2^{-1}\;} z' \xrightarrow{\;f_1^{-1}\;} z$
The density is obtained using the change-of-variables formula:
$p_x^{\phi}(x) = p_z\left(f^{-1}_{\phi}(x)\right) \left|\det \frac{\partial f^{-1}_{\phi}(x)}{\partial x} \right|.$
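For intuition, here is a toy hand-written example of this formula in JAX with a single affine bijection $f_{\phi}(z) = a z + b$ (an illustrative choice, not the flow used in this work):

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

# Toy bijection f_phi(z) = a*z + b with phi = (a, b); illustrative only.
a, b = 2.0, 1.0

def f_inv(x):
    return (x - b) / a

def log_prob_x(x):
    # log p_x(x) = log p_z(f^{-1}(x)) + log |det d f^{-1}(x) / dx|
    z = f_inv(x)
    log_det = jnp.log(jnp.abs(jax.grad(f_inv)(x)))  # scalar Jacobian in 1-D
    return norm.logpdf(z) + log_det

# x = a*z + b with z ~ N(0, 1) is N(b, a^2); both expressions give the same density:
print(jnp.exp(log_prob_x(1.0)), jnp.exp(norm.logpdf(1.0, loc=b, scale=a)))
```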
How can we find, from the data $x$, the transformation parameters $\phi$ such that $p_x^{\phi}$ is as close as possible to the true distribution $p(x)$?
$\to$ we need a tool to compare distributions: the Kullback-Leibler divergence.
$$\begin{array}{ll} D_{KL}(p(x)||p_x^{\phi}(x)) &= \mathbb{E}_{p(x)}\Big[ \log\left(\frac{p(x)}{p_x^{\phi}(x)}\right) \Big] \\ &= \mathbb{E}_{p(x)}\left[ \log\left(p(x)\right) \right] - \mathbb{E}_{p(x)}\left[ \log\left(p_x^{\phi}(x)\right) \right]\\ \end{array} $$
$$ \implies \text{Loss} = -\mathbb{E}_{p(x)}\left[ \log p_x^{\phi}(x) \right] + \text{const.} $$
$\to$ we train the NF by minimizing the negative log-likelihood.
But to train the NF, we want to use both simulations and gradients:
$$\mathcal{L} = -\mathbb{E}\left[\log p_x^{\phi}(x)\right] + \lambda \, \mathbb{E}\left[ \left\| \nabla_{x} \log p_x(x) - \nabla_x \log p_x^{\phi}(x) \right\|_2^2 \right]$$
Problem: the gradients of current NFs lack expressivity.
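A minimal JAX sketch of this combined objective, where `log_prob_phi(params, x)` stands in for the flow's log-density and `score_batch` holds the simulator-provided scores $\nabla_x \log p_x(x)$; all names and the toy `toy_log_prob` are illustrative assumptions, not the actual implementation:

```python
import jax
import jax.numpy as jnp

def combined_loss(params, x_batch, score_batch, log_prob_phi, lam=1.0):
    """Negative log-likelihood + lambda * score-matching penalty (sketch)."""
    # NLL term: -E[log p_x^phi(x)] over the simulated batch
    nll = -jnp.mean(jax.vmap(lambda x: log_prob_phi(params, x))(x_batch))

    # Score of the flow: gradient of its log-density w.r.t. the input x
    score_phi = jax.vmap(jax.grad(lambda x: log_prob_phi(params, x)))(x_batch)

    # L2 mismatch with the simulator scores nabla_x log p_x(x)
    score_term = jnp.mean(jnp.sum((score_batch - score_phi) ** 2, axis=-1))
    return nll + lam * score_term

# Toy "flow": a unit Gaussian with learnable mean, just to exercise the loss.
def toy_log_prob(params, x):
    return -0.5 * jnp.sum((x - params) ** 2) - 0.5 * x.size * jnp.log(2 * jnp.pi)

params = jnp.zeros(4)
x_batch = jax.random.normal(jax.random.PRNGKey(0), (8, 4))
score_batch = -x_batch  # true score of a standard normal
loss, grads = jax.value_and_grad(
    lambda p: combined_loss(p, x_batch, score_batch, toy_log_prob)
)(params)
```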
Experiment: Lotka-Volterra
Draw N parameters $\alpha,\beta,\gamma,\delta \sim p(\underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$
Run N simulations $x \sim p(\underbrace{\text{Prey}, \: \text{Predator}}_{x}| \underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$
Compress $x \in \mathbb{R}^{20}$ into $y = r(x) \in \mathbb{R}^4$
Train a NF on ($y_i$,$\theta_i$)$_{i=1..N}$
Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|r(x_0))$
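As a sketch, a deterministic Lotka-Volterra ODE written in JAX is differentiable end to end, so gradients of the simulator with respect to $\theta$ come for free via autodiff. This is a simplified stand-in: the stochastic model, observation noise, and the compressor $r$ used in the experiment are not reproduced here, and the parameter values are illustrative.

```python
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

def lotka_volterra(theta, n_steps=10):
    """Deterministic prey/predator ODE; returns a flattened trajectory in R^{2*n_steps}."""
    alpha, beta, gamma, delta = theta

    def dynamics(state, t):
        prey, predator = state
        d_prey = alpha * prey - beta * prey * predator
        d_predator = delta * prey * predator - gamma * predator
        return jnp.array([d_prey, d_predator])

    t = jnp.linspace(0.0, 10.0, n_steps)
    trajectory = odeint(dynamics, jnp.array([10.0, 5.0]), t)  # shape (n_steps, 2)
    return trajectory.ravel()                                 # x in R^20 for n_steps=10

theta = jnp.array([1.0, 0.1, 1.5, 0.075])        # alpha, beta, gamma, delta
x = lotka_volterra(theta)                        # one simulation
dx_dtheta = jax.jacobian(lotka_volterra)(theta)  # simulator gradients, shape (20, 4)
```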
Results
$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$
We found that, whether with our method or with SCANDAL, the gradients do not help.
The prior used in our experiment and the true posterior: the posterior is very narrow compared to the prior.
The prior.
The true posterior.
We then performed the same experiment in this framework.
$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$
With simulations only
With simulations and score
Conclusion
Thank you!