Neural Posterior Estimation with Differentiable Simulators


Summer School Euclid 2022

August 16-26, 2022


Justine Zeghal, François Lanusse, Alexandre Boucaud, Eric Aubourg, Benjamin Remy



Bayes' theorem

    We want to infer the parameters $\theta$ that generated an observation $x_0$.

    $\to$ Bayes' theorem:

    $$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \; \underbrace{p(\theta)}_{\text{prior}}$$

    But the likelihood is intractable: it requires marginalizing over all the latent variables $z$ of the simulator,

    $$p(x_0|\theta)=\int p(x_0|\theta,z)\,p(z)\,dz$$

    Solution: Simulation-Based Inference (SBI) / Likelihood-Free Inference (LFI)
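    As a purely illustrative sketch of the SBI setting in JAX (the prior, simulator, and dimensions below are hypothetical stand-ins, not the Euclid pipeline): the only ingredients are prior samples and simulator calls; the likelihood is never evaluated.

```python
# Minimal SBI setup sketch (hypothetical prior and simulator, for illustration only).
import jax
import jax.numpy as jnp

def simulator(key, theta):
    # stand-in stochastic forward model: x = theta + noise
    return theta + 0.1 * jax.random.normal(key, theta.shape)

key_prior, key_sim = jax.random.split(jax.random.PRNGKey(0))
N, dim = 1000, 2

theta = jax.random.uniform(key_prior, (N, dim), minval=-1.0, maxval=1.0)  # theta_i ~ p(theta)
x = jax.vmap(simulator)(jax.random.split(key_sim, N), theta)              # x_i ~ p(x | theta_i)

# SBI learns p(theta|x) from the joint samples (theta_i, x_i) alone:
# p(x|theta) is only ever sampled, never evaluated.
```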

Example

Classical method:

    shear map $\longrightarrow$ compression into a two-point correlation function $\longrightarrow$ likelihood approximation $\approx p(x_0|\theta)$ $\longrightarrow$ run an MCMC to get $\approx p(\theta|x_0)$

SBI method:

    shear map $\longrightarrow$ compression $\longrightarrow$ density approximation via deep learning $\longrightarrow$ $\approx p(\theta|x_0)$

$\to$ But SBI methods typically require many simulations: we need to find a way to reduce the number of simulations.

The idea



$\to$ We incorporate the gradients $\nabla_{\theta} \log p(\theta|x)$ (the score), obtained from a differentiable simulator, into the training process to reduce the number of simulations.
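A differentiable simulator makes such gradients available through automatic differentiation. Below is a minimal JAX sketch under simplifying, hypothetical assumptions (a toy forward model with Gaussian latents and noise, and a flat prior): differentiating the joint log-density with respect to $\theta$ at fixed latents $z$ is what provides the extra gradient information.

```python
# Sketch: a differentiable simulator exposes gradients w.r.t. theta via autodiff.
# The forward model below is a toy stand-in (Gaussian latents and noise); with a
# flat prior, grad_theta of the joint log-density at fixed (z, x) is the posterior
# score at fixed latents.
import jax
import jax.numpy as jnp

def joint_log_prob(theta, z, x, sigma=0.1):
    log_p_z = -0.5 * jnp.sum(z ** 2)             # latent prior p(z) = N(0, I)
    x_mean = theta * z                           # stand-in differentiable forward model
    log_p_x = -0.5 * jnp.sum((x - x_mean) ** 2) / sigma ** 2
    return log_p_z + log_p_x

score_fn = jax.grad(joint_log_prob, argnums=0)   # d/dtheta log p(x, z | theta)

theta = jnp.array([0.5, -0.2])
z = jnp.array([1.0, 0.3])
x = jnp.array([0.4, -0.1])
print(score_fn(theta, z, x))                     # gradient w.r.t. theta, essentially for free
```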

How can we train our neural density estimator with both simulations and the score?

Normalizing Flows (NFs) as Neural Density Estimators


How can we find the transformation parameters $\phi$, learned from the data $x$, such that the flow density $p_x^{\phi}(x)$ is as close as possible to the true distribution $p(x)$?

$\to$ we need a tool to compare distributions: the Kullback-Leibler divergence.

$$\begin{array}{ll} D_{KL}(p(x)||p_x^{\phi}(x)) &= \mathbb{E}_{p(x)}\Big[ \log\left(\frac{p(x)}{p_x^{\phi}(x)}\right) \Big] \\ &= \mathbb{E}_{p(x)}\left[ \log\left(p(x)\right) \right] - \mathbb{E}_{p(x)}\left[ \log\left(p_x^{\phi}(x)\right) \right]\\ \end{array} $$

$$ \implies \text{Loss} = -\mathbb{E}_{p(x)}\left[ \log p_x^{\phi}(x) \right] + \text{const} $$

$\to$ We fit $\phi$ by minimizing this negative log-likelihood.

    But to train the NF, we want to use both simulations and gradients:

    $$\mathcal{L} = -\mathbb{E}\left[\log p_x^{\phi}(x)\right] + \lambda \, \mathbb{E}\left[ \left\| \nabla_{x} \log p(x) - \nabla_x \log p_x^{\phi}(x) \right\|_2^2 \right]$$
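    A self-contained sketch of this loss in JAX: a diagonal Gaussian stands in for the normalizing flow $p_x^{\phi}$ so the example stays runnable (in practice $\phi$ would parametrize a smooth normalizing flow with a differentiable `log_prob`), and the target scores would come from the differentiable simulator.

```python
# Sketch of the combined loss: negative log-likelihood + lambda * score-matching term.
# A diagonal Gaussian stands in for the normalizing flow p_x^phi (hypothetical choice).
import jax
import jax.numpy as jnp

def log_prob(phi, x):
    mean, log_std = phi["mean"], phi["log_std"]
    return jnp.sum(-0.5 * ((x - mean) / jnp.exp(log_std)) ** 2
                   - log_std - 0.5 * jnp.log(2.0 * jnp.pi))

def loss(phi, x_batch, score_batch, lam=1.0):
    # first term: -E[log p_x^phi(x)], estimated over the batch of simulations
    nll = -jnp.mean(jax.vmap(lambda x: log_prob(phi, x))(x_batch))
    # second term: || grad_x log p(x) - grad_x log p_x^phi(x) ||_2^2,
    # with the true scores provided alongside the simulations
    model_score = jax.vmap(jax.grad(lambda x: log_prob(phi, x)))(x_batch)
    score_term = jnp.mean(jnp.sum((score_batch - model_score) ** 2, axis=-1))
    return nll + lam * score_term

phi = {"mean": jnp.zeros(2), "log_std": jnp.zeros(2)}
x_batch = jax.random.normal(jax.random.PRNGKey(1), (128, 2))
score_batch = -x_batch                       # exact scores for a standard-normal toy target
value, grads = jax.value_and_grad(loss)(phi, x_batch, score_batch)
```

    Setting $\lambda = 0$ recovers the usual NLL-only training; the second term is what lets the simulator's gradients constrain $\phi$.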

Problem: the gradients of current NFs lack expressivity.

$\to$ Solution: Smooth Normalizing Flows (Köhler et al., 2021).

Experiment: Lotka-Volterra

  • Draw N parameters $\alpha,\beta,\gamma,\delta \sim p(\underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$

  • Run N simulations $x \sim p(\underbrace{\text{Prey}, \: \text{Predator}}_{x}| \underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$

  • Compress $x \in \mathbb{R}^{20}$ into $y = r(x) \in \mathbb{R}^4$

  • Train an NF on $(y_i, \theta_i)_{i=1..N}$

  • Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|r(x_0))$
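The simulation step of this pipeline can be sketched as follows; a deterministic Lotka-Volterra ODE integrated with a simple Euler scheme stands in for the stochastic simulator actually used, and the parameter values are illustrative only. Because the whole function is written in JAX, gradients with respect to $\theta$ come from automatic differentiation.

```python
# Sketch of the Lotka-Volterra simulation step (deterministic ODE stand-in, Euler scheme).
import jax
import jax.numpy as jnp

def lotka_volterra(theta, dt=0.01, n_steps=2000, n_obs=10):
    alpha, beta, gamma, delta = theta
    state0 = jnp.array([1.0, 1.0])                # initial (prey, predator) populations

    def step(state, _):
        prey, pred = state
        d_prey = alpha * prey - beta * prey * pred
        d_pred = delta * prey * pred - gamma * pred
        new_state = state + dt * jnp.array([d_prey, d_pred])
        return new_state, new_state

    _, traj = jax.lax.scan(step, state0, None, length=n_steps)
    idx = jnp.linspace(0, n_steps - 1, n_obs).astype(int)
    return traj[idx].reshape(-1)                  # 10 (prey, predator) pairs -> x in R^20

theta = jnp.array([1.0, 0.2, 1.0, 0.1])           # (alpha, beta, gamma, delta), illustrative values
x = lotka_volterra(theta)                         # one simulation, x in R^20
jac = jax.jacfwd(lotka_volterra)(theta)           # Jacobian dx/dtheta via autodiff: what a
                                                  # "differentiable simulator" provides
```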

Results

  • We first tested with simulations only,

  • then with simulations and the score.

[Figure: posterior approximations with simulations only vs. with simulations and score]

Conclusion



  • This is the first NPE method that uses the score.

  • We used Smooth Normalizing Flows to be able to train using the score.

  • The score helps reduce the number of simulations when the prior is not too wide.

  • More and more simulators are differentiable.






  • Thank you!