Neural Posterior Estimation with Differentiable Simulators


Summer School Euclid 2022

August 16-26, 2022


Justine Zeghal, François Lanusse, Alexandre Boucaud, Eric Aubourg, Benjamin Remy



Bayes' theorem

    We want to infer the parameters $\theta$ that generated an observation $x_0$.

    $\to$ Bayes' theorem:

    $$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \; \underbrace{p(\theta)}_{\text{prior}}$$

    But the likelihood is intractable: it requires marginalizing over all the latent variables $z$ of the simulator,

    $$p(x_0|\theta)=\int p(x_0|\theta,z)\,p(z)\,dz$$

    Solution: Simulation-Based Inference (SBI) / Likelihood-Free Inference (LFI)
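    As a purely illustrative sketch of the SBI setting in JAX (the prior, simulator, and dimensions below are hypothetical stand-ins, not the Euclid pipeline): the only ingredients are prior samples and simulator calls; the likelihood is never evaluated.

```python
# Minimal SBI setup sketch (hypothetical prior and simulator, for illustration only).
import jax
import jax.numpy as jnp

def simulator(key, theta):
    # stand-in stochastic forward model: x = theta + noise
    return theta + 0.1 * jax.random.normal(key, theta.shape)

key_prior, key_sim = jax.random.split(jax.random.PRNGKey(0))
N, dim = 1000, 2

theta = jax.random.uniform(key_prior, (N, dim), minval=-1.0, maxval=1.0)  # theta_i ~ p(theta)
x = jax.vmap(simulator)(jax.random.split(key_sim, N), theta)              # x_i ~ p(x | theta_i)

# SBI learns p(theta|x) from the joint samples (theta_i, x_i) alone:
# p(x|theta) is only ever sampled, never evaluated.
```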

Example

Classical method:

    shear map $\longrightarrow$ compression into a two-point correlation function $\longrightarrow$ likelihood approximation $\approx p(x_0|\theta)$ $\longrightarrow$ run an MCMC to get $\approx p(\theta|x_0)$

SBI method:

    shear map $\longrightarrow$ compression $\longrightarrow$ density approximation via deep learning $\longrightarrow$ $\approx p(\theta|x_0)$

$\to$ But SBI methods typically require many simulations: we need to find a way to reduce the number of simulations.

The idea



$\to$ We incorporate the gradients $\nabla_{\theta} \log p(\theta|x)$ (the score), obtained from a differentiable simulator, into the training process to reduce the number of simulations.
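A differentiable simulator makes such gradients available through automatic differentiation. Below is a minimal JAX sketch under simplifying, hypothetical assumptions (a toy forward model with Gaussian latents and noise, and a flat prior): differentiating the joint log-density with respect to $\theta$ at fixed latents $z$ is what provides the extra gradient information.

```python
# Sketch: a differentiable simulator exposes gradients w.r.t. theta via autodiff.
# The forward model below is a toy stand-in (Gaussian latents and noise); with a
# flat prior, grad_theta of the joint log-density at fixed (z, x) is the posterior
# score at fixed latents.
import jax
import jax.numpy as jnp

def joint_log_prob(theta, z, x, sigma=0.1):
    log_p_z = -0.5 * jnp.sum(z ** 2)             # latent prior p(z) = N(0, I)
    x_mean = theta * z                           # stand-in differentiable forward model
    log_p_x = -0.5 * jnp.sum((x - x_mean) ** 2) / sigma ** 2
    return log_p_z + log_p_x

score_fn = jax.grad(joint_log_prob, argnums=0)   # d/dtheta log p(x, z | theta)

theta = jnp.array([0.5, -0.2])
z = jnp.array([1.0, 0.3])
x = jnp.array([0.4, -0.1])
print(score_fn(theta, z, x))                     # gradient w.r.t. theta, essentially for free
```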

How can we train our neural density estimator with both simulations and the score?

Normalizing Flows (NFs) as Neural Density Estimators


How can we find the transformation parameters $\phi$, learned from the data $x$, such that the flow density $p_x^{\phi}(x)$ is as close as possible to the true distribution $p(x)$?

$\to$ we need a tool to compare distributions: the Kullback-Leibler divergence.

$$\begin{array}{ll} D_{KL}(p(x)||p_x^{\phi}(x)) &= \mathbb{E}_{p(x)}\Big[ \log\left(\frac{p(x)}{p_x^{\phi}(x)}\right) \Big] \\ &= \mathbb{E}_{p(x)}\left[ \log\left(p(x)\right) \right] - \mathbb{E}_{p(x)}\left[ \log\left(p_x^{\phi}(x)\right) \right]\\ \end{array} $$

$$ \implies \text{Loss} = -\mathbb{E}_{p(x)}\left[ \log p_x^{\phi}(x) \right] + \text{const} $$

$\to$ We fit $\phi$ by minimizing this negative log-likelihood.

    But to train the NF, we want to use both simulations and gradients:

    $$\mathcal{L} = -\mathbb{E}\left[\log p_x^{\phi}(x)\right] + \lambda \, \mathbb{E}\left[ \left\| \nabla_{x} \log p(x) - \nabla_x \log p_x^{\phi}(x) \right\|_2^2 \right]$$
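    A self-contained sketch of this loss in JAX: a diagonal Gaussian stands in for the normalizing flow $p_x^{\phi}$ so the example stays runnable (in practice $\phi$ would parametrize a smooth normalizing flow with a differentiable `log_prob`), and the target scores would come from the differentiable simulator.

```python
# Sketch of the combined loss: negative log-likelihood + lambda * score-matching term.
# A diagonal Gaussian stands in for the normalizing flow p_x^phi (hypothetical choice).
import jax
import jax.numpy as jnp

def log_prob(phi, x):
    mean, log_std = phi["mean"], phi["log_std"]
    return jnp.sum(-0.5 * ((x - mean) / jnp.exp(log_std)) ** 2
                   - log_std - 0.5 * jnp.log(2.0 * jnp.pi))

def loss(phi, x_batch, score_batch, lam=1.0):
    # first term: -E[log p_x^phi(x)], estimated over the batch of simulations
    nll = -jnp.mean(jax.vmap(lambda x: log_prob(phi, x))(x_batch))
    # second term: || grad_x log p(x) - grad_x log p_x^phi(x) ||_2^2,
    # with the true scores provided alongside the simulations
    model_score = jax.vmap(jax.grad(lambda x: log_prob(phi, x)))(x_batch)
    score_term = jnp.mean(jnp.sum((score_batch - model_score) ** 2, axis=-1))
    return nll + lam * score_term

phi = {"mean": jnp.zeros(2), "log_std": jnp.zeros(2)}
x_batch = jax.random.normal(jax.random.PRNGKey(1), (128, 2))
score_batch = -x_batch                       # exact scores for a standard-normal toy target
value, grads = jax.value_and_grad(loss)(phi, x_batch, score_batch)
```

    Setting $\lambda = 0$ recovers the usual NLL-only training; the second term is what lets the simulator's gradients constrain $\phi$.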

Problem: the gradients of current NFs lack expressivity.

$\to$ Solution: Smooth Normalizing Flows (Köhler et al., 2021).

Experiment: Lotka-Volterra

  • Draw N parameters $\alpha,\beta,\gamma,\delta \sim p(\underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$

  • Run N simulations $x \sim p(\underbrace{\text{Prey}, \: \text{Predator}}_{x}| \underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$

  • Compress $x \in \mathbb{R}^{20}$ into $y = r(x) \in \mathbb{R}^4$

  • Train an NF on $(y_i, \theta_i)_{i=1..N}$

  • Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|r(x_0))$
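The simulation step of this pipeline can be sketched as follows; a deterministic Lotka-Volterra ODE integrated with a simple Euler scheme stands in for the stochastic simulator actually used, and the parameter values are illustrative only. Because the whole function is written in JAX, gradients with respect to $\theta$ come from automatic differentiation.

```python
# Sketch of the Lotka-Volterra simulation step (deterministic ODE stand-in, Euler scheme).
import jax
import jax.numpy as jnp

def lotka_volterra(theta, dt=0.01, n_steps=2000, n_obs=10):
    alpha, beta, gamma, delta = theta
    state0 = jnp.array([1.0, 1.0])                # initial (prey, predator) populations

    def step(state, _):
        prey, pred = state
        d_prey = alpha * prey - beta * prey * pred
        d_pred = delta * prey * pred - gamma * pred
        new_state = state + dt * jnp.array([d_prey, d_pred])
        return new_state, new_state

    _, traj = jax.lax.scan(step, state0, None, length=n_steps)
    idx = jnp.linspace(0, n_steps - 1, n_obs).astype(int)
    return traj[idx].reshape(-1)                  # 10 (prey, predator) pairs -> x in R^20

theta = jnp.array([1.0, 0.2, 1.0, 0.1])           # (alpha, beta, gamma, delta), illustrative values
x = lotka_volterra(theta)                         # one simulation, x in R^20
jac = jax.jacfwd(lotka_volterra)(theta)           # Jacobian dx/dtheta via autodiff: what a
                                                  # "differentiable simulator" provides
```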

Results

  • We first tested with simulations only,

  • then with simulations and the score.

[Figure: posterior approximations with simulations only vs. with simulations and score]

Conclusion



  • This is the first NPE method that uses the score.

  • We used Smooth Normalizing Flows to be able to train using the score.

  • The score helps reduce the number of simulations when the prior is not too wide.

  • More and more simulators are differentiable.






  • Thank you!