IN2P3/IRFU Machine Learning workshop
September 26-28, 2022
Justine Zeghal, François Lanusse, Alexandre Boucaud, Eric Aubourg, Benjamin Remy
Context
We want to infer the parameters that generated an observation $x_0$.
$\to$ Bayes' theorem: $p(\theta|x_0) \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$
$ p(x_0|\theta) = \int p(x_0|\theta, z)p(z)dz$ $\to$ intractable
Different ways to do Likelihood-Free Inference:
$\to$ These methods require a large number of simulations.
What if we add the gradients of the simulator's joint log-probability with respect to the input parameters to the LFI process?
Posterior Estimation Algorithm
Draw N parameters $\theta_i \sim p(\theta)$
$\downarrow$
Draw N simulations $x_i \sim p(x|\theta_i)$, $i = 1..N$
$\downarrow$
Train a neural density estimator $q_{\phi}(\theta|x)$ on $(x_i,\theta_i)_{i=1..N}$
$\downarrow$
Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|x=x_0)$
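Putting the four steps together, here is a minimal sketch of this loop in JAX, assuming a toy Gaussian simulator and a conditional Gaussian in place of the neural density estimator; all names and settings are illustrative, not the actual code.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
N = 1000

# 1. Draw N parameters from the prior p(theta) = N(0, 1).
key, k1, k2 = jax.random.split(key, 3)
theta = jax.random.normal(k1, (N,))

# 2. Draw N simulations x_i ~ p(x | theta_i); toy simulator: x = theta + noise.
x = theta + 0.1 * jax.random.normal(k2, (N,))

# 3. Train a density estimator q_phi(theta | x): a Gaussian whose mean
#    is affine in x, with a learnable log-scale.
def nll(phi, theta, x):
    a, b, log_sigma = phi
    z = (theta - (a * x + b)) * jnp.exp(-log_sigma)
    return jnp.mean(0.5 * z**2 + log_sigma + 0.5 * jnp.log(2 * jnp.pi))

phi = jnp.zeros(3)
for _ in range(500):  # plain gradient descent on the negative log-likelihood
    phi = phi - 0.1 * jax.grad(nll)(phi, theta, x)

# 4. Approximate the posterior at an observation x0.
x0 = 0.3
a, b, log_sigma = phi
print("posterior mean:", float(a * x0 + b), "std:", float(jnp.exp(log_sigma)))
```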
The idea
With only a few simulations, it is hard to approximate the distribution.
$\to$ we need more simulations.
But if we have a few simulations and their gradients, it becomes possible to get an idea of the shape of the distribution.
$\to$ We integrate the gradients $\nabla_{\theta} \log p(\theta|x)$ (the score) into the process to reduce the number of simulations.
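When the simulator is differentiable, this score can be obtained by automatic differentiation. A minimal sketch on a toy Gaussian model where the joint log-probability is tractable (`log_joint` is illustrative, not the actual simulator); note that $\nabla_{\theta} \log p(\theta|x) = \nabla_{\theta} \log p(\theta, x)$ since $\log p(x)$ does not depend on $\theta$:

```python
import jax

def log_joint(theta, x):
    # log p(theta) + log p(x | theta) for a Gaussian toy model:
    # theta ~ N(0, 1), x | theta ~ N(theta, 0.1^2).
    log_prior = -0.5 * theta**2
    log_lik = -0.5 * ((x - theta) / 0.1) ** 2
    return log_prior + log_lik

# Score: nabla_theta log p(theta | x) = nabla_theta log p(theta, x).
score_fn = jax.grad(log_joint, argnums=0)
print(score_fn(0.5, 0.3))
```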
How can we train our neural density estimator with both simulations and the score?
Normalizing Flows (NFs) as Neural Density Estimators
The key idea of NFs is to transform a simple base distribution $p_z(z)$ through a series of bijective functions $f_i$ to reconstruct a complex target distribution $p_x(x)$.
$$z \;\underset{f_1^{-1}}{\overset{f_1}{\rightleftarrows}}\; z' \;\underset{f_2^{-1}}{\overset{f_2}{\rightleftarrows}}\; x$$
The density is then obtained using the change of variables formula:
$$p_x^{\phi}(x) = p_z\left(f^{-1}_{\phi}(x)\right) \left|\det \frac{\partial f^{-1}_{\phi}(x)}{\partial x} \right|.$$
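For a single affine bijection $f_{\phi}(z) = \mu + \sigma z$ with a standard normal base $p_z$, this formula reduces to the usual Gaussian log-density. A minimal sketch (function names are illustrative):

```python
import jax.numpy as jnp

def log_prob(phi, x):
    mu, log_sigma = phi
    z = (x - mu) * jnp.exp(-log_sigma)        # z = f_phi^{-1}(x)
    log_pz = -0.5 * z**2 - 0.5 * jnp.log(2 * jnp.pi)
    log_det = -log_sigma                      # log |det d f_phi^{-1}(x) / dx|
    return log_pz + log_det                   # = log N(x; mu, sigma^2)

print(log_prob((1.0, 0.5), 0.2))
```

An actual NF chains many such bijections, accumulating the log-determinant terms.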
How can we find the transformation parameters $\phi$ from the data $x$ so that $p_x^{\phi}(x)$ is as close as possible to the true distribution $p(x)$?
$\to$ we need a tool to compare distributions: the Kullback-Leibler divergence.
$$\begin{aligned} D_{KL}\left(p(x) \,\|\, p_x^{\phi}(x)\right) &= \mathbb{E}_{p(x)}\left[ \log\left(\frac{p(x)}{p_x^{\phi}(x)}\right) \right] \\ &= \mathbb{E}_{p(x)}\left[ \log p(x) \right] - \mathbb{E}_{p(x)}\left[ \log p_x^{\phi}(x) \right] \end{aligned}$$
$$\implies \mathcal{L} = -\mathbb{E}_{p(x)}\left[ \log p_x^{\phi}(x) \right] + \text{const}$$
The first term $\mathbb{E}_{p(x)}\left[\log p(x)\right]$ does not depend on $\phi$, so minimizing the KL divergence amounts to minimizing the negative log-likelihood.
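In practice the expectation over $p(x)$ is not available in closed form; it is estimated from the $N$ simulations with a Monte-Carlo average:

$$\mathcal{L} \approx -\frac{1}{N}\sum_{i=1}^{N} \log p_x^{\phi}(x_i), \qquad x_i \sim p(x).$$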
But to train the NF, we want to use both simulations and gradients:
$$\mathcal{L} = -\,\mathbb{E}\left[\log p_x^{\phi}(x)\right] + \lambda\, \mathbb{E}\left[ \left\| \nabla_{x} \log p(x) - \nabla_{x} \log p_x^{\phi}(x) \right\|_2^2 \right]$$
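A minimal sketch of this combined loss in JAX, assuming a hypothetical NF log-density `log_q(phi, x)` and a training set of pairs $(x_i, \nabla_x \log p(x_i))$ supplied by the differentiable simulator:

```python
import jax
import jax.numpy as jnp

def combined_loss(phi, xs, scores, log_q, lam=1.0):
    # Negative log-likelihood term, averaged over the simulations x_i.
    nll = -jnp.mean(jax.vmap(log_q, (None, 0))(phi, xs))
    # Score-matching term: match grad_x log q_phi to the true score.
    grad_log_q = jax.vmap(jax.grad(log_q, argnums=1), (None, 0))
    sm = jnp.mean((scores - grad_log_q(phi, xs)) ** 2)
    return nll + lam * sm
```

For scalar $x$ this runs as-is with, e.g., the affine-flow `log_prob` above passed as `log_q`; the model score $\nabla_x \log p_x^{\phi}$ costs one extra autodiff pass through the flow.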
Problem: the gradients of current NFs lack expressivity.
Experiment: Lotka-Volterra
Draw N parameters $\alpha,\beta,\gamma,\delta \sim p(\underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$
Run N simulations $x_i \sim p(\underbrace{\text{Prey}, \: \text{Predator}}_{x} \mid \underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$ (see the simulator sketch after these steps)
Compress $x \in \mathbb{R}^{20}$ into $y = r(x) \in \mathbb{R}^4$
Train a NF on $(y_i, \theta_i)_{i=1..N}$
Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|r(x_0))$
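A minimal sketch of a differentiable Lotka-Volterra simulator in JAX, using `jax.experimental.ode.odeint` so gradients with respect to $\theta = (\alpha, \beta, \gamma, \delta)$ flow through the solver. Ten time steps × two species give $x \in \mathbb{R}^{20}$; the initial conditions, time grid, and absence of observation noise are simplifying assumptions, not the benchmark's exact setup:

```python
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

def dynamics(y, t, theta):
    # Lotka-Volterra ODE: prey grows and is eaten, predator eats and dies.
    alpha, beta, gamma, delta = theta
    prey, pred = y
    return jnp.array([alpha * prey - beta * prey * pred,
                      delta * prey * pred - gamma * pred])

def simulate(theta):
    y0 = jnp.array([1.0, 1.0])                # initial prey/predator populations
    ts = jnp.linspace(0.0, 10.0, 10)          # 10 observation times
    return odeint(dynamics, y0, ts, theta).ravel()  # x in R^20 (10 steps x 2)

theta = jnp.array([1.0, 0.5, 1.5, 0.75])
x = simulate(theta)
# Gradient of any scalar function of the trajectory w.r.t. theta:
g = jax.grad(lambda th: jnp.sum(simulate(th)))(theta)
```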
[Figure: the proposal distribution and the true posterior. The posterior is very narrow compared to the proposal distribution.]
Results
$$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$$
We found that, with our method as well as with SCANDAL, the gradients do not help.
[Figure: the proposal distribution and the true posterior.]
Results
$$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$$
[Figure: posterior approximations obtained with simulations only vs. with simulations and score.]
Conclusion
Thank you!