IN2P3/IRFU Machine Learning workshop
September 26-28, 2022
Justine Zeghal, François Lanusse, Alexandre Boucaud, Eric Aubourg, Benjamin Remy
Context
We want to infer the parameters that generated an observation $x_0$.
$\to$ Bayes' theorem: $p(\theta|x_0) \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$
$ p(x_0|\theta) = \int p(x_0|\theta, z)p(z)dz$ $\to$ intractable
Different ways to do Likelihood-Free Inference:
$\to$ These methods require a large number of simulations.
What if we add the gradients of the simulator's joint log-probability with respect to the input parameters to the LFI process?
Posterior Estimation Algorithm
Draw N parameters $\theta_i \sim p(\theta)$
$\downarrow$
Draw N simulations $x_i \sim p(x|\theta_i)$, $i = 1..N$
$\downarrow$
Train a neural density estimator $q_{\phi}(\theta|x)$ on $(x_i,\theta_i)_{i=1..N}$
$\downarrow$
Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|x=x_0)$
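Putting the four steps together, here is a minimal sketch of this loop in JAX, assuming a toy Gaussian simulator and a conditional Gaussian in place of the neural density estimator; all names and settings are illustrative, not the actual code.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
N = 1000

# 1. Draw N parameters from the prior p(theta) = N(0, 1).
key, k1, k2 = jax.random.split(key, 3)
theta = jax.random.normal(k1, (N,))

# 2. Draw N simulations x_i ~ p(x | theta_i); toy simulator: x = theta + noise.
x = theta + 0.1 * jax.random.normal(k2, (N,))

# 3. Train a density estimator q_phi(theta | x): a Gaussian whose mean
#    is affine in x, with a learnable log-scale.
def nll(phi, theta, x):
    a, b, log_sigma = phi
    z = (theta - (a * x + b)) * jnp.exp(-log_sigma)
    return jnp.mean(0.5 * z**2 + log_sigma + 0.5 * jnp.log(2 * jnp.pi))

phi = jnp.zeros(3)
for _ in range(500):  # plain gradient descent on the negative log-likelihood
    phi = phi - 0.1 * jax.grad(nll)(phi, theta, x)

# 4. Approximate the posterior at an observation x0.
x0 = 0.3
a, b, log_sigma = phi
print("posterior mean:", float(a * x0 + b), "std:", float(jnp.exp(log_sigma)))
```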
The idea
With only a few simulations, it is hard to approximate the distribution.
$\to$ we need more simulations.
But if we have a few simulations and their gradients, it becomes possible to get an idea of the shape of the distribution.
$\to$ We integrate the gradients $\nabla_{\theta} \log p(\theta|x)$ (the score) into the process to reduce the number of simulations.
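When the simulator is differentiable, this score can be obtained by automatic differentiation. A minimal sketch on a toy Gaussian model where the joint log-probability is tractable (`log_joint` is illustrative, not the actual simulator); note that $\nabla_{\theta} \log p(\theta|x) = \nabla_{\theta} \log p(\theta, x)$ since $\log p(x)$ does not depend on $\theta$:

```python
import jax

def log_joint(theta, x):
    # log p(theta) + log p(x | theta) for a Gaussian toy model:
    # theta ~ N(0, 1), x | theta ~ N(theta, 0.1^2).
    log_prior = -0.5 * theta**2
    log_lik = -0.5 * ((x - theta) / 0.1) ** 2
    return log_prior + log_lik

# Score: nabla_theta log p(theta | x) = nabla_theta log p(theta, x).
score_fn = jax.grad(log_joint, argnums=0)
print(score_fn(0.5, 0.3))
```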
How can we train our neural density estimator with both simulations and the score?
Normalizing Flows (NFs) as Neural Density Estimators
The key idea of NFs is to transform a simple base distribution $p_z(z)$ through a series of bijective functions $f_i$ to reconstruct a complex target distribution $p_x(x)$.
$$z \;\underset{f_1^{-1}}{\overset{f_1}{\rightleftarrows}}\; z' \;\underset{f_2^{-1}}{\overset{f_2}{\rightleftarrows}}\; x$$
The density is then obtained using the change of variables formula:
$$p_x^{\phi}(x) = p_z\left(f^{-1}_{\phi}(x)\right) \left|\det \frac{\partial f^{-1}_{\phi}(x)}{\partial x} \right|.$$
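For a single affine bijection $f_{\phi}(z) = \mu + \sigma z$ with a standard normal base $p_z$, this formula reduces to the usual Gaussian log-density. A minimal sketch (function names are illustrative):

```python
import jax.numpy as jnp

def log_prob(phi, x):
    mu, log_sigma = phi
    z = (x - mu) * jnp.exp(-log_sigma)        # z = f_phi^{-1}(x)
    log_pz = -0.5 * z**2 - 0.5 * jnp.log(2 * jnp.pi)
    log_det = -log_sigma                      # log |det d f_phi^{-1}(x) / dx|
    return log_pz + log_det                   # = log N(x; mu, sigma^2)

print(log_prob((1.0, 0.5), 0.2))
```

An actual NF chains many such bijections, accumulating the log-determinant terms.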
How can we find the transformation parameters $\phi$ from the data $x$ so that $p_x^{\phi}(x)$ is as close as possible to the true distribution $p(x)$?
$\to$ we need a tool to compare distributions: the Kullback-Leibler divergence.
$$\begin{aligned} D_{KL}\left(p(x) \,\|\, p_x^{\phi}(x)\right) &= \mathbb{E}_{p(x)}\left[ \log\left(\frac{p(x)}{p_x^{\phi}(x)}\right) \right] \\ &= \mathbb{E}_{p(x)}\left[ \log p(x) \right] - \mathbb{E}_{p(x)}\left[ \log p_x^{\phi}(x) \right] \end{aligned}$$
$$\implies \mathcal{L} = -\mathbb{E}_{p(x)}\left[ \log p_x^{\phi}(x) \right] + \text{const}$$
The first term $\mathbb{E}_{p(x)}\left[\log p(x)\right]$ does not depend on $\phi$, so minimizing the KL divergence amounts to minimizing the negative log-likelihood.
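In practice the expectation over $p(x)$ is not available in closed form; it is estimated from the $N$ simulations with a Monte-Carlo average:

$$\mathcal{L} \approx -\frac{1}{N}\sum_{i=1}^{N} \log p_x^{\phi}(x_i), \qquad x_i \sim p(x).$$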
But to train the NF, we want to use both simulations and gradients:
$$\mathcal{L} = -\,\mathbb{E}\left[\log p_x^{\phi}(x)\right] + \lambda\, \mathbb{E}\left[ \left\| \nabla_{x} \log p(x) - \nabla_{x} \log p_x^{\phi}(x) \right\|_2^2 \right]$$
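A minimal sketch of this combined loss in JAX, assuming a hypothetical NF log-density `log_q(phi, x)` and a training set of pairs $(x_i, \nabla_x \log p(x_i))$ supplied by the differentiable simulator:

```python
import jax
import jax.numpy as jnp

def combined_loss(phi, xs, scores, log_q, lam=1.0):
    # Negative log-likelihood term, averaged over the simulations x_i.
    nll = -jnp.mean(jax.vmap(log_q, (None, 0))(phi, xs))
    # Score-matching term: match grad_x log q_phi to the true score.
    grad_log_q = jax.vmap(jax.grad(log_q, argnums=1), (None, 0))
    sm = jnp.mean((scores - grad_log_q(phi, xs)) ** 2)
    return nll + lam * sm
```

For scalar $x$ this runs as-is with, e.g., the affine-flow `log_prob` above passed as `log_q`; the model score $\nabla_x \log p_x^{\phi}$ costs one extra autodiff pass through the flow.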
Problem: the gradients of current NFs lack expressivity.
Experiment: Lotka-Volterra
Draw N parameters $\alpha,\beta,\gamma,\delta \sim p(\underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$
Run N simulations $x_i \sim p(\underbrace{\text{Prey}, \: \text{Predator}}_{x} \mid \underbrace{\alpha,\beta,\gamma,\delta}_{\theta})$ (see the simulator sketch after these steps)
Compress $x \in \mathbb{R}^{20}$ into $y = r(x) \in \mathbb{R}^4$
Train a NF on $(y_i, \theta_i)_{i=1..N}$
Approximate the posterior: $p(\theta|x_0) \approx q_{\phi}(\theta|r(x_0))$
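A minimal sketch of a differentiable Lotka-Volterra simulator in JAX, using `jax.experimental.ode.odeint` so gradients with respect to $\theta = (\alpha, \beta, \gamma, \delta)$ flow through the solver. Ten time steps × two species give $x \in \mathbb{R}^{20}$; the initial conditions, time grid, and absence of observation noise are simplifying assumptions, not the benchmark's exact setup:

```python
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

def dynamics(y, t, theta):
    # Lotka-Volterra ODE: prey grows and is eaten, predator eats and dies.
    alpha, beta, gamma, delta = theta
    prey, pred = y
    return jnp.array([alpha * prey - beta * prey * pred,
                      delta * prey * pred - gamma * pred])

def simulate(theta):
    y0 = jnp.array([1.0, 1.0])                # initial prey/predator populations
    ts = jnp.linspace(0.0, 10.0, 10)          # 10 observation times
    return odeint(dynamics, y0, ts, theta).ravel()  # x in R^20 (10 steps x 2)

theta = jnp.array([1.0, 0.5, 1.5, 0.75])
x = simulate(theta)
# Gradient of any scalar function of the trajectory w.r.t. theta:
g = jax.grad(lambda th: jnp.sum(simulate(th)))(theta)
```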
[Figure: the proposal distribution and the true posterior. The posterior is very narrow compared to the proposal distribution.]
Results
$$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$$
We found that, with our method as well as with SCANDAL, the gradients do not help.
[Figure: the proposal distribution and the true posterior.]
Results
$$\underbrace{p(\theta|x_0)}_{\text{posterior}} \propto \underbrace{p(x_0|\theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}$$
[Figure: posterior approximations obtained with simulations only vs. with simulations and score.]
Conclusion
Thank you!