Origins of the Asymptotic Bias of SNIS
This post is a small tale, written during the revision of (Deligiannidis et al. 2025), about a formula that everyone who has used self-normalised importance sampling (SNIS) has seen at some point: the \(O(1/N)\) expression for its asymptotic bias. There seems to be broad agreement and consensus, across several communities, both on the fact that SNIS is biased and on what its leading-order bias should look like. The question we found ourselves asking was a narrower one: under what exact conditions is this an honest statement about an expectation, and where was it first written down? The answer turned out to be trickier than we had expected, and this is an attempt to tell the story in a scientific and precise way.

The setting and the formula
Fix a measurable space \((\mathcal{X},\mathcal{B}(\mathcal{X}))\), a target probability density \(\pi\), and a proposal \(q\) with \(\mathrm{supp}(\pi)\subseteq\mathrm{supp}(q)\). As in (Deligiannidis et al. 2025) we write \[ \omega : \mathcal{X}\to [0,\infty), \qquad \omega(x) := \frac{\pi(x)}{q(x)}, \] for the importance weight function, \(q(h) := \mathbb{E}_q[h(X)] = \int h(x)\, q(x)\, dx\) for expectations under the proposal, and standardise throughout to \(q(\omega)=1\). The target integral of a test function \(f:\mathcal{X}\to\mathbb{R}\) is then \[ \pi(f) = q(\omega f). \] Given independent draws \(X_1,\dots,X_N \sim q\), the self-normalised importance sampling estimator is \[ \widehat F_N \;:=\; \frac{\sum_{i=1}^N \omega(X_i)\, f(X_i)}{\sum_{i=1}^N \omega(X_i)}. \]
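As a concrete companion to the definition, here is a minimal Python sketch of the estimator; the Gaussian target, proposal and test function are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def snis(f, log_omega, x):
    """Self-normalised importance sampling estimate of pi(f).

    x         : draws X_1, ..., X_N from the proposal q
    log_omega : function returning log omega(x) = log pi(x) - log q(x)
    f         : test function
    """
    lw = log_omega(x)
    lw = lw - lw.max()                 # stabilise before exponentiating
    w = np.exp(lw)
    return np.sum(w * f(x)) / np.sum(w)

# Illustrative example: target N(0,1), proposal N(0, 2^2), f(x) = x^2,
# so that pi(f) = 1.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)
log_omega = lambda x: -0.5 * x**2 + 0.5 * (x / 2.0) ** 2 + np.log(2.0)
estimate = snis(lambda x: x**2, log_omega, x)   # close to 1
```

Working on the log scale before normalising is the usual numerical precaution: the common factor \(e^{\max_i \log\omega(X_i)}\) cancels in the ratio.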
The estimator \(\widehat F_N\) is strongly consistent for \(\pi(f)\) but biased, and the known expression for its leading-order bias, reproduced in essentially every review of importance sampling, is exactly \[ \mathbb{E}[\widehat F_N] - \pi(f) \;=\; -\frac{1}{N} q(\omega^2 (f-\pi(f))) + o(1/N). \] What we wanted to understand, in the context of the revision of (Deligiannidis et al. 2025), was where this expression first appeared and what assumptions had actually been used to justify it.
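The formula is easy to probe numerically. Below is a quick simulation of our own (a Gaussian toy example, taken from neither paper) in which the weights are bounded, so all moments exist and the moment statement is unproblematic; \(N\) times the empirical bias can then be compared with the constant \(-q(\omega^2(f-\pi(f)))\).

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0    # target N(0,1), proposal N(0, sigma^2), f(x) = x^2, pi(f) = 1

def omega(x):
    # importance weight pi(x)/q(x); bounded here, so all moments exist
    return sigma * np.exp(-0.5 * x**2 + 0.5 * (x / sigma) ** 2)

# Empirical bias of the SNIS estimator at sample size N, over R replications.
N, R = 50, 100_000
x = rng.normal(0.0, sigma, size=(R, N))
w = omega(x)
estimates = np.sum(w * x**2, axis=1) / np.sum(w, axis=1)
bias_hat = estimates.mean() - 1.0

# Leading-order constant -q(omega^2 (f - pi(f))), by plain Monte Carlo under q;
# analytically it equals 12 / (7 * sqrt(7)), about 0.648, in this example.
y = rng.normal(0.0, sigma, size=1_000_000)
c = -np.mean(omega(y) ** 2 * (y**2 - 1.0))

# N * bias_hat and c agree up to o(1) terms and Monte Carlo error
```

In benign examples like this one the agreement is good; the interesting question, discussed below, is what licenses this comparison in general.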
Where we first saw it written down
The search in the historical literature was led by my supervisor Pierre Jacob, who also pushed me to try to understand how the expression had been derived. The earliest place we could find it clearly written down is Timothy Hesterberg’s 1988 Stanford PhD thesis (Hesterberg 1988), in Section 2.5.2, as his equation (2.63): \[ \mathbb{E}[\hat\mu_{\text{ratio}} - \mu] \;=\; -\frac{1}{n} \mathbb{E}_g[W(Y-\mu W)] + \text{lower-order terms}, \] which in our notation is precisely \[ \mathbb{E}[\widehat F_N - \pi(f)] \;=\; -\frac{1}{N} q(\omega^2 (f-\pi(f))) + \text{lower-order terms}. \] Hesterberg is of course well known in the importance sampling community, so we are not claiming to have unearthed anything obscure; we just wanted to find the earliest written record we could, and this was it. What is more interesting than the date is how the expression is obtained there, and what Hesterberg himself is careful to claim (or, rather, not to claim) about it.
The derivation proceeds in two steps. One writes \(\widehat F_N\) as a ratio \(\bar Y/\bar W\) of sample averages and performs a second-order Taylor expansion around the population means, taking expectations term by term to read off the \(O(1/N)\) constant. To give this analytic support, Hesterberg then invokes the Edgeworth expansion of Bhattacharya and Ghosh (1978) for smooth functions of sample means, and identifies \(-q(\omega^2(f-\pi(f)))\) as the “bias term” in that expansion. On page 40 he adds a sentence which, in hindsight, is the whole reason we are writing this post:
“These are the first order bias terms in the Edgeworth expansion of the distributions of these estimates. These are more useful than the actual biases, which may be infinite or undefined.”
So, from the beginning, what was available in the literature was an expression for the leading-order bias in distribution, together with an explicit warning that the corresponding moment statement about \(\mathbb{E}[\widehat F_N]\) might fail to make sense at all. The same expression has since been reproduced, in variants adapted to particle filters, rare-event simulation, and variational inference; see for instance Liu (2008, sec. 2.5). The consensus is real, and so is the original caveat.
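For reference, the Taylor step of the derivation can be written out explicitly. The following is the formal version of the calculation, in the notation of this post; every expectation is assumed to exist and every remainder to be negligible, which is precisely what is at issue.

```latex
% Delta-method sketch for the O(1/N) constant, with
% \bar W := N^{-1}\sum_{i} \omega(X_i), \bar Y := N^{-1}\sum_{i} \omega(X_i) f(X_i),
% \widehat F_N = \bar Y / \bar W, and q(\omega) = 1.
\begin{align*}
\frac{\bar Y}{\bar W}
  &= \bar Y \bigl( 1 - (\bar W - 1) + (\bar W - 1)^2 - \cdots \bigr), \\
\mathbb{E}\bigl[\widehat F_N\bigr]
  &\approx \pi(f) - \operatorname{Cov}\bigl(\bar Y, \bar W\bigr)
           + \pi(f)\operatorname{Var}\bigl(\bar W\bigr) \\
  &= \pi(f) - \frac{1}{N}\bigl( q(\omega^2 f) - \pi(f) \bigr)
            + \frac{\pi(f)}{N}\bigl( q(\omega^2) - 1 \bigr) \\
  &= \pi(f) - \frac{1}{N}\, q\bigl( \omega^2 (f - \pi(f)) \bigr).
\end{align*}
```

The second line uses \(\mathbb{E}[\bar Y(\bar W-1)] = \operatorname{Cov}(\bar Y,\bar W)\) and replaces \(\bar Y\) by \(\pi(f)\) in the quadratic term, both exact to the order displayed.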
Why the distributional result does not give the moment result
The subtlety that we had to come to terms with in the revision is a basic probabilistic one: knowing the distribution of a sequence of random variables, even to high accuracy pointwise, does not pin down their expectations. Write \(\mathbb{E}[X]\) as a tail integral, \[ \mathbb{E}[X] \;=\; \int_0^\infty (1 - F_X(t))\, dt \;-\; \int_{-\infty}^0 F_X(t)\, dt. \] If one approximates \(F_X\) by \(\Phi + r_N\) with \(|r_N(t)|\) pointwise small, there is no reason for \(\int r_N(t)\, dt\) to be small: the CDF error can be tiny at every fixed \(t\) and yet carry non-negligible mass in the tails.
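The tail-integral identity itself is easy to check numerically; here is a quick verification for a distribution with a known mean (a shifted exponential, our choice purely for illustration).

```python
import numpy as np

# Check E[X] = int_0^inf (1 - F(t)) dt - int_{-inf}^0 F(t) dt
# for X = E - 1/2 with E ~ Exp(1), so F(t) = 1 - exp(-(t + 1/2)) for t >= -1/2
# and E[X] = 1/2.
def F(t):
    t = np.asarray(t, dtype=float)
    return np.where(t >= -0.5, 1.0 - np.exp(-(t + 0.5)), 0.0)

def trapezoid(y, x):
    # plain composite trapezoidal rule, to avoid version-specific numpy helpers
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

t_pos = np.linspace(0.0, 40.0, 400_001)   # upper tail; 1 - F is ~1e-18 at t = 40
t_neg = np.linspace(-1.0, 0.0, 10_001)    # F vanishes below -1/2 anyway
mean = trapezoid(1.0 - F(t_pos), t_pos) - trapezoid(F(t_neg), t_neg)
# mean recovers E[X] = 0.5 up to discretisation and truncation error
```

The point of the next example is that once \(F\) is only known up to a pointwise error, these two integrals are no longer under control.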
A simple counterexample. For \(N\geq 1\), let \(Z \sim \mathcal{N}(0,1)\), and let \(V_N\) be independent of \(Z\) with \[ V_N \;=\; \begin{cases} 0 & \text{with probability } 1 - 1/N,\\ N^{3/4} & \text{with probability } 1/N. \end{cases} \] Set \(T_N := Z + V_N / \sqrt{N}\). Because \(V_N = 0\) with probability \(1 - 1/N\), \[ F_{T_N}(z) \;=\; (1 - 1/N)\, \Phi(z) + (1/N)\, \Phi(z - N^{1/4}) \;=\; \Phi(z) + O(1/N) \] uniformly in \(z\): at the level of the CDF, \(T_N\) is Gaussian to order \(O(1/N)\), so the “Edgeworth bias term” is \(0\). Yet \[ \mathbb{E}[T_N] \;=\; \frac{N^{3/4}}{N \sqrt{N}} \;=\; N^{-3/4} \;\neq\; 0. \] A rare but large spike is invisible to the CDF at any fixed \(z\) while still shifting the mean. This is exactly the pathology Hesterberg warned about, and it is the reason the Edgeworth route cannot, by itself, certify that \(\mathbb{E}[\widehat F_N]\) even exists, let alone obey the \(O(1/N)\) expansion.
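The construction is easy to see in simulation. The following sketch of ours compares the empirical CDF of \(T_N\) with \(\Phi\) and its empirical mean with \(N^{-3/4}\), for \(N = 100\), where \(N^{-3/4} \approx 0.032\) while \(1/N = 0.01\).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
N, M = 100, 1_000_000     # N from the construction; M Monte Carlo replications

z = rng.normal(size=M)
v = np.where(rng.random(M) < 1.0 / N, N ** 0.75, 0.0)   # the rare spike
t = z + v / sqrt(N)

# The CDF of T_N stays within about 1/N of the standard normal CDF.
grid = np.linspace(-4.0, 8.0, 241)
ecdf = np.searchsorted(np.sort(t), grid, side="right") / M
phi = np.array([0.5 * (1.0 + erf(g / sqrt(2.0))) for g in grid])
cdf_gap = np.max(np.abs(ecdf - phi))    # roughly 1/N, plus Monte Carlo noise

# ...but the mean is N^{-3/4}, here about 0.032: an order of magnitude
# larger than the 1/N scale suggested by the CDF comparison.
mean = t.mean()
```

With these values the maximal CDF discrepancy is below \(1/N\) while the mean sits near \(N^{-3/4}\), which is exactly the gap between the distributional and the moment statement.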
The missing ingredient. The standard way to bridge distributional and moment convergence is uniform integrability. By Billingsley (1999, Theorem 3.5), if \(X_N \xrightarrow{d} X\) and \(\{X_N\}\) is uniformly integrable then \(\mathbb{E}[X_N] \to \mathbb{E}[X]\); a sufficient condition is a uniform bound \(\sup_N \mathbb{E}[|X_N|^{1+\delta}] < \infty\) for some \(\delta > 0\). An Edgeworth expansion does not supply this on its own, and in the SNIS case the issue is concentrated in the denominator \(\bar\omega\), which can occasionally be very small and whose inverse moments need to be controlled separately. The general point that distributional expansions do not automatically transfer to moments is discussed in Bhattacharya and Rao (1986, chap. 2) and Hall (1992, sec. 2.5).
What we are trying to do about it
In the current revision of (Deligiannidis et al. 2025), what we are working on is precisely the missing moment statement: conditions on \(\pi\), \(q\) and \(f\) under which the expression \[ \lim_{N\to\infty} N \cdot \mathbb{E}[\widehat F_N - \pi(f)] \;=\; -\, q(\omega^2 (f-\pi(f))) \] holds as a genuine statement about expectations, rather than about CDFs. We are not claiming that this is a dramatic new result; the expression itself has been agreed upon for a long time. What we are hoping to contribute is a little bit of new light on a known object: a clean set of sufficient conditions under which the known asymptotic bias of SNIS is also a bona fide first-order expansion of \(\mathbb{E}[\widehat F_N]\). We are still tuning the assumptions during the revision, in particular trying to see how much of the inverse-moment condition on \(\omega\) we really need, and how it trades off against moment assumptions on \(\omega\) and \(\omega f\). The precise statements will appear in the revised paper, and we will update this post once they do.
A parallel, independent result
While we were working on this, Kamélia Daudel and François Roueff were, essentially in parallel and independently, establishing a closely related moment statement in a rather different context: their paper (Daudel and Roueff 2024) studies asymptotics for gradient estimators of the VR-IWAE bound in importance weighted variational inference, and along the way proves, among other things, the kind of first-order moment expansion for self-normalised importance weights that we were trying to get in (Deligiannidis et al. 2025). Their assumptions and ours are written in different languages and motivated by different applications, but we see the two efforts as part of the same small movement towards turning the known asymptotic bias of SNIS into a fully rigorous moment statement. We are grateful to Kamélia Daudel for pointing out her work to us around the same time.
A short conclusion
To summarise what we learned from this small detective exercise: there is a clear and long-standing consensus on the fact that SNIS is biased, and on the form of its leading-order asymptotic bias. The expression \[ \mathbb{E}[\widehat F_N] - \pi(f) \;\approx\; -\frac{1}{N} q(\omega^2 (f-\pi(f))) \] is treated as known in many places. What we could not find in the literature was a self-contained moment-level proof: the Edgeworth-based derivation going back to Hesterberg (1988), and repeated since, is a distributional statement, and the extra ingredient needed to turn it into a statement about \(\mathbb{E}[\widehat F_N]\) is uniform integrability, which neither the Taylor expansion nor the Edgeworth expansion provides. That missing step is what the revision of (Deligiannidis et al. 2025) is aiming to fill, and what Daudel and Roueff (Daudel and Roueff 2024) fill from a different angle. We are happy to shed a little bit of light on something already known, and will come back to this post once the revised paper is out.
Acknowledgements. The historical search behind this post, and in particular the effort to find Hesterberg’s 1988 thesis and read it carefully enough to notice the “may be infinite or undefined” caveat, is due to my supervisor Pierre Jacob, who has also kept pushing me, throughout the revision, to turn our delta-method intuition into an honest moment argument.
Image: Hubert Robert, Vue imaginaire de la Grande Galerie du Louvre en ruines, 1796 (Musée du Louvre, Paris). Robert was curator of the Louvre under Louis XVI and imagined the building as a future ruin; a fitting picture, it seemed to us, for a post about tracing a familiar formula back to its earlier form.