Better than Monte Carlo (this post is not about QMC)

Monte Carlo
QMC
rates
Author

Nicolas Chopin

Published

August 19, 2023

(This is a repost of this December 2022 post from the old website, but since math support is so poor on WordPress, I'd rather have this post published here.)

Say I want to approximate the integral \[I(f) := \int_{[0, 1]^s} f(u)\, du\] based on \(n\) evaluations of a function \(f\). I could use plain old Monte Carlo: \[\hat{I}(f) = \frac 1 n \sum_{i=1}^n f(U_i),\quad U_i \sim \mathrm{U}([0, 1]^s),\] whose RMSE (root mean square error) is \(O(n^{-1/2})\).
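
To make the baseline concrete, here is a minimal numpy sketch of this plain Monte Carlo estimator; the function name `mc_estimate` is mine, and `f` is assumed to accept an `(n, s)` array of points and return `n` values.

```python
import numpy as np

def mc_estimate(f, n, s, rng=None):
    """Plain Monte Carlo estimate of I(f), the integral of f over [0, 1]^s."""
    rng = np.random.default_rng() if rng is None else rng
    U = rng.uniform(size=(n, s))   # n IID uniform points in [0, 1]^s
    return np.mean(f(U))           # RMSE is O(n^{-1/2})

# example: f(u) = exp(u_1 + u_2), s = 2; exact answer is (e - 1)^2
print(mc_estimate(lambda U: np.exp(U.sum(axis=1)), n=10_000, s=2))
```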

Can I do better? That is, can I design an alternative estimator/algorithm which performs \(n\) evaluations and returns a random output, such that its RMSE converges faster?

Surprisingly, the answer to this question has been known for a long time. If I am willing to focus on functions \(f\in\mathcal{C}^r([0, 1]^s)\), Bakhvalov (1959) showed that the best rate I can hope for is \(O(n^{-1/2-r/s})\). That is, there exist algorithms that achieve this rate, and algorithms achieving a better rate simply do not exist.

Ok, but how can I actually design such an algorithm? Bakhvalov's proof contains a very simple recipe. Say I am able to construct a good approximation \(f_n\) of \(f\), based on \(n\) evaluations, such that the approximation error is \(\|f-f_n\|_\infty = O(n^{-\alpha})\), \(\alpha>0\). Then I could compute the following estimator, based on a second batch of \(n\) evaluations: \[ \hat{I}(f) := I(f_n) + \frac 1 n \sum_{i=1}^n (f-f_n)(U_i),\quad U_i \sim \mathrm{Uniform}([0, 1]^s), \] and it is easy to check that this new estimator is unbiased, and that its variance is \(O(n^{-1-2\alpha})\), since each term \((f-f_n)(U_i)\) has variance bounded by \(\|f-f_n\|_\infty^2 = O(n^{-2\alpha})\); therefore its RMSE is \(O(n^{-1/2-\alpha})\). (It is based on \(2n\) evaluations in total.)
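
Here is a hedged sketch of that recipe (not the paper's actual implementation): it assumes \(f_n\) has already been built from a first batch of evaluations and that \(I(f_n)\) can be computed exactly; the function name `two_batch_estimate` and its arguments are mine.

```python
import numpy as np

def two_batch_estimate(f, f_n, I_fn, n, s, rng=None):
    """Unbiased estimate I(f_n) + (1/n) * sum_i (f - f_n)(U_i)."""
    rng = np.random.default_rng() if rng is None else rng
    U = rng.uniform(size=(n, s))          # second batch of n evaluations of f
    return I_fn + np.mean(f(U) - f_n(U))  # variance is O(n^{-1-2*alpha})
```

The point of the construction is that the Monte Carlo part only averages the small residual \(f - f_n\), which is why the variance shrinks with \(\alpha\).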

So there is a strong relation between Bakhvalov's result and function approximation. In fact, the best rate you can achieve for the latter is \(\alpha=r/s\), which explains the rate above for stochastic quadrature. You can see now why I gave this title to this post: QMC is about using points that are better than random points, but here I am using IID points, and the improved rate comes from the fact that I use a better approximation of \(f\).

Here is a simple example of a good function approximation. Take \(s=1\), and \[ f_n(u) = \sum_{i=1}^n f\left( \frac{2i-1}{2n} \right) \mathbf{1}_{[(i-1)/n, i/n]}(u); \] that is, split \([0, 1]\) into \(n\) intervals \([(i-1)/n, i/n]\), and approximate \(f\) inside a given interval by its value at the centre of that interval. You can quickly check that the approximation error is then \(O(n^{-1})\) provided \(f\) is \(C^1\). So you get a simple recipe achieving the optimal rate \(O(n^{-3/2})\) for \(s=1\) and \(r=1\).
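
Putting the two ingredients together for \(s=1\), \(r=1\): below is a hedged end-to-end sketch (function names are mine), where \(I(f_n)\) is simply the midpoint rule, and the second batch of evaluations corrects it by Monte Carlo on the residual.

```python
import numpy as np

def midpoint_quadrature(f, n, rng=None):
    """Midpoint (piecewise-constant) approximation + Monte Carlo correction, s = 1."""
    rng = np.random.default_rng() if rng is None else rng
    mid = (2 * np.arange(1, n + 1) - 1) / (2 * n)  # midpoints (2i-1)/(2n)
    vals = f(mid)                                  # first batch: n evaluations
    I_fn = np.mean(vals)                           # exact integral of f_n (midpoint rule)
    U = rng.uniform(size=n)                        # second batch: n evaluations
    idx = np.minimum((U * n).astype(int), n - 1)   # interval containing each U_i
    return I_fn + np.mean(f(U) - vals[idx])        # RMSE is O(n^{-3/2}) for f in C^1

# example: f(u) = exp(u); exact answer is e - 1
print(midpoint_quadrature(np.exp, n=1000))
```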

Is it possible to generalise this type of construction to any \(r\) and any \(s\)? The answer is in our recent paper with Mathieu Gerber, which you can find here. You may also want to read Novak (2016), which is a very good entry point on stochastic quadrature and, in particular, gives a nice overview of Bakhvalov's and related results.