Quantum workers in Bernoulli factories

quantum computing
simulation
Author

Rémi Bardenet

Published

February 14, 2024

TL;DR: A quantum computer lets you provably build more general Bernoulli factories than your laptop.

I have grown an interest for quantum computing, both for fun and because it naturally applies to sampling my favourite distribution, determinantal point processes. One of the natural (and still quite open) big questions in quantum computing is, for a given computational task such as solving a linear system, whether having access to a quantum computer gives you any advantage over using your laptop in the smartest way possible. Maybe the quantum computer lets you solve part of your problem faster, or maybe it allows you to solve a more general class of problems. Dale, Jennings, and Rudolph (2015) prove a quantum advantage of the latter kind, for a task that appeals to a computational statistician: a quantum computer gives you access to strictly more Bernoulli factories than your laptop does. In this post, I discuss one of their examples.

Figure 1: An excerpt from an excellent comic strip by Scott Aaronson and Zach Weinersmith.

Bernoulli factories

First, I need to define what a Bernoulli factory is. Loosely speaking, a Bernoulli factory is an algorithm that, when fed with i.i.d draws from a Bernoulli random variable \(B(p)\) with unknown parameter \(p\), outputs a stream of independent Bernoullis with parameter \(f(p)\). The algorithm does not have access to the value of \(p\), and needs to work for as large a range of values of \(p\) as possible. For instance, a trick attributed to von Neumann gives you a Bernoulli factory for the constant function \(f\equiv 1/2\), can you guess how? If you have never seen this trick, take a break and think about it. Here is a hint: try to pair Bernoulli draws and define two events of equal probability.

The problem of determining what Bernoulli factories can be constructed on a classical (as opposed to quantum) computer has been answered by Keane and O’Brien (1994). Essentially, it is necessary and sufficient that \((i)\) \(f\) be continuous on its domain \(\mathcal{P}\), and that \((ii)\) either \(f\) is constant or there exists an integer \(n\) such that, for all \(p\in\mathcal{P}\), \[ \min[ f(p), 1-f(p)] \geq \min [ p^n, (1-p)^n ]. \] In particular, a non-constant \(f\) should not take the values \(0\) or \(1\) in \((0,1)\), and cannot approach these extreme values too fast. In particular, the doubling function \(f_\mathrm{double}:p\mapsto 2p\) defined on \([0,1/2]\) does not correspond to a Bernoulli factory, while its restriction to \([0,1/2-\epsilon]\) does, for any \(\epsilon>0\). Another simple example is \[ f_\mathrm{quadratic}:p\mapsto 4p(1-p) \tag{1}\] defined on \([0,1]\), which does not correspond to a Bernoulli factory. Yet, the rest of the post shows that \(f_\mathrm{quadratic}\) does correspond to a specific weakening of the notion of Bernoulli factory, one that is natural in quantum computing.

Quantum computers and quantum coins

Now buckle up, because I need to define a mathematical model for a quantum computer. This model only requires basic algebra, albeit with strange notation. Let \(N\) be a positive integer, and \[ \mathbb{H} = (\mathbb{C}^2)^{\otimes N} = \mathbb{C}^2\otimes \dots \otimes \mathbb{C}^2, \] where the tensor product is taken \(N\) times. An \(N\)-qubit quantum computer is a machine that, when fed with

  1. a positive semi-definite, Hermitian operator \(\rho\) acting on \(\mathbb{H}\), with trace norm \(1\) (the state),
  2. a Hermitian operator \(A\) on \(\mathbb{H}\) (the observable),

outputs a draw from the random variable \(X_{A,\rho}\), with support included in the spectrum of \(A\), defined by \[ \mathbb{E} g(X_{A,\rho}) = \mathrm{Tr}(\rho g(A)), \quad g:\mathbb{H}\rightarrow \mathbb{R}_+. \tag{2}\] Here \(g(A)\) is the operator that has the same eigenvectors as \(A\), but where each eigenvalue \(\lambda\) is replaced by \(g(\lambda)\). The correspondence in Equation 2 between a state-observable pair and a probability distribution on the spectrum of the observable is a cornerstone of quantum physics called Born’s rule, and it is the only bit of quantum theory we shall need. In other words, we see a quantum computer as a procedure to draw from probability distributions parametrized by state-observable pairs. We give two fundamental examples of such state-observable pairs, which can be respectively interpreted as describing one quantum coin and two quantum coins.

The quantum coin. Consider a one-qubit computer, i.e. \(N=1\). Then \(\mathbb{H} = \mathbb{C}^2\) has dimension \(2\), and we fix an orthonormal basis, which we denote by \((\ket{0}, \ket{1})\). The strange notation \(\ket{\cdot}\) is inherited from physics, and is very practical in computations, as you will see. In short, denote by \(\braket{\cdot\vert\cdot}\) (a bracket, or bra-ket) the inner product in \(\mathbb{H}\). Now, a vector in \(\mathbb{H}\) is written \(\ket{v}\) (a ket). Similarly, define the linear form \(\bra{v}\) (a bra) by \[ \bra{v}: \ket{u} \mapsto \braket{v\vert u}. \] By construction, we can write things like \[ \bra{v} \ket{u} = \braket{v\vert u}, \] so that the bra-ket notation for linear forms and vectors is consistent with the inner product.

Now, remember we have fixed a basis \((\ket{0}, \ket{1})\) of \(\mathbb{H}\). For \(p\in[0,1]\), we define \[ \ket{p} = \sqrt{1-p} \ket{0} + \sqrt{p}\ket{1}. \] This definition is consistent with earlier notation, as \(\ket{p} = \ket{0}\) when \(p=0\), for instance. Now, we define a quantum coin as the state \(\rho_{\mathrm{qc}} = \ket{p}\bra{p}\). It is the projection onto \(\mathbb{C}\ket{p}\), and in particular it is a positive semi-definite, Hermitian operator of trace \(1\), and hence a valid state. As observable, we take the projection onto the second vector of the basis, which we denote in the bra-ket notation by \(\ket{1}\bra{1}\). What random variable \(X_{\ket{1}\bra{1}, \rho_{\mathrm{qc}}}\) does this state-observable pair define in Equation 2?

Well, the spectrum of the observable is \(\{0,1\}\), so we have defined a Bernoulli random variable. Moreover, the probability that it is equal to \(1\) is given by taking \(g:\lambda\mapsto \mathbf{1}_{\lambda=1}\) in Equation 2, yielding \[ \mathbb{P}(X_{\ket{1}\bra{1}, \rho_{\mathrm{qc}}} = 1) = \mathrm{Tr}\left[ \ket{1}\bra{1} \ket{p}\bra{p} \right] = \vert \braket{1\vert p}\vert^2 = p. \] by cyclicity of the trace. All of this to define a \(B(p)\) variable! Things get more interesting when you try to create two dependent Bernoulli variables.

Two quantum coins. Consider now a computer with two qubits, so that the Hilbert space is \(\mathbb{H}=\mathbb{C}^2\otimes\mathbb{C}^2\). From our orthonormal basis \((\ket{0}, \ket{1})\) of \(\mathbb{C}^2\), we can build an orthonormal basis \((\ket{i}\otimes\ket{j}, i,j\in\{0,1\})\) of \(\mathbb{H}\). To keep expressions short, it is customary to write \(\ket{i}\otimes\ket{j}\) as \(\ket{ij}\). To define a pair of quantum coins, we now consider the tensor product of two quantum coins, \[ \ket{p}\otimes \ket{p} = (1-p) \ket{00} + \sqrt{p(1-p)}\ket{01} + \sqrt{p(1-p)}\ket{10}+ p \ket{11}. \] We think of the corresponding state \(\rho_{2\mathrm{qc}} = (\ket{p}\otimes \ket{p})(\bra{p}\otimes \bra{p})\) as two quantum coins. Now consider for your observable an operator \(B\) with four distinct eigenvalues, say \(\lambda_{ij} \in\mathbb{C}\) for \(i, j\in\{0,1\}\), each corresponding to eigenvector \(\ket{ij}\). In other words, the spectral decomposition of \(B\) is \[ B = \sum_{i, j\in\{0,1\}} \lambda_{ij} \ket{ij}\bra{ij}. \] The random variable \(X_{B, \rho_{2\mathrm{qc}}}\), associated through Equation 2 to two quantum coins and our newly defined observable \(B\), has support in \[ \{\lambda_{00}, \lambda_{01}, \lambda_{10}, \lambda_{11}\}. \] Moreover, taking \(g:\lambda\mapsto \mathbf{1}_{\lambda=\lambda_{ij}}\) in Equation 2, we obtain \[ \mathbb{P}(X_{B, \rho_{2\mathrm{qc}}} = \lambda_{ij}) = \mathrm{Tr}\left[(\ket{p}\otimes \ket{p})(\bra{p}\otimes \bra{p}) \ket{ij}\bra{ij}\right] = p^{i}(1-p)^{1-i} \times p^{j}(1-p)^{1-j}, \] again by cyclicity of the trace and then carefully distributing our multiplication, noting that most terms are zero by orthogonality. Otherly put, the indices of \(X_{B, \rho_{2\mathrm{qc}}}\) are a pair of independent Bernoullis with equal parameter \(p\). Again, this might feel like a lot of algebraic pain for no gain, but wait for it.

What if we had taken the same state, but with another observable? Say the observable with four distinct eigenvalues \(\lambda_{\phi^+}, \lambda_{\phi-}, \lambda_{\psi+}, \lambda_{\psi-}\in \mathbb{C}\), and corresponding eigenvectors \[ \ket{\phi^{\pm}} = \frac{\ket{00}\pm\ket{11}}{\sqrt{2}}, \quad \ket{\psi^{\pm}} = \frac{\ket{01}\pm\ket{10}}{\sqrt{2}}. \] Then, the random variable \(X_{C, \rho_{2\mathrm{qc}}}\) defined by Born’s rule in Equation 2 is supported in \[ \{\lambda_{\phi^+}, \lambda_{\phi-}, \lambda_{\psi+}, \lambda_{\psi-}\}, \] with \[ \mathbb{P}(X_{C, \rho_{2\mathrm{qc}}} = \lambda_{\phi^+}) = \mathrm{Tr\left[ \rho_{2\mathrm{qc}} \ket{\phi^+}\bra{\phi^+} \right]} = \vert (\bra{p}\otimes\bra{p})\ket{\phi^+} \vert^2 = \frac12. \] Similarly, \[ \mathbb{P}(X_{C, \rho_{2\mathrm{qc}}} = \lambda_{\phi^-}) = \vert (\bra{p}\otimes\bra{p})\ket{\phi^-} \vert^2 = \frac{(2p-1)^2}{2}, \] \[ \mathbb{P}(X_{C, \rho_{2\mathrm{qc}}} = \lambda_{\psi^+}) = \vert (\bra{p}\otimes\bra{p})\ket{\psi^+} \vert^2 = 2p(1-p), \] and \[ \mathbb{P}(X_{C, \rho_{2\mathrm{qc}}} = \lambda_{\psi^-}) = \vert (\bra{p}\otimes\bra{p})\ket{\psi^-} \vert^2 = 0. \] You can check that the four probabilities sum to \(1\). This time, if you map, e.g., \(\phi_+\) to the string \(00\), \(\phi^-\) to \(11\), \(\psi^+\) to \(01\), and \(\psi^-\) to \(10\), we no longer have independent Bernoulli draws, but a rather strange correlation structure. We shall see that \(X_{C,\rho_{2\mathrm{qc}}}\) allows us to build a Bernoulli factory that is beyond the reach of a classical computer.

A quantum Bernoulli factory

Imagine the following procedure. Draw the random variable \(X_{C,\rho_{2\mathrm{qc}}}\). If you obtain \(\lambda_{\phi^-}\) or \(\lambda_{\psi^+}\), then stop, and respectively output \(0\) and \(1\). Otherwise, draw another independent realization of \(X_{C,\rho_{2\mathrm{qc}}}\), etc. This is reminiscent of the von Neumann trick we mentioned earlier. What have we achieved? Well, the output is a Bernoulli draw with parameter \[ \frac{2p(1-p)}{2p(1-p)+\frac{(2p-1)^2}{2}} = 4p(1-p). \] Repeating the procedure as many times as you want draws, we thus have a Bernoulli factory for \(f_{\mathrm{quadratic}}\) in Equation 1, which we know to be beyond the reach of classical Bernoulli factories!

The difference is that our Bernoulli factory is a quantum Bernoulli factory. In particular, our basic resource is (physically) independent copies of \(\ket{p}\). This is asking for strictly more than (statistically) independent Bernoulli draws. Indeed, depending on your observable, two physically independent copies of \(\ket{p}\) can give you two i.i.d. Bernoullis \(X_{B,\rho_{2\mathrm{qc}}}\), or something more complicated like \(X_{C,\rho_{2\mathrm{qc}}}\). If you consider as equivalent the cost of preparing the two types of inputs, i.i.d. Bernoullis \(B(p)\) on one side and physically independent copies of \(\ket{p}\) on the other side, then you have a quantum advantage. It might be a big assumption, but I find it easier to swallow than similar caveats in other quantum advantages that I’ve read about.

Further remarks

The example in this post is from the paper by Dale, Jennings, and Rudolph (2015). The authors further characterize the Bernoulli factories that you can build with only single-qubit operations: they strictly include classical Bernoulli factories and the example from this post. In other words, it is not necessary to use pairs of qubits to build \(X_{C,\rho_{2\mathrm{qc}}}\). Since then, there has been more work on quantum Bernoulli factories, for instance considering quantum-to-quantum Bernoulli factories, where the goal is to create independent copies of \(\ket{f(p)}\) rather than a stream of Bernoulli random variables.

I thank my group for valuable comments during the writing of this post. One non-consensual point is that I have tried to reduce the quantum formalism to the correspondence in Equation 2 between a state-observable pair and a random variable. This has the advantage of keeping the necessary algebra to a minimum, but it forced me to introduce rather abstract observables, with a spectrum that we only use through its indices. A more standard (but arguably lengthier) treatment might have involved projection-valued measures.