In quantitative finance, risk assessment is computer intensive and expensive, and there is a market for cheaper and faster methods, as seen from the large literature on parallelism and GPU implementation of numerical methods for option pricing [[1], [6], [7], [11], [12], [14], [15], [16], [21]].
American contracts are not easy to compute on a parallel computer; even if a large number of them have to be computed at once, an embarrassingly parallel problem, still the cost of the transfer of data makes parallelism at the level of one contract attractive. But the task is not easy, especially when the number of underlying assets is large [[3], [5], [22]], ruling out the PDE approach [[2]]. Furthermore the most popular sequential algorithm is the LeastSquare Monte Carlo (LSMC) method of Longstaff and Schwartz [[19]]. Exploiting parallelism by allocating blocks of Monte Carlo paths to different processors is not convincingly efficient [[7]], because the backward regression is essentially sequential and needs all Monte Carlo paths in the same processor.
In this note, we investigate the parareal method, introduced in [[18]], for the task. An earlier study by Bal and Maday [[4]] has paved the way but it is restricted to Stochastic Differential Equations (SDE) without LSMC. Yet it contains a convergence proof for the twolevel method in the restricted case where the solution is computed exactly at the lowest level [[4]].
This note provides a description of the parareal method for American contracts, a numerical section to assess its performance. The scalar case is investigated. LeastSquare Monte Carlo (LSMC) and parareal time decomposition with two or more levels are used, leading to an efficient parallel implementation. It contains also a convergence argument for the twolevel parareal Monte Carlo method when the time step used for the Euler scheme at each level is appropriate.
Convergence of LSMC for American contracts has been proved by Clément, Lamberton and Protter [[9]]; it is not unreasonable to expect an extension of their estimates for the parareal method, but this note does not contain such a result, only a numerical assessment.
With the usual notations [[17]], consider a probability space , and functions , uniformly Lipschitz continuous in .
Let be a standard Brownian motion on . Let , be a diffusion process, strong solution to the SDE
(1)dXt=b(t,Xt)dt+σ(t,Xt)dWt, X(0)=X0∈R. A (vanilla) European contract on X is defined by its maturity T and its payoff , typically in the case of a put of strike price κ and interest rate r . An American style contract allows the owner to claim the payoff at any time . So a rational strategy to maximize the average profit at time t is to find the valued stopping time solution to the Snell envelope problem:V(t,Xt):=E[e−r(τt−t)f(τt,Xτt)Ft]=Pesssupτ∈TtFE[e−r(τ−t)f(τ,Xτ)Ft] where is the (augmented) filtration of W and denotes the set of valued stopping times. Such an optimal stopping time exists (see [[8], [20]]). We do not specify b , σ or f to stay in a general Optimal Stopping framework. In practice, American style options are replaced by the socalled Bermuda options where the exercise instants are restricted to a time grid , , where ( ). Owing to the Markov property of , the corresponding Snell envelope reads and satisfies a Backward Dynamic Programming recursion on k :(2)V(T,XT)=f(T,XT), V(tk,Xtk)=max{f(tk,Xtk),e−rhE[V(tk+1,Xtk+1)Xtk]}, k=K−1,…,0.
The optimal stopping times (from time ) are given by a similar backward recursion:
(3)τK=T,τk=tk if f(tk,Xtk)>e−rhE[V(tk+1,Xtk+1)Xtk], τk=τk+1 otherwise, k=K−1,…,0. When cannot be simulated at a reasonable computational cost, it can be approximated by the Euler scheme with step h , denoted , which is a simulable Markov chain recursively defined by(4)X¯tk+1h=X¯tkh+b(tk,X¯tkh)h+σ(tk,X¯tkh)ΔWk, X¯0h=X0, k=0,…,K−1, where so that are i.i.d. distributed random variables. From now on we switch to the Euler scheme, its Snell envelope, etc.
In LSMC, for each k , the conditional expectation as a function of , is approximated by its projection on the linear space spanned by the monomials from the values generated by M Monte Carlo paths using ((4)); then each path has its own optimal stopping time at each given by (for the stopping problem starting at k )
τK(m)=T, τk(m)=tk if f(tk,X¯tkh,(m))>e−rh∑p=0Pa¯kp (X¯tkh,(m))p, τk(m)=τk+1(m)otherwise where{a¯ko,…,a¯kP}=argmin{(ak0,…,akP)∈RP+1}∑m=1M(V(tk+1,X¯tk+1h,(m))−∑p=0Pakp(X¯tkh,(m))p)2. Finally the price of the American contract is computed byV(0,X0)=max{f(0,X0),1M∑m=1Me−rτ1(m)f(τ1(m),X¯τ1(m)h,(m))}. Note that is the best approximation of in the leastsquare sense (e.g., Longstaff and Schwartz's paper [[19]]) in the vector subspace of .


A twolevel parareal algorithm 
Consider an ODE
x˙=f(x,t),x(0)=x0,t∈[t0,tK]=∪0K−1[tk,tk+1]. Assume that is a highprecision integrator that computes x at from at . Assume is a similar integrator but of low precision. The parareal algorithm is an iterative process with above a forward loop in time, (5)xk+1n+1=GΔ(xkn+1,tk)+Gδ(xkn,tk)−GΔ(xkn,tk). So the coarsegrid solution is corrected by the difference between the finegrid prediction computed from the old value on that interval and the coarsegrid old solution. In this analysis, and are Euler explicit schemes with time steps respectively.
The same method can be applied to an SDE in the context of the Monte Carlo method provided the random variables defining in ((4)) are sampled once and for all in the initial phase of the algorithm and reused for all n (see the initialization step in 3.2 for the notations).
The iterative process ((5)) is applied on each sample path with a single step of ((4)) with and the result of J steps of ((4)) with . An error analysis is available in [[4]] for the stochastic case in the limit case , i.e. when the fine integrator is the exact solution. For further results of the parareal method applied to deterministic ODEs and PDEs, see [[18]] and [[13]]. In this note, we also extend the result of [[4]] to the case .
We denote by a realization of ; consider a refinement of each interval by a uniform subpartition of time step , for some integer . Then
[tk,tk+1]=∪j=0J−1[tk,j,tk,j+1] with tk,j+1=tk,j+δt,j=0,…,J−1, so that tk=tk,0=tk−1,J. Denote by the projection of f on the monomials .
Let be the iteration index of the parareal algorithm.
•  Initialization Generate for the M paths of the Monte Carlo method with the coarse and fine mesh. •  Compute recursively forward all Monte Carlo paths from by using ( (4)) with and then recursively backward from , .  
•  for •  for (forward loop): (i)  compute the finegrid solution of ( (4)) with refined step , started at from .  (ii)  compute the coarsegrid solution at : ;  (iii)  set .   •  end kloop
 •  initialization: Compute  •  for (backward loop): (i)  on each , from , compute by a backward loop in j V˜k,jδ,n=max{f(tk,j,X˜tk,jδ,n),e−rδtPE[V˜k,j+1δ,nX˜tk,jδ,n]},j=J−1,…,0;  (ii)  compute ;  (iii)  set .   •  end backward kloop
 
•  end nloop

Note that all finegrid computations are local and can be allocated to a separate processor for each k , for parallelization.
The following partial results can be established for 3.2bis obtained from 3.2 by changing the first step into and the last step into: .
Assume continuous, in x with spatial derivatives uniformly Lipschitz in . Then there exist C, independent of and n, such that for , :
(6)‖Xˆtkn−X¯tkδ‖L2(P)≤(CΔt)n(kn)‖X¯tkΔ−X¯tkδ‖L2(P)≤(CΔt)n(kn)Δt. Furthermore, for all (e.g., definition (iii)).
For a fixed δt and n parareal iterations, the final and uniform errors satisfy
(7)‖XˆTn−X¯Tδ‖L2(P)≤(CΔt)n2Δtn! and ‖maxk=0,…,KXˆtkn−X¯tkδ‖L2(P)≤(CΔt)n2(n+1)! respectively where C only depends on the Lipschitz constants and norms of and on T.
This estimate shows that when , the method converges exponentially in n and geometrically in Δt .
The estimate ((6)) indicates that a recursive use of parareal with each subinterval redivided into smaller intervals, the socalled multilevels parareal, is better than many iterations at the second level only. Indeed, as the error decreases proportionally to at each level and as Δt becomes at the next grid level, the error after L levels is decreased by .
(a ) Let
V˜tkΔ,n=Pesssupτ∈TtkFE[e−r(τ−tk)f(τ,Xˆτn)Ftk], V¯tkΔ,δ=Pesssupτ∈TtkFE[e−r(τ−tk)f(τ,X¯τδ)Ftk] where denotes the set of valued stopping times. Then, for some constant C, ‖maxk=0,…,KV˜tkΔ,n−V¯tkΔ,δ‖L2(P)≤[f]Lip(CΔt)n2(n+1)!. (Note that is but the coarse Snell envelope of the refined Euler scheme.) At a fixed time , we have the better estimate (8)‖V˜tkΔ,n−V¯tkΔ,δ‖2≤[f]Lip(CΔt)n+12(K+1n+1)−(kn+1). (b ) Let denote the “fine” Snell envelope of the refined Euler scheme at times defined by V˜tkδ=Pesssupτ∈TtkFE[e−r(τ−tk)f(τ,X¯τδ)Ftk]. Then, for some constant , ‖V˜tk,0δ,n−V¯tkδ‖L2(P)≤CΔt.
A result similar to (a ) can be obtained for , i.e. when is replaced by in the expectation defining at the cost of losing a in the error estimate. Both quantities and do not coincide as is not Markovian (it also depends on ).
The payoff is with , , , . We have chosen as in Longstaff and Schwartz's work [[19]]. The interpolation used in the LSMC is on the basis , i.e. . The American payoff is then 4.478 at an early exercise .
 
 Convergence of the parareal algorithm 
Here the underlying asset is given by the Black–Scholes SDE, , . In the test, . We have chosen a fine grid with . The free parameters are Δt , which governs the number of points on the coarse grid and n the number of parareal algorithms. The error between the American payoff computed on the fine grid by LSMC and the same computed by the parareal algorithm is displayed on Table 1 for both 3.2bis.
The same information about convergence is now displayed in the two graphs in Fig. 1 for the errors versus Δt and the errors versus n . We were not able to decrease Δt to smaller values because the computing time becomes too large.


 Fig. 1. Black–Scholes case: errors on the payoff versus Δt on the left for several values of n and versus n on the right for several values of Δt . Both graphs are for 3.2 in log–log scales and indicate a general behavior of the error ϵ not incompatible with (Theorem 3.1). Zoom 

The constant elasticity case 
The volatility is now a function of price [[10]]: . All parameters have the same values as above. The results are shown in Fig. 2.


 Fig. 2. Constant elasticity case: same legend as in Fig. 1. Zoom 
 
 Multilevel parareal algorithm 
The previous construction being recursive, one can again apply the twolevel parareal algorithm to LSMC on each interval . The problem of finding the optimal strategy for parallelism and computing time is complex, because there are so many parameters; in what follows, the number of levels is ; furthermore, when an interval with points is divided into subintervals, each one is endowed with a partition using points as well. So, if the coarse grid has K intervals, the 4th grid has intervals. The results are compared with the reference value of Longstaff–Schwartz, 4.478, shown in Table 2 and in Fig. 3.


 Fig. 3. Comparison between a standard LSMC solution and the parareal solution for the same number of time intervals at the finest level. The 4 points have respectively 1,2,3,4 levels; the first data point has one level and 4 intervals, the second has 2 levels and 16 intervals, the third one 3 levels and 64 intervals, the fourth one 4 levels and 256 intervals. The total number of time steps is on the horizontal axis, in log scale, and the error at n =2 is on the vertical axis, in log scale as well. Zoom 
The number of parareal iterations is 4 and the error is displayed at each n . We have also carried out some tests with subpartitions using . Thus each level has its own number of points per interval, . Errors shown on Table 3 are also shown in Fig. 3 for . It seems to be for K small and for K bigger. The method was implemented in parallel; each interval is allocated to a processor, at each level in a roundrobin order. Almost perfect parallelism is obtained in our tests on a machine with 32 processors, as shown in Fig. 4 and Table 3.


 Fig. 4. Speedup versus the number of processors, i.e. the parareal CPU time on a parallel machine divided by the parareal CPU time on the same machine but running on one processor. There are two levels only; the parameters are K =1,2..,32, n =2 and J =100 so as to keep each processor fully busy. Zoom 