We couple a stochastic collocation method with an analytical expansion of the canonical epidemiological master equation to analyze the effects of both extrinsic and intrinsic noise. It is shown that depending on the distribution of the extrinsic noise, the master equation yields quantitatively different results compared to using the expectation of the distribution for the stochastic parameter. This difference is incident to the nonlinear terms in the master equation, and we show that the deviation away from the expectation of the extrinsic noise scales nonlinearly with the variance of the distribution. The method presented here converges linearly with respect to the number of particles in the system and exponentially with respect to the order of the polynomials used in the stochastic collocation calculation. This makes the method presented here more accurate than standard Monte Carlo methods, which suffer from slow, nonmonotonic convergence. In epidemiological terms, the results show that extrinsic fluctuations should be taken into account since they effect the speed of disease breakouts and that the gamma distribution should be used to model the basic reproductive number.
Stochastic processes are used to model complex physical phenomena that range from astronomy  to epidemiology . An important example is stochastic chemical kinetics, which describes the time evolution of chemically reacting systems by taking into account the fact that molecules are discrete entities that exhibit randomness in their dynamical behavior. A master equation [3,4] can be used to model this probabilistic process. The number of variables in this equation is large for all but the simplest systems, so analytical or direct numerical integration methods are usually impractical. Alternatively, Monte Carlo samples of the stochastic process can be numerically generated via stochastic simulation algorithms (SSAs) .
Two different lines of work have provided methods for overcoming the difficulties of the explicit solution of the master equation without resorting to Monte Carlo simulations, namely,(1) truncating a power-series expansion of the master equation, methods of which include the Kramers-Motyal expansion [6,7], expansion [3,8,9], Wentzel-Kramers-Brillouin (WKB) approximation , moment closure methods , and distribution approximations , and (2) truncating the number of states in the master equation [13–15]. Here we have utilized the expansion so that we can obtain an analytical expression for the master equation and therefore analyze the influence of extrinsic noise in model.
The effect of extrinsic noise in biological models has recently attracted interest in cellular [16,17] and population  scales since the parameters of these models represent the environment of the underlying system. For example, in cellular systems, the rate parameters may represent the binding rate of two proteins in the cytoplasm. At the population level, the intrinsic and extrinsic noise was recently studied in influenza . Uncertainty quantification is the process of deterministically computing the effect of input uncertainty in equations of interest. The stochastic collocation method [20,21] consists of an expansion that is related to polynomial chaos . These methods have the desirable property of exponential convergence rates and therefore require comparatively few evaluations of the model equations.
There has not yet been a systematic study of the effect of both extrinsic and intrinsic fluctuations in the susceptible-infectious-recovered (SIR) master equation. In a previous publication  we showed that high-order approximations of the intrinsic noise in the master equation yield quantitatively different solutions. Here we examine the role of extrinsic noise on the master equation. We couple a high-order expansion of the master equation with a stochastic collocation method and analyze various sources of uncertainty modeled with uniform, beta, and gamma distributions. We show that depending on the distribution of the extrinsic noise, the master equation yields quantitatively different results compared to using the expectation of the distribution for the parameter. The use of a gamma distribution is based on an empirical study on the basic reproductive number in severe acute respiratory syndrome (SARS) . In epidemiological terms, the results show that extrinsic fluctuations should be taken into account since they effect the speed of disease breakouts and that the gamma distribution should be used to model the basic reproductive number.
The analysis presented here is notably not based on Monte Carlo simulations and therefore does not suffer from a slow, nonmonotonic convergence rate. Rather, the method here converges linearly with respect to the number of molecules and exponentially with respect to the order of the polynomials used in the stochastic collocation calculation. We note that there are restrictions on the applicability of the method as well as assumptions in the underlying model. Restrictions on the applicability of the method are (1) the stochastic process must be a so-called random variable, meaning that it must have a finite second moment  and (2) the error term incident to the expansion of the master equation is inversely proportional to the system size and therefore may be inaccurate for small systems. For example, a process with extrinsic noise from a Cauchy distribution would violate the assumption. We also make assumptions on the validity of the underlying model. Since the underlying equations rely on a compartmental model for the species, we assume that the system is well mixed such that mass action kinetics are valid. Moreover, we assume that the system is memoryless and subject to exponential waiting times between reaction events.
In Sec. II we discuss how to compute the average population of a stochastic process over time. In Sec. III we show the results using various distributions for the extrinsic noise. Additionally, we compare the results of the method presented here to a classical Monte Carlo simulation. In Sec. IV we conclude by highlighting the importance of various sources of noise in epidemiological models and the effect they can have on simulations.
The objective of the method presented here is to formulate an efficient way of calculating the expected population of a stochastic process over time. Here we assume that the stochastic process has intrinsic noise owing to the discrete nature of the system and has extrinsic noise that is governed by a known distribution. We also assume that the distribution of the extrinsic noise has a finite variance. In this section we will formulate a way to compute the average of the intrinsic noise by approximating a master equation. In the next section we will describe how to compute the expectation of the extrinsic noise without resorting to Monte Carlo sampling. We will consider an elementary nonlinear system, the canonical susceptible-infectious-recovered (SIR) model , which is the foundation of more detailed models that include age-dependent and spatially dependent processes, namely,
We follow van Kampen [3,8,9] and define the following ansatz: , where is an unknown random variable, and we subsequently define . The discrete shift operators are expanded by means of a power series [3,8,9]:
The objective of the stochastic collocation method in this study is to evaluate the integral
Here we analyze the role of a stochastic reproductive number and its effect on the canonical SIR model. The distribution that the reproductive number follows will be the uniform, beta, or gamma distribution. The uniform and beta distributions were chosen since these distributions are often used when limited or no information about the parameters is available. We also use a gamma distribution that is based on experimental results of the basic reproductive number in SARS . We compare the results of a model with both extrinsic and intrinsic noise to both a model with only intrinsic noise and a model without any noise included. We show that the effect of the extrinsic noise can be just as large as the intrinsic noise and quantify this difference by computing a mean-squared deviation.
We first analyze the effect of a uniform distribution for the random parameter , i.e., , where
We next analyzed the beta distribution for the stochastic parameter:
Experimental studies of diseases has shown that the reproductive number can be modeled effectively with a negative binomial distribution . Here we have used the continuous analog of the negative binomial distribution, namely, the gamma distribution:
In this section we compare the results obtained in Sec. III C with a Monte Carlo simulation. We use the stochastic simulation algorithm  to simulate a stochastic trajectory over time. Let denote the total number of Monte Carlo samples. A random variate for the extrinsic noise is drawn for each Monte Carlo sample: for . The propensities (unscaled probabilities for each reaction) are defined as follows: and for Eqs. (1) and (2). At each time step in the stochastic simulation algorithm, two random variables govern the evolution of the system, namely, a reaction index and a time step. The reaction index is sampled from a pointwise distribution for the propensities, i.e., , and a time step is sampled from an exponential distribution with a mean of . The system is updated by executing the reaction with index and incrementing the system time by .
Figure 4 shows the fraction of infectious individuals for 100 trajectories of a stochastic simulation along with the mean over time using a total of samples. We used a gamma distribution for defined in Eq. (20) with and . We note that the fluctuations are due to the noise arising both from sampling a distribution for the parameter and from the stochastic simulation algorithm itself. The mean of the processes over time is well captured by the expansion coupled with a stochastic collocation method. Both the approximate method derived here and the Monte Carlo simulation show accelerated disease progression compared with the continuum equations.
We performed an analytical expansion of the epidemiological master equation, which was then coupled to a stochastic collocation method to analyze the extrinsic noise of a random reproductive number. To analyze the role of extrinsic noise in a susceptible-infectious-recovered model, we used the uniform, beta, and gamma distributions. While the difference was small in the case of a uniform or beta distribution, the gamma distribution caused the infection to peak earlier in time and therefore caused the infectious individuals to go zero earlier than the continuum model and stochastic models without noise. This is of importance to the public health community since a faster disease progression observed in empirical data may lead to an erroneous estimation of the reproductive number in a continuum model without extrinsic noise. We showed that the deviation away from the expectation of the extrinsic noise scaled nonlinearly with the variance of the input distribution. In epidemiological terms, the results imply that both intrinsic and extrinsic fluctuations should be taken into account since they may affect the speed of disease breakouts.
The numerical methods presented here converge linearly with respect to the number of molecules in the system and exponentially with respect to the order of the polynomials used in the stochastic collocation calculation. This is in opposition to the standard Monte Carlo methods, which suffer from a slow, nonmonotonic convergence rate. Future work may involve the analysis of multiple sources of extrinsic noise. Efficient numerical methods, such as the Smolyak sparse-grid construction [27,28], could be used for a stochastic collocation method.
The author thanks Bill and Melinda Gates for their active support of this work and their sponsorship through the Global Good Fund.