Stochastic modelling of infectious diseases for heterogeneous populations

Infectious diseases remain a major cause of morbidity and mortality worldwide, causing immeasurable losses in many societies. Many people still have a fresh memory of the H1N1 outbreak in 2009, which brought pictures of empty streets and people wearing face masks and caused at least 12,799 deaths according to the World Health Organization (WHO) report [1]. The H1N1 pandemic calls for research on accurately modelling the spread dynamics of infectious diseases, which offers policy makers a practical means of evaluating the potential effects of intervention strategies [2–4].

Here, the effective transmission rate and the recovery rate $k$ are both non-negative. Because SIR-based models are well documented in the literature, we omit a detailed introduction of these models. Interested readers can find the details in [5–7].
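As a minimal sketch of the SIR dynamics referred to above, the following assumes the standard textbook form dS/dt = -beta*S*I, dI/dt = beta*S*I - k*I, dR/dt = k*I, with S, I and R expressed as population fractions. The name `beta` for the effective transmission rate, the function name `simulate_sir`, and the forward-Euler integration scheme are our assumptions for illustration; only the recovery rate `k` follows the notation in the text.

```python
def simulate_sir(beta, k, s0, i0, r0=0.0, dt=0.01, steps=10000):
    """Forward-Euler integration of an assumed standard SIR model.

    beta : effective transmission rate (hypothetical label, beta >= 0)
    k    : recovery rate (k >= 0)
    s0, i0, r0 : initial fractions of susceptible, infectious, recovered
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # flow S -> I during dt
        new_recoveries = k * i * dt          # flow I -> R during dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    traj = simulate_sir(beta=0.5, k=0.2, s0=0.99, i0=0.01)
    peak_infectious = max(i for _, i, _ in traj)
    print("peak infectious fraction:", round(peak_infectious, 3))
    print("final recovered fraction:", round(traj[-1][2], 3))
```

The step size `dt` is kept small so that the Euler scheme stays stable. A production analysis would more likely use an adaptive ODE solver.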

SIR-based models and their variants have proven to be quite useful in the study of the spread dynamics of infectious diseases [8–10]. In [11–13], the progression of disease spread is characterized by tracking the number of susceptible members $S_t$ with a chain binomial model. The number of susceptible members at time $t+\Delta t$, $S_{t+\Delta t}$ (where $\Delta t$ represents the infectious period of the disease and is always chosen to be $1/k$), is a binomial random variable that depends on $S_t$ and $I_t$: $S_{t+\Delta t} \sim \mathrm{Bin}(S_t,\, 1 - I_t)$, which provides a recursive relationship between $S_{t+\Delta t}$ and $S_t$ and produces a formal stochastic process. However, the power of these models is mainly limited to uniform and homogeneous populations or populations with infinite size and homogeneous interactions. In many cases, the actual spread of infectious diseases occurs in a diverse or dispersed population. To study the spread of infectious diseases in heterogeneous populations, a population is usually divided into sub-populations that differ from each other. Sub-populations can be determined on the basis of social, cultural, economic, demographic, and geographic factors. Then, besides the dynamics of the internal spread within a sub-population, the transmission dynamics between sub-populations should also be considered in the study of epidemic spreading.
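The chain binomial recursion described above can be sketched as a short simulation. The survival probability used here, `q ** i`, is the classic Reed-Frost choice (each susceptible independently escapes each of the `i` infectious members with probability `q`); it is an assumption for illustration and not necessarily the expression used in the cited works. The names `reed_frost` and `binomial` are hypothetical helpers, and one loop iteration stands for one infectious period.

```python
import random


def binomial(rng, n, p):
    """Draw from Bin(n, p) by summing n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def reed_frost(s0, i0, q, seed=0, max_steps=1000):
    """Simulate a Reed-Frost chain binomial epidemic (illustrative assumption).

    s0, i0 : initial numbers of susceptible and infectious members
    q      : probability that a susceptible escapes one infectious member
    Returns a list of (S_t, I_t) pairs, one per generation.
    """
    rng = random.Random(seed)
    s, i = s0, i0
    history = [(s, i)]
    for _ in range(max_steps):
        if i == 0:                      # no infectious members: epidemic over
            break
        escape = q ** i                 # chance of remaining susceptible
        s_next = binomial(rng, s, escape)
        i = s - s_next                  # newly infected form the next generation
        s = s_next
        history.append((s, i))
    return history


if __name__ == "__main__":
    for generation, (s, i) in enumerate(reed_frost(s0=100, i0=1, q=0.97, seed=1)):
        print(generation, s, i)
```

Because each generation's susceptible count is drawn conditionally on the previous one, repeated runs with different seeds give a distribution of outbreak sizes rather than a single deterministic curve.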

Network-based epidemic modelling represents a popular approach for heterogeneous populations in which the nodes in the network correspond to sub-populations, and the links indicate the neighboring relationships. Many network-based models have been proposed, including patch models [14–16], distance-transmission models [17], and multi-group models [18, 19]. However, these models require knowledge of every individual (or host) and all relationships between individuals, which may not be achievable because of privacy-related restrictions and the high cost of subject recruitment. To overcome the difficulties of collecting data, researchers have investigated several types of computer-generated networks in the context of disease spread in population-scale studies [20–24]. Grassberger first studied the dynamics of infectious diseases that propagate on regular networks using percolation theory [25]. Recent studies have revealed that many real-world networks, including social networks in which infectious diseases propagate, are either small-world [26] or scale-free [27] rather than regular or random, as thought previously [28]. Because the underlying structure of a network influences the dynamics of epidemics on it, researchers such as Pastor-Satorras and Vespignani have made many contributions to the critical value analysis of typical epidemics on different types of complex networks [23, 24, 29]. On the basis of mean-field theory, they found that, compared with homogeneous networks, scale-free networks are more vulnerable to the invasion of infectious diseases, computer viruses, or any other type of negative epidemic.

Epidemics have also been studied in various disciplines. Sociologists are concerned with the diffusion of rumors or innovations on social networks [30]; economists have studied viral marketing and recommendation strategies by considering both cascading dynamics and the network effects of vital nodes [31]; and computer scientists are interested in how some topics can quickly cascade in virtual blog spaces and how their propagation evolves [32, 33].

Although network-based studies have contributed to the modelling of disease and/or information dynamics, some models make the strong assumption that the structures of the underlying networks over which epidemics spread are known beforehand. In the real world, however, the structures of underlying diffusion networks are not known directly. Many others assume the availability of information about the interactions occurring between individuals [34–37], an assumption that is often not valid in the context of disease spread. What may be obtained is only the time at which particular sub-populations become infected, but not how they become infected, nor how they affect their neighboring areas. Moreover, the underlying structures of networks greatly influence the dynamics of infectious disease spread.

Since the emergence of the H1N1 influenza pandemic in April 2009, its underlying dynamics have been of great public health interest, and many approaches for its study have been proposed [14, 38–41]. Most of them are based on the classic SIR model. For example, Birrell et al. [40] provided an age structure-based compartmental model with a Bayesian synthesis of multiple evidence sources to reveal substantial changes in contact patterns throughout the epidemic. Besides compartmental models, other mathematical models have also been used to describe the transmission dynamics [3, 42–47]. The chain binomial model was used by Lessler et al. [44] and Klick et al. [45] to calculate household secondary attack rates as a measure of the transmissibility of the 2009 H1N1 influenza pandemic. Yang et al. [46] constructed a model based on chains of infections and used the infection hazard function and survival function to study the 2009 H1N1 influenza pandemic. Ferguson et al. [3] and Cauchemez et al. [42, 43] incorporated other factors, such as household risk, within-school risk, and community risk, in the study of infection spread and found that age groups under 19 years old were more susceptible than older age groups. Jin et al. [47] formulated a network-based epidemic model of influenza A, calculated the basic reproduction number, and studied the effects of various immunization schemes. However, this work required that the individual contact pattern be provided. Moreover, none of the aforementioned approaches takes spatial heterogeneity into consideration in the study of disease spread.

Recently, an outbreak of Ebola virus disease (EVD) swept across parts of West Africa from March 2014 to April 2015. By June 10, 2015, WHO had reported 27,237 confirmed, probable, or suspected cases in three countries, with 11,158 deaths [48]. The dynamics of this epidemic's spread received extensive research attention [49–57] (for further references, see the review article [58]). To name a few examples, Chowell et al. found that district-level Ebola virus disease outbreaks in West Africa followed polynomial growth in time instead of the exponential growth that describes the progress of many infectious disease epidemics [52]. Fisman et al. used a simple, two-parameter mathematical model to characterize epidemic growth patterns in the 2014 Ebola outbreak [53]. Webb et al. proposed a variant of the classic SIR model with three extra groups (incubating, contaminated, and isolated), which can provide a more accurate prediction of future incidence [56]. Carroll et al. used a deep sequencing approach to gain insight into the evolution of the Ebola virus (EBOV) in Guinea during the ongoing West African outbreak. The viral sequence data can be combined with epidemiological information to retrospectively test the effectiveness of control measures, and they provide an unprecedented window into the evolution of an ongoing outbreak of viral haemorrhagic fever [57].

To accurately predict when and where outbreaks will occur, one feasible approach is to deploy manual or electronic surveillance systems through regional or national public health and medical organizations [59]. Most of the surveillance data accumulated from such systems contain temporal, spatial, clinical, and demographic information. For instance, Telehealth Ontario is a teletriage helpline, available free of charge to all Ontario residents, that allows those with suspected infections to connect with experts who can assess their symptoms. The records of such calls provide valuable information on who was possibly infected, where, by which type of disease, and at what time. In this paper, we address the problem of modelling disease spread dynamics in heterogeneous populations from temporal-spatial surveillance data. We analyse the role of heterogeneity in a stochastic epidemic model on a two-dimensional lattice. Within a particular sub-population, the speed of spread is controlled by a single parameter, the transmissibility of the pathogen between individuals. Between sub-populations, the transmissibility becomes a random variable drawn from a probability distribution. Our work differs from existing studies in some fundamental ways, in light of the unique nature of infectious disease diffusion dynamics. Our results have practical implications for the analysis of disease control strategies in realistic heterogeneous epidemic systems.
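To make the idea of random between-sub-population transmissibility on a lattice concrete, the following is a rough sketch under strong simplifying assumptions and is not the model developed in this paper. Each lattice site stands for one sub-population and is treated as simply infected or not (within-site dynamics are collapsed), each link between neighbouring sites receives a per-step transmission probability drawn once from a user-supplied distribution, and the output is the time at which each site first becomes infected, which mirrors the kind of temporal-spatial data discussed above. The function name `simulate_lattice` and its parameters are hypothetical.

```python
import random


def simulate_lattice(size, steps, draw_transmissibility, seed=0):
    """Spread an infection over a size x size lattice of sub-populations.

    draw_transmissibility : callable taking a random.Random instance and
        returning a per-step transmission probability for one link
        (the choice of distribution is an assumption of this sketch).
    Returns a dict mapping (x, y) -> first infection time.
    """
    rng = random.Random(seed)
    centre = (size // 2, size // 2)
    infected_at = {centre: 0}           # seed the outbreak at the centre site
    link_prob = {}                      # transmissibility of each undirected link

    for t in range(1, steps + 1):
        newly_infected = []
        for (x, y) in list(infected_at):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = (x + dx, y + dy)
                if not (0 <= nb[0] < size and 0 <= nb[1] < size):
                    continue            # neighbour lies outside the lattice
                if nb in infected_at:
                    continue            # neighbour is already infected
                link = frozenset(((x, y), nb))
                if link not in link_prob:
                    link_prob[link] = draw_transmissibility(rng)
                if rng.random() < link_prob[link]:
                    newly_infected.append(nb)
        for nb in newly_infected:
            infected_at.setdefault(nb, t)   # record first infection time only
    return infected_at


if __name__ == "__main__":
    times = simulate_lattice(11, 20, lambda rng: rng.uniform(0.1, 0.9))
    print("sites infected after 20 steps:", len(times))
```

With a degenerate distribution that always returns 1.0, the infection time of every site equals its Manhattan distance from the seed. Wider distributions produce irregular fronts, which is the qualitative effect of heterogeneity that such a model is meant to expose.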