Transcriptomic signatures differentiate survival from fatal outcomes in humans infected with Ebola virus

Understanding the pathogenesis of rare outbreak diseases such as Ebola virus disease (EVD) is both difficult and important. To date, the overwhelming majority of cases have occurred in settings where high-quality healthcare and monitoring are challenging to deliver. This has made it difficult to understand in detail many of the basic aspects of disease development and pathogenesis; much of what is known about the disease has instead been gathered from animal models [16, 17].

Here we show that transcriptomic sequencing of blood samples taken from humans during the 2013-2016 West African outbreak, left over from diagnostic sequencing, can provide multiple levels of information about the host response to virus infection: from how immune cell populations in the blood change over time during infection to the definition of potential host biomarkers of infection. This strongly argues that integrating transcriptomic analysis of host responses into outbreak responses can provide important insights into disease pathogenesis that affect clinical management. The strong ISG response observed in the large cohort of EVD patients is notable. IFN-like responses have been reported to be associated with moderate disease in a diverse set of US EBOV patients [23] and have been suggested to be protective in EBOV infection [24]. However, the strong innate immune response seen in acute cases in Guinean patients, despite their differing clinical outcomes and treatment, suggests that this response may not confer sufficient protection to promote survival.

A particularly interesting finding from this study was the effectiveness of using a recovered control group as the comparison population for investigating differential gene expression. A control group from Guinea that was never infected with EBOV would have been ideal; however, at the time (and still today), the stigma associated with EBOV and Ebola treatment centres (ETCs) made such a group difficult to identify and ethically problematic to sample. Using recovered control group transcriptomes risked capturing responses that began during acute Ebola infection and had not yet resolved when the control samples were taken. However, comparison of the peripheral blood transcriptomes from this group to historical datasets from a completely unrelated healthy control group, separated both geographically and temporally from the West African outbreak [18], indicated no significant differences in the transcriptome (Fig. 4). This suggested that, at least by the time the peripheral blood samples were taken, the control group had recovered from infection. The immune-cell populations predicted by our comparison analysis (Fig. 5) and the biomarkers identified through comparison were both verified in independent datasets (e.g. Fig. 6 and Additional file 13). This Ebola outbreak is unlikely to be the only outbreak in which the ‘ideal’ control population is difficult to sample, and our results suggest that convalescent patients may serve as an important alternative control. By necessity, datasets from samples with poor-quality reads were excluded from the analysis presented in this work. This may have biased the identification and measurement of differentially expressed genes, as well as the downstream analysis of gene classifiers that correlated with outcome. However, the underlying biology of EVD identified in this study through the transcriptomic approach was consistent with both data from non-human primate studies and clinical information from patients.

The increased accumulation of mRNAs for genes involved in the clotting cascade during acute infection with EBOV is consistent with earlier findings that fibrin deposition is closely associated with EBOV infection [25, 26]. The increased abundance of multiple fibrinogen (FG) gene isoforms, as well as of albumin mRNA, is initially perplexing, as these are considered liver-specific mRNAs and are not normally found in blood. We favour the hypothesis that the accumulation of these mRNAs indicates significant liver damage leading to leakage of hepatic mRNAs into the blood [27]. It is important to note that during this outbreak overt haemorrhaging was rarely observed [28], but strong increases in the abundance of these genes were seen in both the acute-survivor and acute-fatal patient groups, suggesting that significant liver damage was present. As a measure of aberrant liver function, aspartate transaminase (AST) values in patients with EVD were found to be higher than alanine transaminase (ALT) values [29]. This finding is consistent with observations of liver damage in repatriated patients from Liberia treated in the United States [30].

An important finding of this study is the similarity of the transcriptomic data to protein-level cytokine expression data from this [9] and earlier outbreaks [8]. These data emphasise the agreement between cytokine abundance information collected through this approach and that seen in previous EBOV outbreaks, as well as in NHP models of human EVD (Additional file 2). The robust IFN response seen in human infection was somewhat surprising based on earlier reports. However, it is consistent with data from NHP models of lethal infection with this virus [31], which also show very strong increases in IFN-responsive genes in circulating immune cells, and suggests that more robust IFN signalling may decrease an individual’s ability to survive EBOV infection. Certainly, our study of patients treated in a low-income setting and results from patients treated in a high-income setting [6] together challenge the assumption that humans do not mount a robust immune response.

The differences in immune response between acute-survivors and acute-fatal patients identified in the acute phase of EVD were potentially caused by differential activation of gene transcription and also by infiltration/exfiltration of different cell types into and out of the blood. Our prediction that monocyte cell populations were higher in acute-survivors than in acute-fatal patients was validated in an independent group of patients using a cell-based approach (Fig. 5d), completely different from RNA-seq (Fig. 5a and Additional file 6). We also predicted that NK cell populations were higher in acute-survivors than in acute-fatal patients during the acute phase. NK cells have previously been suggested to be important innate immune cells in fighting EBOV infection [32], so the increased abundance of these cells may have provided a crucial survival advantage to the acute-survivor group.
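The principle behind inferring cell populations from bulk RNA-seq is that the measured expression of cell-type marker genes is a mixture of per-cell-type reference profiles weighted by cell fractions. The following is a minimal two-cell-type sketch of that idea; the marker genes (CD14, NKG7) and all expression values are invented for illustration, and real deconvolution tools use many genes and cell types with constrained regression rather than an exact 2x2 solve.

```python
# Bulk expression of two hypothetical marker genes, modelled as a mixture of
# two cell types (monocyte, NK). Signature values are invented for illustration.
signature = {             # per-cell-type reference expression
    "CD14": {"monocyte": 10.0, "nk": 1.0},
    "NKG7": {"monocyte": 0.5, "nk": 8.0},
}

def deconvolve(bulk):
    """Solve the 2x2 system signature @ fractions = bulk by Cramer's rule,
    then normalise the fractions to sum to 1."""
    a, b = signature["CD14"]["monocyte"], signature["CD14"]["nk"]
    c, d = signature["NKG7"]["monocyte"], signature["NKG7"]["nk"]
    det = a * d - b * c
    f_mono = (bulk["CD14"] * d - b * bulk["NKG7"]) / det
    f_nk = (a * bulk["NKG7"] - bulk["CD14"] * c) / det
    total = f_mono + f_nk
    return {"monocyte": f_mono / total, "nk": f_nk / total}

# A synthetic bulk sample constructed as 70% monocyte, 30% NK:
bulk = {"CD14": 0.7 * 10.0 + 0.3 * 1.0, "NKG7": 0.7 * 0.5 + 0.3 * 8.0}
fractions = deconvolve(bulk)
print(fractions)  # recovers the 70/30 mixture
```

With noisy data and more genes than cell types, this exact solve is replaced by least-squares fitting with non-negativity constraints, which is what makes the approach robust enough to compare fractions between patient groups.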

Independent machine learning approaches identified different panels of genes whose abundance could accurately predict outcome over a range of Ct values. Using host gene profiles to predict outcome also worked for those Ct values (between 20 and 22) where the outcome was not clear in the data from the European Mobile Laboratory, i.e. where the case fatality rate was approximately 50%. Other studies have also shown that Ct values can be used to predict outcome. For example, a study of EVD patients in Sierra Leone showed that patients with a Ct value ≥ 24 had an 87% chance of survival, whereas patients with a Ct value < 24 had a 22% chance of survival [2]. Interestingly, the average patient Ct value in samples processed by the European Mobile Laboratory was 21.4, implying that the average patient's outcome could not have been predicted from Ct value alone. We show that the identified gene classifiers were valid over a wide range of viral loads, indicating that these host-response markers are informative independently of the level of viremia. The classifiers were tested on an independent group of 20 patients. This provided strong preliminary evidence that the model was not highly over-fitted, though additional samples would increase confidence.
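Outcome prediction from a small gene panel can be sketched as a minimal nearest-centroid classifier, in the spirit of the profile-based approach: average the panel expression per outcome class, then assign a new patient to the nearest class profile. The gene panel (TGFBI, VCAM1, HOPX) is taken from the genes discussed later in this section, but every expression value below is invented, and the study's actual SVM and RF classifiers are more sophisticated than this sketch.

```python
import math

# Hypothetical log-expression profiles for a small gene panel; values invented.
train = {
    "survivor": [
        {"TGFBI": 2.1, "VCAM1": 6.5, "HOPX": 4.0},
        {"TGFBI": 1.8, "VCAM1": 6.9, "HOPX": 4.4},
    ],
    "fatal": [
        {"TGFBI": 5.2, "VCAM1": 3.1, "HOPX": 1.9},
        {"TGFBI": 4.8, "VCAM1": 2.7, "HOPX": 2.2},
    ],
}
GENES = ["TGFBI", "VCAM1", "HOPX"]

def centroid(samples):
    """Mean expression of each panel gene across samples of one outcome class."""
    return {g: sum(s[g] for s in samples) / len(samples) for g in GENES}

def predict(sample, centroids):
    """Assign the outcome class whose centroid is nearest in Euclidean distance."""
    def dist(c):
        return math.sqrt(sum((sample[g] - c[g]) ** 2 for g in GENES))
    return min(centroids, key=lambda label: dist(centroids[label]))

centroids = {label: centroid(samples) for label, samples in train.items()}
new_patient = {"TGFBI": 2.0, "VCAM1": 6.0, "HOPX": 3.8}
print(predict(new_patient, centroids))  # nearest to the survivor centroid
```

Crucially, a classifier like this uses only host gene abundances, not viral load, which is why it can remain informative in the Ct 20-22 range where Ct alone is uninformative; validation on held-out patients, as done in the study, is what guards against over-fitting.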

Assessing viral load together with an evaluation of the host response at the time of diagnostic sampling may give an accurate indication of a patient's chance of survival across a broad range of viral loads. Management of patients in the developed world, with extensive palliative intervention, resulted in far better survival rates than the management of patients in West Africa; the survival and fatal outcomes in this study may therefore be more reflective of the situation in untreated patients. Triage of large numbers of patients in very resource-poor situations may improve survival rates by focusing efforts on those most in need. Predictive models based on clinical data have also been proposed to determine the outcome for patients with EVD [33].

The ability to triage patients by disease severity and likely outcome can be of practical benefit for patient care. In any outbreak setting, it is inefficient for the resources for the most intensive care to be scattered throughout the Ebola Treatment Unit (ETU). Tests that allow risk stratification would enable an ETU to centralise intensive care resources for maximum efficiency and efficacy.

Our findings also have implications for the design of clinical trials. Recent studies used the Ct value as a proxy for the probability of survival (e.g. the Favipiravir trial), but as we demonstrate here, Ct-based prediction is imperfect. Because our approach predicts outcome more accurately, the potential to demonstrate an effect of a therapeutic in a clinical trial is also improved. During the last outbreak, there was much public disagreement over the ethics of randomising patients with EVD to control groups in clinical trials [34–36]. In our opinion (and that of others), a reasonable compromise would be to exclude from randomisation those patients with the lowest probability of survival and provide them with the study drug. Our results would allow a more accurate identification of those EVD patients with a low probability of survival than simply relying on Ct values at admission.

The mRNAs associated with the correct prediction of outcome include that of the transcription factor eomesodermin (eomes), a key marker of the CD8 T cell transition to memory [37]; this is consistent with our prediction of increased CD8+ memory cells in survivors (Fig. 5). Common to the three gene sets identified by the SVM, RF and PGP profile-based classifiers were TGFBI, VCAM1 and HOPX. TGFBI encodes an extracellular matrix protein that inhibits cell adhesion and was downregulated in both fatal patients and survivors. VCAM1 is a gene important for lymphocyte extravasation to sites of infection and has previously been shown to be upregulated in response to EBOV infection [38]. The decreased abundance of TGFBI and the increased abundance of VCAM1 together suggest an increase in cell adhesion, with increased leukocyte adhesion to the endothelial layer and movement out of the blood into tissues.