Human metapneumovirus epidemiological and evolutionary patterns in Coastal Kenya, 2007-11

Study population

Between January 2007 and December 2011, there were 16,439 admissions to KCH of children aged between
1 day and 59 months, of whom 32.1 % (5284) were eligible for the study as cases of
syndromic severe or very severe pneumonia (Table 1). Overall, 62.8 % of these children were tested for HMPV. The proportion tested ranged by year
from 43 to 83 % because of changes in the proportion of non-residents of the KHDSS included in the
samples (15.4 % in 2007–09 versus 47.3 % in 2010–11).

Table 1. Study population at Kilifi County Hospital, number tested and number of HMPV positive
samples recorded

HMPV prevalence in child admissions

HMPV was detected in 160 (4.8 %) of the 3320 samples tested. Prevalence by year ranged
from 2.9 % in 2007 to 8.8 % in 2009 (Table 1), and almost half of the HMPV-positive samples were identified in 2008 and 2009.
Children under 6 months of age accounted for 44 % (71/160) of cases and children under 1 year
for 74 % (118/160), whereas only 1.3 % (2/160) of cases were in children aged ≥36 months (Table 2).
Of the 160 HMPV-positive cases, 83.8 % (134/160) presented with severe pneumonia and 16.2 % (26/160)
with very severe pneumonia (Table 2).
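As a quick arithmetic check, the percentages reported above can be recomputed from the raw counts. The `pct` helper is a hypothetical name introduced only for this sketch; it assumes the figures were rounded to one decimal place, or to whole numbers for the age-group shares.

```python
def pct(k: int, n: int, digits: int = 1) -> float:
    """Return k/n as a percentage rounded to `digits` decimal places."""
    if n <= 0:
        raise ValueError("denominator must be positive")
    # Python's round() resolves exact ties to the even digit,
    # which reproduces 16.2 % for 26/160 (exactly 16.25 %).
    return round(100.0 * k / n, digits)


# Counts as reported in the text (numerator, denominator, rounding digits).
reported = {
    "HMPV positive / tested": (160, 3320, 1),
    "positives aged <6 months": (71, 160, 0),
    "positives aged <1 year": (118, 160, 0),
    "positives with severe pneumonia": (134, 160, 1),
    "positives with very severe pneumonia": (26, 160, 1),
}

for label, (k, n, d) in reported.items():
    print(f"{label}: {k}/{n} = {pct(k, n, d)} %")
```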

Table 2. HMPV positives stratified by age group of patients and pneumonia status in Kilifi
County Hospital

Temporal occurrence and circulation patterns of HMPV

HMPV occurrence showed a seasonal pattern, with most cases detected
between October of one year and April of the next (Fig. 1a). The seasonal increase in cases tended to coincide with lower rainfall, higher temperature
and lower relative humidity (Fig. 1b). For subsequent analyses, we took August as the last month of one season and September
as the first month of the next (hence the colour scheme in Fig. 1a). However, there was no clear-cut demarcation between successive seasonal epidemics,
as sporadic HMPV cases were detected in the seasonal troughs. The exception was
the inter-epidemic period between the end of the 2009-10 season and the rise of the 2010-11 season,
when no cases were observed over a 6-month interval (Fig. 1a).
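The September-to-August season boundary described above can be expressed as a small date-to-season mapping. This is a minimal sketch, assuming each sample is identified by its collection date; `season_label` and `cases_per_season` are illustrative names, not part of the study's analysis code.

```python
from collections import Counter
from datetime import date

SEASON_START_MONTH = 9  # September opens a season; August closes the previous one


def season_label(d: date) -> str:
    """Assign a collection date to a September-August season, e.g. '2009-10'."""
    start_year = d.year if d.month >= SEASON_START_MONTH else d.year - 1
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def cases_per_season(dates) -> Counter:
    """Count positive samples per season label."""
    return Counter(season_label(d) for d in dates)
```

For example, a sample collected on 15 January 2008 falls in the 2007-08 season, because January precedes the September start month.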

Fig. 1. a Temporal distribution of HMPV-positive samples in Kilifi over five years, showing the
number of positive samples each month on the primary axis and the number of samples tested
monthly on the secondary axis. Colours indicate the epidemic to which each sample was assigned. b Monthly weather patterns in Kilifi, Kenya, 2007–2011

Genetic diversity of HMPV samples from KCH

PCR amplification was more successful for the F gene than for the G gene, with 130
and 98 positive PCRs, respectively. A total of 123 of the
160 HMPV-positive samples were successfully sequenced for the F gene, the G gene or both,
and genotyped (Table 1). Ct values did not differ significantly (P = 0.613) between sequenced samples and those that failed sequencing
(n = 37).
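The text reports P = 0.613 for the Ct comparison but does not state here which test produced it. As one illustrative option, and not necessarily the test the authors used, a two-sided permutation test on the difference in mean Ct values can be sketched with the Python standard library; the function name, the default number of permutations and the seed are assumptions.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)
```

Called as `permutation_test(sequenced_ct, failed_ct)` with two lists of Ct values (placeholder names, not study data), it returns an estimated two-sided P value. A rank-based test such as Mann-Whitney U would be an equally reasonable choice for skewed Ct distributions.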

Among the 123 samples successfully sequenced for the F gene over a 345-nucleotide
region, 49 sequences were unique. The overall mean nucleotide diversity of this unique
subset was 0.106. For phylogenetic analysis, we combined the unique Kilifi F sequences
with all contemporaneous GenBank sequences that overlapped the sequenced F region.
Both HMPV groups A and B, specifically subgroups A2, B1 and B2, were observed in Kilifi,
whereas subgroup A1 was not (Fig. 2). Within subgroups, virus sequences from the same epidemic did not necessarily
form distinct clusters; instead, they were interspersed with international sequences on the phylogenetic
tree (Fig. 2). Most Kilifi sequences in subgroup A2 fell within three distinguishable
clusters. Compared with global sequences, Kilifi sequences clustered closely with sequences from
Canada and Nairobi, Kenya, and were highly similar within each of the subgroups in which
they fell (Fig. 2). An ML phylogeny of the Kilifi HMPV F sequences alone, colour-coded by epidemic,
is given in Additional file 2: Figure S1A. Notably, the phylogenetic clusters formed within the different subgroups
on this tree contained sequences from multiple epidemic periods, i.e. there was no clear temporal clustering.
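The two summary steps above, reducing the sequence set to unique sequences and computing an overall mean nucleotide diversity, can be sketched as follows. The text does not state the distance measure, so this sketch assumes the uncorrected p-distance averaged over all pairs of aligned sequences, skipping sites where either sequence has a gap or an ambiguous `N`; model-corrected distances would generally be somewhat larger. All function names are hypothetical.

```python
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap ('-') or an 'N'."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += int(x != y)
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def unique_sequences(seqs):
    """Drop exact duplicates (case-insensitive), keeping first-seen order."""
    return list(dict.fromkeys(s.upper() for s in seqs))


def mean_pairwise_distance(seqs) -> float:
    """Overall mean of p-distances over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

In use, `mean_pairwise_distance(unique_sequences(aligned_f_sequences))` would summarise diversity over the unique subset, mirroring the order of steps described in the text.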

Fig. 2. Phylogenetic relatedness and temporal divergence of the combined Kilifi and contemporaneous
global F sequences over the 345-nucleotide portion analysed, for Kilifi, coastal Kenya 2007–11. Taxa of Kilifi
viruses are coloured red. Node bars indicate the 95 % HPD interval of node heights; node marker sizes are scaled
by posterior support

Of the 98 G gene PCR positives, 56 samples were sequenced successfully over
606 nucleotides of the HMPV G coding region, representing 88.2 %
of the entire G coding sequence. All 56 sequences were of genotype
A2 within group A (figure not shown), and 53 were unique over the sequenced
region. This unique subset showed an overall mean genetic diversity of 0.079. An ML
phylogeny of the Kilifi HMPV G sequences alone, colour-coded by epidemic, is given
in Additional file 2: Figure S1B.

Phylogenetic resolution was far greater with G sequences than with F sequences
(Additional file 2: Figure S1A), with higher bootstrap support values and longer branch lengths.
Viruses that were identical in the sequenced F portion had multiple nucleotide
differences in the G portion (Additional file 2: Figure S1B). However, as in the F-based phylogeny, Kilifi
sequences did not cluster strictly by epidemic: phylogenetic clusters frequently contained
sequences from multiple epidemic periods, although these tended to come from successive
epidemics (Additional file 2: Figure S1B).

Comparison of Kilifi G gene sequences with global sequences showed that Kilifi sequences
clustered closely with some sequences from Canada, Peru, China and India. However,
there were clusters of sequences from Peru, Canada, India, Greece, Uruguay and Rwanda
for which no close relatives were found in Kilifi (Fig. 3a). The Kilifi G sequences fell into three major clusters (clusters 1, 2 and 3 in Fig. 3; Additional file 2: Figure S1B) and one minor cluster (cluster 4 in Fig. 3). The clusters comprised sequences from the following epidemics:
cluster 1, 2010-11; cluster 2, 2008-09, 2009-10 and 2011-12; and cluster
3, 2007-08, 2008-09, 2009-10 and 2010-11. Within each cluster, sequences
from the same epidemic grouped together. Cluster 1 was distinct from
the other clusters and from most global sequences (Fig. 3), but was closely related to sequences from
Asia, mainly China and India. The largest Kilifi cluster (cluster
2), comprising 22 sequences, was most closely related to a single sequence from India.
Sequences in cluster 3 were closely related mainly to sequences from Canada and to a
few from India (Fig. 3). A distinct branch of sequences, mainly from Peru with one from China,
contained none of the Kilifi sequences.

Fig. 3. Phylogenetic and temporal placement of Kilifi group A G protein sequences,
for Kilifi, coastal Kenya 2007–2011. Panel a: G sequences of 209 viruses compared (53 from Kilifi and 156 collated
from GenBank from 7 countries). Branches leading to Kilifi viruses are coloured red. Three-letter country codes for branches without Kilifi representative
sequences are indicated next to the vertical line. Panel b: the 121 viruses that fell within the ancestral node leading to Kilifi viruses, reanalysed
in BEAST. Again, branches and leaves of Kilifi viruses are coloured red on the temporally calibrated phylogenetic tree. Node bars indicate the 95 % HPD interval of node heights; node marker sizes are scaled
by posterior support. The numbers 1, 2, 3 and 4 denote the three major clusters and one minor cluster of Kilifi sequences

A temporal analysis of genotype occurrence and circulation in Kilifi showed that most
circulating isolates (74.0 %, 91/123) were A2, and this subgroup circulated and was dominant
in each of the five epidemics (Additional file 3: Figure S2), while B1 (3.3 %, 4/123) and B2 (22.8 %, 28/123) occurred less frequently
(Table 1; Additional file 3: Figure S2). A2 and B2 were recorded in every epidemic, so at least two subgroups
circulated concurrently in each epidemic, whereas B1 was present only in the 2007–08 epidemic
(Additional file 3: Figure S2).
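The subgroup percentages and the "present in every epidemic" observation above can be reproduced from per-sample genotype calls. In this sketch the input format, a list of (epidemic, subgroup) pairs, is an assumption, and the function names are hypothetical; the example records in the test are illustrative, not study data.

```python
from collections import defaultdict


def subgroup_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert subgroup counts into percentages rounded to one decimal place."""
    total = sum(counts.values())
    return {g: round(100 * n / total, 1) for g, n in counts.items()}


def subgroups_by_epidemic(records) -> dict[str, set[str]]:
    """Map each epidemic label to the set of subgroups detected in it."""
    seen = defaultdict(set)
    for epidemic, subgroup in records:
        seen[epidemic].add(subgroup)
    return dict(seen)


def subgroups_in_every_epidemic(records) -> set[str]:
    """Subgroups detected in all epidemics (intersection across epidemics)."""
    by_epidemic = subgroups_by_epidemic(records)
    if not by_epidemic:
        return set()
    return set.intersection(*by_epidemic.values())


# Overall subgroup counts as reported in the text.
print(subgroup_percentages({"A2": 91, "B1": 4, "B2": 28}))
```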

Subgroup prevalence patterns in Kilifi versus global

We compared subgroup prevalence in the 123 Kilifi F sequences with that in 290 global
sequences collated from GenBank to assess genotype distribution by year. The global
dataset was drawn from seven countries: Japan, Peru, Rwanda, Egypt, Thailand, India
and Canada. The patterns in Kilifi appeared considerably different from the overall
global patterns (Fig. 4). Only in 2010 did Kilifi mirror the genotype trends observed globally,
with subgroup A2 dominating (Fig. 4b).

Fig. 4. Pie charts showing the genotype distribution by year derived from F sequence analysis,
for Kilifi, Coastal Kenya 2007-11. Panel a: based on the 290 F sequences collated from GenBank. Panel b: based on the 123 F sequences generated from samples collected in this
study at KCH between 2007 and 2011. Numbers inside the pies indicate the genotype proportions for the respective year

Evolutionary analysis

We estimated the overall evolutionary rate for the analysed F region from the combined
Kilifi-global sequence dataset, including all group A and B strains, as 1.96 × 10⁻³
substitutions/site/year (95 % HPD interval: 1.37 × 10⁻³ to 2.57 × 10⁻³).
From these F data, the divergence dates of groups A and B, subgroups A1 and A2, and
subgroups B1 and B2 were estimated as 1944.16 [95 %
HPD interval 1893.4, 1979.0], 1994.0 [95 % HPD interval 1986.3, 1998.9] and 1988.9
[95 % HPD interval 1972.8, 1997.8], respectively (Fig. 2). A similar analysis estimated the evolutionary rate of the sequenced G region
for the A2 genotype as 5.915 × 10⁻³ substitutions/site/year
(95 % HPD interval: 4.147 × 10⁻³ to 7.887 × 10⁻³).
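The divergence dates above are given as decimal years, the usual output of BEAST-style dated phylogenies, and the rates are per site. The sketch below converts a decimal year to a calendar date and scales a per-site rate to the length of a sequenced region. The conversion convention (fraction of the calendar year, truncated to whole days) is an assumption, since tools can differ by about a day, and both function names are hypothetical.

```python
import calendar
import math
from datetime import date, timedelta


def decimal_year_to_date(y: float) -> date:
    """Convert a decimal year (e.g. 1944.16) to a calendar date,
    treating the fractional part as a share of that year's days."""
    year = math.floor(y)
    days_in_year = 366 if calendar.isleap(year) else 365
    offset = int((y - year) * days_in_year)  # truncate to whole days
    return date(year, 1, 1) + timedelta(days=offset)


def substitutions_per_year(rate_per_site: float, n_sites: int) -> float:
    """Expected substitutions per year along one lineage across n_sites."""
    return rate_per_site * n_sites
```

Under this convention, 1944.16 maps to 28 February 1944, and the F rate of 1.96 × 10⁻³ substitutions/site/year over the 345-nucleotide region corresponds to roughly 0.68 substitutions per year along a single lineage (a back-of-the-envelope figure, not an estimate from the study).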

Analysis of protein changes in the F and G genes

The HMPV G protein is on average 236 amino acids long. The Kilifi genotype A2
G sequences were predicted to encode proteins of three different lengths (213, 217
or 228 amino acids) owing to the use of alternative stop codons. Our subgroup A2 sequences
spanned amino acid 28 to the end of the protein. We observed changes in these sequences leading to gain
or loss of N-glycosylation sites. A total of five N-glycosylation sites, at positions
30, 52, 145, 152 and 180, were identified in the sequenced G protein. Of the 228 codon
positions, 103 were polymorphic, and up to 5 variants were identified based
on sharing a combination of ≥5 signature amino acid residues. Amino acid changes led to gain of
N-glycosylation at six sites and to loss at another eight sites
(Additional file 4: Figure S3). Overall, 36 gains and 56 losses of N-glycosylation were observed.
Loss of the site at position 180 was among the most frequent,
occurring in 39 sequences. Overall, the pattern of amino acid changes
clearly separated the sequence set into five clusters (Additional
file 4: Figure S3).
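N-glycosylation sites are conventionally predicted from the N-X-S/T sequon, where X is any residue except proline. The text does not name the prediction tool, so the sketch below is an illustrative stand-in: it scans protein sequences for sequons, reports gains and losses relative to a reference, and counts polymorphic positions. It assumes ungapped, equal-register protein sequences; the `first_residue` parameter (for example 28, where the sequenced G region starts) and all function names are hypothetical.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons all be found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequon_positions(protein: str, first_residue: int = 1) -> list[int]:
    """Positions of sequon asparagines, numbered from `first_residue`."""
    return [m.start() + first_residue for m in SEQUON.finditer(protein.upper())]


def glycosylation_changes(reference: str, variant: str, first_residue: int = 1) -> dict:
    """Sequon positions gained or lost in `variant` relative to `reference`."""
    ref = set(sequon_positions(reference, first_residue))
    var = set(sequon_positions(variant, first_residue))
    return {"gained": sorted(var - ref), "lost": sorted(ref - var)}


def polymorphic_positions(alignment: list[str], first_residue: int = 1) -> list[int]:
    """Positions where more than one residue occurs (ignoring '-', 'X', '*')."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    positions = []
    for i in range(length):
        residues = {s[i].upper() for s in alignment} - {"-", "X", "*"}
        if len(residues) > 1:
            positions.append(i + first_residue)
    return positions
```

With `first_residue=28`, a sequon at the second residue of a sequenced fragment would be reported as position 29, keeping the numbering consistent with the full-length G protein.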

The HMPV F protein is on average 539 amino acids long. Our F sequencing encompassed
105 codon positions, representing 19.5 % of the entire F protein sequence. Of the positions
sequenced, 15 % (17/115) showed amino acid changes, consistent with the high degree
of conservation of the F protein. No N-glycosylation site was observed in the sequenced region
of the F protein (Additional file 5: Figure S4).