Phylodynamic analysis of the canine distemper virus hemagglutinin gene


Clinical and histological findings

A total of 12 isolates were collected from the rhinorrhea of puppies vaccinated during
the 2005–2007 CD outbreaks in Taiwan. Mild-to-severe upper respiratory disease and
neurological symptoms were the most common clinical manifestations and were noted
in six of the 12 dogs. Common gross lesions included mild-to-severe conjunctivitis,
rhinitis, and pneumonia. Three dogs which were dead on arrival showed demyelinative
encephalitis in the white matter of the cerebellum, and further blood tests revealed
lymphoid depletion.

Detection of recombination of the H gene

The 1824-nt CDV H gene sequences (encoding 607 amino acids) for the 188 examined strains were aligned
to analyze sequence variation. Three recombination events with permutation rates??70 %
were identified by both RDP and Simplot programs (Fig. 1). Event 1 represents cross-species recombination. Three strains showed the same recombination
pattern: AF178039_97_CN_LesserPanda, AF178038_97_CN_GiantPanda, and KM386683_13_CN_DL1_SiberianTiger.
The major parent was AB687720_08_JP_CYN07dV_MacacaFascicularis (Asia-1), and the minor
parents included all strains of RL genotype. However, the breakpoint in strain AF178039
differed from those of AF178038 and KM386683. Furthermore, the strain AF178039 was
also identified as a minor parent of the latter two strains. Event 2 represents intra-genotypic
recombination in the America-1 genotype. The AM903376_06_ID_India_Dog strain was identified
as a recombination of the major parent strain AY548109_98_US_2655_Raccoon with the
minor parent strains DQ903854_51_US_Lederle_Dog, AF378705_30_ZA_Onderstepoort_fox,
and Z35493_40_US_Convac_Dog. The KJ123771_04_US_171391513_Dog strain was also identified
as a potential recombinant of the major parent strain EU716337_04_US_164071_Dog and
unknown minor parent strains. All of these recombination events showed highly discordant
topologies between two neighboring segments flanking the breakpoint. After removing
all of the potential recombinant strains and MLV sequences, the RDP program detected
no recombination in this filtered 176-sequences dataset. Notably, the MLV strains
in the America-1 and RL genotypes were apparently associated with recombination events
1 and 2, respectively.

Fig. 1. Recombination map of the canine distemper virus (CDV) H gene in strains of the 188-sequence data set. For each recombination event, the left
side of the figure shows (a) the BOOTSCAN plot, and the second panel shows (b) the similarity plot. The y-axis shows (a) the percentage of permutation trees and (b) the pairwise identity of each pair in the sequence. The x-axis is the alignment
position. The analysis was performed with a sliding window of 200 nt with a 20-nt
step. The comparison was performed using 50 % consensus sequences with 1000 bootstrap
replicates. Each curve in the figure compares the query and reference sequences. Both
plots were generated with the Simplot program. Potential recombination breakpoints
are identified where sudden alterations in bootstrap values or in similarity values
occurred or where crossover occurred. Potential recombination breakpoints are identified
where sudden alterations in bootstrap or similarity values occurred or where a crossover
occurred. The third panel (c) shows a schematic diagram of potential recombinant regions and breakpoint locations
identified by Recombination Detection Program v3.44 (RDP). Phylogenetic trees were
compared by neighbor sequence fragment alignments flanking the breakpoint. (d). The potential recombinant strains, major parent strains, minor parent strains,
and non-parent strains are indicated in red, blue, green, and purple, respectively

Phylodynamics analysis

Phylogenetic relationships (among the 188- and 176-sequences dataset) were inferred
by the NJ and BMCMC methods based on the full-length H gene sequences (Figs. 2 and 3). Phylogenetic trees constructed by the NJ and BMCMC methods showed similar genotypic
groupings in both datasets. The CDVs grouped into a monophyletic cluster. All 14 distinct
genotypes, which included America-1 and ?2, Asia-1 to ?4, Africa-1 and ?2, Arctic,
EU-1/SA-1, EW, RL, SA-2 and SA-3), and were depicted in this analysis (Figs. 2 and 3). In both datasets, almost all genotypes were supported by bootstrap (BS) and PP
values of 70 and 0.9, respectively. The two exceptions were for EW in both datasets
of the NJ trees and for America-1 in the 188- sequences dataset of the BMCMC tree.
The analysis revealed that each genotype has continued to evolve, and isolates have
been continuing identified and reported as recently as the past decade, including
Italy strains in the Arctic genotype identified in 2012–2013 and Denmark strains in
the EU-1/SA-1 genotype identified in 2011–2013. The Asia-3 genotype was identified
as an extension of Asia-2 with high support values. Notably, the structure of the
clusters tended to be more influenced by geography rather than by host range. All
Taiwan strains, including the 12 isolates analyzed in this study, clustered into two
subclusters in the Asia-1 genotype with high support values. This indicated that two
clades of CDV strains had co-circulated in Taiwan: one major clade (grouped with Japan
strains) and one minor clade (clustered with China strains).

Fig. 2. Maximum clade credibility tree based on 188 canine distemper virus H gene sequences. For each branch, the colour and thickness indicate the most probable
location and most probable state, respectively. For key nodes, the size of the node
cycle represents the posterior probability (PP), and support values are labeled above
the nodes as either bootstrap (BS) or PP according to BS-NJ/PP-BEAST. Numbers in parentheses
are the PP values of each location. The scale bar at the bottom shows the time in
years. Strains are designated by accession number-year isolated (last two digits)-country
abbreviation-strain name-host and are colored according to the host animal species.
For each genotype, the geographic distribution and duration of isolation are also
shown on the right. The modified live virus vaccine strains are shaded grey, potential
recombinant strains are shaded yellow, and strains isolated in this study are shaded
green. The tree was constructed using BEAST program

Fig. 3. Maximum clad credibility tree based on 176 canine distemper virus H gene sequences. For each branch, the colour and thickness indicate the most probable
location and probability of the state, respectively. For key nodes, the size of the
node cycle represents the posterior probability (PP), and support values are labeled
the nodes as either bootstrap (BS) or PP in BS-NJ/PP-BEAST. Numbers in parentheses
are the PP values of each location. The scale bar on the bottom shows the time in
years. The strain names are colored according to their host animal species. Genotypes
and nucleotide/amino acid sequence similarities (%) appear on the right. Strains isolated
in this study are shaded green

Interestingly, the MCC tree topology showed a mixture of long and short terminal branches
in each cluster (Figs. 2 and 3), whereas the NJ tree showed no difference in the lengths of terminal branches (Additional
file 2). Two distinct topologies were observed: MLV genotypes (America-1 and RL) showed
geographically unstructured long terminal branches, whereas others showed structured
short terminal branches and the branches originated from the same outbreak in each
cluster (Fig. 2). The topology for the non-MLV genotypes clearly revealed a ladder-like structure
with geographic clusters. Geographically co-circulating lineages are very common.
For example, Asia-1 and ?2 strains have co-circulated in China and Japan; meanwhile,
Arctic, EW, and other EU-1/SA-1 stains have co-circulated in several European countries.
Both genotypes associated with MLV contained MLV strains and wild-type isolates. In
the America-1 genotype, all wild-type isolates were clustered together with the Snyder
Hill strain. Strains have been identified in United States (1998, raccoon), India
(2006, dog), China (1992, fitches; 2006, mink; 2008 and 2009, dog) and Kazakhstan
(1989, mink; 2007, seal). In the RL genotype, one strains was isolated in China (lesser
panda, 1997) and one was isolated in the United States (2004, dog). The MLV-associated
genotypes did not have geographically distinguishable topologies and were located
in long terminal branches.

Although the relaxed clocked model was used to allow for a varying evolutionary rate
in each branch, this analysis was based on the isolation dates and locations of the
MLV prototype. A dataset (176-sequences) without sequences for MLVs and recombinant
strains was also analyzed (Fig. 3). The branch length differed, the topology was similar from the full (188-sequences)
(Figs. 2 and 3). However, the only one strain AY964114_04_US_25259_Dog of the RL genotype in the
176-sequences dataset was clustered with the Asia-4 genotype in the BMCMC tree (Fig. 3) and with the EW genotype in the NJ tree (Additional file 2). Support values were low in both trees. Analysis of the 176-sequences dataset revealed
four dispersal routes with support values of BF??3: (1) from the United States to
Tanzania (BF?=?3.277) in 1988 in the Africa-2 genotype; (2) from China to Taiwan (BF?=?8.514)
in 1998 in the Asia-1 genotype; (3) from Germany to Switzerland (BF?=?64.160) in 2001
in the EU-1/SA-1 genotype, and (4) from Germany to Australia (BF?=?14.866) in 2005
in the EW genotype. Additional file 3: Movie 1 shows an animation depicting the spatiotemporal spread of CDV.

Population dynamics

Figure 4a shows the saddle-like BSP obtained for the 176-sequence dataset. For the first sampled
strain (detected in 1978), the BSP revealed an effective population size (95 % HPD)
of 56.1 (25.1–277.9). The BSP showed the occurrence of a population bottleneck in
the early 1980s, although this may have been due to the limited number of sequences
available for ancestor strains in GenBank. The population then peaked and plateaued
(median population size range, 317.1–325.0) during 1985–1995, possibly due to advances
in nucleotide sequencing technology and the consequent increase in the number of sequences
reported worldwide. Since then, the viral population has consistently decreased (322.2–376.2)
as vaccination programs and international cooperation have improved. After 2005, the
population decrease was consistent (173.3–153.3), but the rate of decrease was relatively
slow. The estimated T
MRCA
was 1945 (1818–1966) and the evolutionary rate (95 % HPD) was 7.41?×?10
?4
(6.00?×?10
?4
–8.80?×?10
?4
) substitutions per site per year (s/s/y). In the BSP for the Taiwan strains, the
estimated initial effective median population size (95 % HPD) was estimated as 22.2
(4.0–227.4) in 2003. The population plateaued until 2005 and then consistently decreased
from 22.2 to 1.2 during 2005–2008 (Fig. 4b).

Fig. 4. Bayesian skyline plot (BSP) of (a) 176-sequence dataset and (b) Taiwan strains isolated during 2003–2008 The x-axis is the time scale in years,
and the y-axis is a logarithmic scale of Ne? (where Ne is the effective population size and ? is the generation time). The thick solid line indicates the median estimates, and
the shaded area indicates the 95 % HPD

Detection of variation and selection of the H gene

The 1824-nt CDV H gene sequences (encoding 607 amino acids) for the 176 strains examined were aligned
to analyze sequence variation and selection patterns. Pairwise comparisons showed
that nucleotide and amino acid similarities for CDV strains were 87.2–100 % and 82.5–100 %,
respectively. The amino acid sequence similarity was lower than 95 % between the EU-2,
America-2, EU-1/SA-1, and Asia-1 genotypes (Fig. 3). The MEME method revealed 18 sites with evidence of episodic diversifying selection
(P??0.05): sites 15, 82, 150, 171, 175, 225, 254, 266, 275, 291, 310, 321, 368, 369,
372, 412, 537, and 549 (Additional file 4). Of these, positions 150 and 310 were in the second nucleotide of the potential
glycosylation site, and positions 275 and 412 were proline residues in the OP strain.
Ten of the these sites were located near the beginning (225 in B1S2, 266 in B1S3,
and 291 in B2S2’) or near the end (254 in B1S3, 275 in B1S4, 321 in B2S3, 368 in B3S2,
412 in B3S3, 537 in B5S3, and 549 in B5S4) of the ?-strand. Comparisons of deduced
H protein amino acid sequences (residues 1–607) in different strains revealed that
mutation patterns in Cys, His, glycosylation sites, and others had stronger associations
with clusters than with isolation locations or hosts (Additional file 4). The Taiwan strains did not show any unique substitution patterns. Further, mutational
signatures did not differ between MLV and wild-type strains or among different host
species. However, viral isolates from non-human primates were clustered in a separate
sublineage and had substitutions patterns unique to S24F, E276V, Q392R, D435Y, and
I542F. Thus, the association between mutational signature and host tropism requires
further study. Site-specific selection was analyzed using the SLAC and MEME methods.
The SLAC method revealed one positive selection site (position 549) and 94 negative
selection sites (P??0.05) (Fig. 5a). Figure 5b shows the results of the BMG analysis, revealing 18 pairs of epistatic interactions
(PP??0.5). Interestingly, multiple epistatic interaction was detected at position
506 (with positions 103, 386, 488, and 427), and the relationships among these five
residues are labelled in the secondary structure (Fig. 5c).

Fig. 5. Site-specific variation among global canine distemper virus H sequences. a Detection of site-specific selection. Normalized dN–dS values were plotted for each codon site by using the SLAC method on the DataMonkey
website. b Secondary structure guide for Measles virus (MeV) H protein (PDB ID codes 2ZP8).
c Epistatic interactions among CDV H protein residues. Each square node represents
the position of a residue in a CDV H amino acid sequence identified in at least one
interaction. The analysis detected edges with marginal posterior probabilities (PPs)
exceeding a default cutoff value of 0.5. Data are shown as PP(?)/PP(?)/PP(?). The
arrows indicate the directions of interactions. d The multiple epistatic sites detected in this study are labeled in the simplified
secondary structure. Each amino acid residue and its coordinate structure are also
denoted