Chemostat culture systems support diverse bacteriophage communities from human feces


Human subjects and chemostat cultured communities

We recruited five human subjects through the University of Guelph and sampled their
feces. Donors #1, #2, and #10 were a co-habiting family unit of father, mother, and
child, respectively, while donors #8 and #9 were unrelated. Each fecal sample was
homogenized and processed immediately into a chemostat vessel, and the remainder of
the feces was stored until processing of the viromes could take place. Chemostat vessels
were operated under conditions designed to mimic the human distal colon 30]. Cultured microbial communities were taken from donors #1, #2, and #10 on days 4,
8, 12, 16, and 24 and from donors #8 and #9 on days 3, 6, 12, 18, and 24. Previous
studies have shown that cellular microbial communities reach stable state in culture
by day 24 30]. We tested for the presence of fluorescent-staining particles (FSPs) in feces and
cultures, similar to those previously described, to indicate the likely presence of
viruses in both sample types 38]. On day 24, we found that there were numerous FSPs in both sample types, with a mean
3.7?×?10
9
FSPs in feces and 1.4?×?10
9
FSPs in the chemostat cultures for all subjects studied. The presence of such high
densities of FSPs in both sample types strongly suggested the presence of substantial
viral communities.

We isolated and processed viruses from both feces and chemostat cultures utilizing
sequential filtration followed by cesium chloride density gradient centrifugation
according to our previously described protocols 39]. We sequenced the resulting viral DNA from the feces and chemostat cultures of the
five donors using semiconductor sequencing 40] and produced 18,584,604 reads (619,487 reads per time point and sample type) of mean
length 215 nucleotides (Additional file 1: Table S1). We used BLASTN to compare all viromes to the RDP 16S rRNA genes database
(E-value 10
?5
) and found that all were free of 16S rRNA gene homologues. We also used BLASTN to
search the viromes for homologues against a human reference genome, and some similar
sequences (E-value 10
?5
) were identified and removed prior to further analysis. These data suggest that these
chemostat and fecal viromes were relatively free of contaminating cellular nucleic
acids.

Identification of viruses and viral families in cultured fecal material

We assembled virome reads from each subject to construct longer contigs, as this generally
results in more productive searches for sequence similarities. We used several different
assemblers including CLC Genomics Workbench, MetaVelvet 41], and IDBA-UD 42] in the construction of contigs and found that CLC Genomics Workbench produced fewer
contigs with longer mean and maximum lengths and higher N50 values than the other
assemblers tested (Additional file 2: Table S2). With the CLC assembler, 96.4?±?1.3 % of all reads were assembled into
contigs. Therefore, we utilized the contigs constructed using CLC Genomics Workbench
throughout the study. The mean GC content for all contigs was 45.5 % (mean of 46.3 %
for cultured viromes and 41.7 % for fecal viromes; p?=?0.01) (Additional file 3: Figure S1, Panel A). The mean length amongst all contigs was 941 nucleotides (mean
of 908 nucleotides for cultured viromes and 965 nucleotides for fecal viromes; p?=?0.35) (Additional file 3: Figure S1, Panel B). The differences identified in GC content suggest that there
may be features of viromes that are specific to both fecal and cultured communities.

Prior studies of viromes generally have identified large proportions of putative viruses
that have no significant sequence similarities in available databases. We used BLASTX
analysis of the assembled contigs against the NCBI non-redundant database (NR) to
determine the relative proportion of our viromes that had significant sequence similarities.
We utilized the numbers of reads assigned to each contig to identify the proportions
of reads belonging to contigs with significant sequence similarities in the NR database.
We found that in all subjects combined, 94.9?±?1.7 % of the reads belonged to contigs
that had similar sequences in the NR database (Fig. 1); 88.0?±?10.2 % of those reads belonged to contigs that had sequence similarities
to phages, 6.9?±?9.9 % belonged to contigs that were similar to bacteria, and 5.1?±?1.7 %
belonged to contigs with no significant sequence similarities in the NR database.
Most virome studies have contigs that have significant sequence similarities to bacteria;
however, in the absence of finding 16S rRNA genes in any of the viromes, the similarities
to bacteria likely represent annotation deficiencies rather than bacterial contamination.
These data strongly indicate that by analysis of assembled contigs, we gain a more
comprehensive view of the constituents of viromes than is typically observed from
analysis of virome reads.

Fig. 1. Plots of percentage of virome reads that belong to contigs with significant sequence
similarities in the NCBI NR database. The percentage of reads is demonstrated on the
Y-axis, and each donor, time point, and sample type is demonstrated on the X-axis

To decipher whether there may be viruses present in the cultured viral communities
similar to those present in other public databases, we mapped the reads from each
subject against a composite database of known viruses including the NCBI viral database
and the Phantome database. We found that there were numerous different viruses that
were matched by reads from the cultured viral communities. For example, many reads
from day 24 mapped to Enterobacteria Phage FIAA91ss, including 1280 reads (0.14 %
of the reads) from donor #1 (Fig. 2a) and 1981 reads (0.25 % of the reads) from donor #2 (Fig. 2b). Very few reads matching Phage FIAA91ss were found in donor #1 on days 4, 8, 12,
and 16 (Additional file 3: Figure S2, Panel A). Similar results were found for donor however, a higher
number of reads were identified on day 12 and their proportion increased up to day
24 (Additional file 3: Figure S2, Panel B). Enterobacteria Phage FIAA91ss is a myovirus, and although many
myoviruses lead primarily lytic lifestyles, the presence of predicted integrase and
transposase genes in the FIAA91ss genome suggests that like other P2-like phages,
it has a primarily lysogenic lifestyle. There also were many reads from day 24 that
mapped specifically to Enterobacteria Phage IME10, including 985 reads (0.11 % of
the reads) from donor #1 (Fig. 2c) and 255 reads (0.03 % of the reads) from donor #2 (Fig. 2d). Unlike Phage FIAA91ss, Phage IME10 was found in both donors at all time points
(Additional file 3: Figure S3). Enterobacteria Phage IME10 is a podovirus, a group which generally lead
lytic lifestyles; however, Phage IME10 has a predicted repressor protein that suggests
it may be lysogenic. Many reads from both donors also mapped to Enterobacteria Phage
HK620 at all time points, suggesting that a similar phage is present in the cultured
viromes of both donors (Additional file 3: Figure S4). Interestingly, reads matching Enterobacteria phages FIAA91ss, IME10,
and HK620 were identified in donors #1 and #2 (husband and wife) but were not identified
in donors #10 (daughter), #8, or #9.

Fig. 2. Read mappings of chemostat viromes from day 24 to Enterobacteria phage FIAA91ss (a and b) and Enterobacteria phage IME10 (c and d). a and c represent donor #1 and b and d represent donor #2. The genes and their respective directions are shown by the yellow arrows, and the annotation of each gene is represented above. The relative location along
the phage genomes are demonstrated by the scale at the top of the diagram, and the
relative proportion of reads mapping to the phages are shown at the lower portion
of the diagrams

We also analyzed the cultured viral communities using the MG-RAST Server 43] and categorized those with sequences similar to known phages according to their families.
We found that in all five donors, their cultured communities at all time points had
reads similar to known myoviruses, podoviruses, and siphoviruses (Additional file
3: Figure S5). The proportions of reads similar to those different phage families generally
differed substantially within a subject over time and between different subjects.
The relative proportions of phage families observed do not necessarily reflect those
found in the fecal viromes of each subject, particularly in donors #8 and #9. Viruses
from the family Microviridae also were found in the feces of four of the five subjects,
but none were identified in the chemostat cultures.

Comparisons of fecal and cultured viral communities

We identified significant sequence similarities to each assembled viral contig by
BLASTX analysis against the NCBI non-redundant database to decipher which viral genes
had similarities in the chemostat cultures. The vast majority of the contigs had similarities
to hypothetical phage proteins, proteins involved in replication/integration, restriction/modification
enzymes, or tail fibers (Additional file 3: Figure S6). There were no significant differences identified in the relative proportions
of contigs similar to different phage categories for either fecal or cultured phage
communities regardless of the time point examined.

We previously demonstrated that phages in the mouth are highly persistent members
of human oral microbial communities 10]. We utilized similar techniques to decipher whether phages in the chemostat cultures
might also persist over time. By creating viral assemblies from all time points within
a donor combined, we then could assess which different time points contributed to
each assembly. We found that 45?±?15 % of assemblies from all subjects included contigs
from day 4, 42?±?8 % from day 8, 44?±?10 % from day 12, 38?±?17 % from day 16, 37?±?18 %
from day 24, and 19?±?7 % from feces (Additional file 3: Figures S7 and S8). The substantial difference in the percentage of assemblies that
included fecal contigs suggests that there is less conservation in the phages present
in feces compared to chemostat cultures. We also used BLASTN to compare the contigs
between all donors and time points studied and found a similar pattern of conserved
viruses over time in the chemostat cultures (Additional file 3: Figure S9). There was generally less conservation when comparing the chemostat culture
viromes with those of the feces. The patterns of similar viruses across all donors
suggested that there were individual-specific features of the viromes in each donor,
giving the heatmap a “matrix-like” appearance. There was considerable similarity amongst
the chemostat and fecal viromes of donors #2 and #10 (mother and child). To verify
that the patterns of shared viruses we observed in the heatmap (Additional file 3: Figure S9) and the assemblies (Additional file 3: Figure S7) were statistically significant, we utilized a permutation test 44] to compare the proportions of shared viruses by sample type. For the cultured communities,
25 % of the virus contigs sampled had significant sequence similarities across all
subjects compared with only 9 % when comparing cultured communities with fecal communities
(Table 1), a difference that was close to statistical significance (p?=?0.057). Similar results were found for fecal communities, where 25 % of the viruses
sampled across subjects had significant sequence similarities compared with only 13 %
when comparing between cultured communities and fecal communities (p?=?0.141).

Table 1. Viral homologues within and between subject groups

We created assemblies from all contigs in each donor to determine the relative proportions
of viral contigs that were conserved in each donor over time. In donor #1, 53.3?±?4.0 %
of the contigs were conserved amongst the chemostat cultures over time compared to
25.9?±?3.0 % conserved between the chemostat cultures and the feces (Fig. 3). In donor #2, 62.3?±?6.2 % were conserved in chemostats compared to 26.5?±?4.2 %
between feces and chemostats; in donor #8, 44.3?±?17.5 % compared to 18.6?±?4.4 %;
in donor #9, 45.4?±?5.5 % compared to 32.2?±?11.6 %; and in donor #10, 65.3?±?7.1 %
compared to 18.9?±?3.5 %. In the chemostat cultures, the greatest conservation generally
was between days 16 and 24, with 64.6?±?8.4 % of the viral contigs conserved. These
data indicate that while there are shared viruses between the chemostat cultures and
the feces, there are identifiable differences in viral ecology between the sample
types. Because our data also suggested that there were individual-specific features
of each cultured virome, we used a permutation test to verify that the viral communities
were significantly individual-specific across all time points in the cultured communities
(Table 2). In all subjects studied, there was a statistically significant (p???0.05) trend observed, where the viruses in each of the cultured communities were
significantly individual-specific.

Fig. 3. Heat matrices of the percentage of assemblies from each donor (a-e) that contained contigs from each time point and sample type

Table 2. Chemostat and fecal virome homologues within and between subjects

We identified thousands of assemblies from all donors constructed from many different
time points (Additional file 3: Figures S7 and S8). Each of these assemblies had identifiable phage sequence similarities.
For example, in donor #9, we identified a 36,261 bp contig with numerous sequence
similarities to phage genes across its length including hydrolase, helicase, and tape
measure genes (Fig. 4). Similar results could be found for all donors; however, not all time points contributed
equally to each assembly (Additional file 3: Figures S10, S11, S12 and S13). Interestingly, many of the assembled viruses had
identifiable restriction/modification genes, which corroborate our findings of a high
number of contigs with significant similarities to restriction/modification enzymes
in the chemostat viromes (Additional file 3: Figure S6). M23 peptidases (Additional file 3: Figure S10), toxin-antitoxin genes (Additional file 3: Figure S11), S-layer, and platelet-binding proteins (Additional file 3: Figure S12) all had similar sequences identified in the phage genomes. A phage from
donor #8 shared some synteny with crAssphage 45] (Additional file 3: Figure S13).

Fig. 4. Assembly of contig 126 from all time points in donor #9. The portions of the contig
identified in each time point or the feces are represented by the colored boxes. Putative ORFs and their directions are represented by the yellow arrows, and their annotations are represented above. The length of the contig is denoted
at the top

We utilized the taxonomic information from the virome BLASTX hits to determine whether
the phages from fecal and cultured communities had similar profiles. We found that
the profiles of BLASTX hits differed between the different donors and varied based
on the time point examined (Fig. 5). Within each donor, the profiles were somewhat similar over time with the most substantial
profile differences between the chemostat cultures and the feces in most subjects.
The most abundant phyla identified were Bacteroidetes and Proteobacteria, but Firmicutes,
Fusobacteria, and Verrucomicrobia also were identified. There was a relatively high
number of Verrucomicrobia identified in donors 8 and 10, which represented the genus
Akkermansia. For comparison, we characterized the taxonomy of the bacterial communities
in fecal and cultured communities using 16S rRNA genes. As expected, operational taxonomic
units (OTUs) belonging to phyla Bacteroidetes and Firmicutes were amongst the most
highly abundant taxa identified (Additional file 3: Figure S14).

Fig. 5. Bar graphs demonstrating proportion of contigs with BLASTX significant sequence similarities
to phages that parasitize the specified bacterial phyla for fecal and chemostat viromes.
Each bar from left to right represents the day of culture, and the last bar for each donor represents fecal viromes

We utilized principal coordinates analysis (PCOA) based on the beta diversity between
the fecal and cultured viromes to determine if there were patterns of variation specific
to each subject and sample type. Each of the cultured viromes reflected the subject
from which they were derived (Fig. 6a). While the variation present in fecal viromes could be distinguished from the cultured
viromes, they clustered near the cultured viromes in each donor, indicating that there
were shared features between the fecal and cultured viromes. The patterns of variation
observed on PCOA were highly robust, as similar patterns were observed when the PCOAs
were constructed based on contigs contributing to assemblies (Additional file 3: Figure S15A) and BLASTX hit profiles (Figure S15B). The distinction between the
different donors was not as apparent when examining the bacteria using 16S rRNA genes
(Fig. 6b). These data indicate that much of the individual-specific character of human fecal
viromes was captured in chemostat culture systems.

Fig. 6. Principal coordinates analysis of beta diversity present in the viromes based on patterns
of similar contigs between each virome (a) and bacteria by 16S rRNA genes (b) of each subject and sample type. Fecal samples are represented by squares, and chemostat viromes are represented by circles. In a, ovals are drawn around the data points for each individual donor

Viral diversity in fecal and cultured communities

We developed tools for characterizing viral communities to discern whether richness
and diversity were conserved between fecal and cultured communities. The technique,
termed the Homologous Virus Diversity Index (HVDI), is primarily based upon the Shannon
Index 46] and is used for comparing diversity between different communities. The HVDI utilizes
contig spectra as a surrogate for population structures and corrects for the limitations
imposed on the contig spectra by assembly methods by assigning the spectra for highly
similar contigs to the same viral genotypes 47]. We validated the technique by randomly constructing viromes using viruses from the
NCBI virus and Phantome databases to meet specific genotype and evenness requirements.
Viromes were constructed with 10, 50, 100, 500, and 1000 different viral genotypes
across an evenness spectrum consisting of 0.10, 0.33, 0.50, 0.67, and 0.90. We created
10 separate viromes at each genotype and evenness value to ensure the data were reproducible
and used the number of genotypes and randomly sampled reads from each virome to calculate
the Shannon Index (Fig. 7). We then used the contig spectra from the assembled reads from each virome as input
for the HVDI. When only 10 different viruses were evaluated, the HVDI values approximate
the Shannon Index across all evenness levels (Fig. 7a), and similar results were found for 50 viral genotypes (Fig. 7b). At higher numbers of genotypes (100, 500, and 1000), the HVDI exceeded the Shannon
Index, although not considerably (Fig. 7c–e). The extent of the diversity overestimates were related to the evenness in the viromes,
where the lowest evenness value of 0.1 resulted in the highest overestimates of diversity
(Additional file 3: Figure S16). For evenness values of 0.33–0.9, the percentage differences between
the HVDI and the Shannon Index were 12 % or lower and were highly consistent across
the spectrum. These data indicate that the HVDI provided estimates of viral diversity
similar to those of the Shannon Index and demonstrated that overestimates of viral
diversity by the HVDI across different evenness levels were consistent and predictable.

Fig. 7. Bar graphs (±standard deviation) representing the Homologous Virus Diversity Index
(HVDI) and Shannon Index values for a group of randomly constructed viromes. Each
virome was constructed by randomly sampling amongst the viruses present in the NCBI
and Phantome databases, and each was constructed to meet specific evenness values.
The Shannon index was determined based on the actual sampling of the viruses in the
databases, and the HVDI was determined after assembly of the randomly constructed
viromes. For each evenness value, 10 separate iterations were performed on different
sets of randomly sampled genomes. The y-axis represents values for either the Shannon
Index or the HVDI, and the x-axis represents the evenness value to which the viromes
were constructed to meet. a–e represent the different numbers of virus genotypes that were sampled

We next used the HVDI to perform rarefaction analysis to determine whether the viruses
in the viromes had been adequately sampled and as a measure of whether the richness
of viruses differed substantially between fecal and cultured communities. In this
case, we calculated the HVDI based on the Chao1 index 48] because it penalizes more heavily for the presence of the rarer viral contigs in
each sample. We found that there was no association between the sample type and the
richness within the viral communities and that the diversity estimates approached
asymptote in many cases, indicating that little additional viral diversity would have
been identified through further sampling (Additional file 3: Figure S17).

We next compared the results of the HVDI using the Shannon Index to investigate whether
the diversity of the viral component of cultured communities was similar to that from
the feces in each subject. We found that for all subjects, viral diversity in the
cultured communities changed as a function of time. For example, in donor #2, viral
diversity generally increased from day 4 to day 24, while viral community diversity
generally decreased in donor #9. For donors #1, #2, and #10, the diversity present
in cultured communities on day 24 was highly similar to that in the feces of each
subject (Fig. 8a). For donors #8 and #9, the diversity in cultured communities diminished significantly
by day 18 which continued through day 24. Neither of the day 24 viromes in donors
#8 or #9 approximated the diversity present in the fecal viral communities. Evenness
values for all subjects and time points were highly consistent with the HVDI data
(Fig. 8b). Most of the evenness estimates ranged from 0.2 to 0.6, which represents the range
which the HVDI likely is to overestimate diversity by approximately 12 % (Additional
file 3: Figure S16). The evenness values for viral communities in donors #8 and #9 diminished
significantly, which suggests that these communities were populated by fewer viruses
that represented the majorities of the population. We repeated the virome preparations
from the cultured communities on days 24 from donors #8 and #9 and similar results
to those observed were found, indicating that there likely was a significant drop
in viral diversity in these chemostat cultures (data not shown). Interestingly, the
number of FSPs was not different for these cultured communities in donors #8 and #9,
so the differences in evenness cannot be attributed to fewer total viruses being present.
Alpha diversity for the bacteria as determined by the Shannon Index increased from
day 4 to 24 and more closely approximated the feces by day 24 (Additional file 3: Figure S18). These data suggest that the relative loss of viral diversity in the
chemostat cultures was not due an overall loss of bacterial diversity.

Fig. 8. Bar graphs representing the Homologous Virus Diversity Index (a) and Homologous Evenness Index (b) for all subjects. The y-axis represents diversity, and on the x-axis each bar from left to right represents day of culture. The last bar for each donor represents the fecal viromes