Survey of clustered regularly interspaced short palindromic repeats and their associated Cas proteins (CRISPR/Cas) systems in multiple sequenced strains of Klebsiella pneumoniae

Survey of CRISPR/Cas system in K. pneumoniae genomes

Given that only a small number of K. pneumoniae genomes have been completely annotated and reported, we also included sequenced but not
fully assembled (draft) genomes in this study. In total, 52 complete and draft genomes of K. pneumoniae were analyzed for the presence of components of the CRISPR/Cas system using the
CRISPRFinder software [20]. This program was used with the genomes already loaded in its database and with draft
sequences that were uploaded manually after BLAST searches were performed as described
in the “Methods” section. CRISPR sequence arrays and cas genes were detected in two out of the eight complete genomes and in four out of the
44 draft genome sequences available (Table 1 and Additional file 1: Table S1). In some cases, such as strains
342 and JM45, CRISPRFinder detected CRISPR regions but no adjacent cas genes were found. These sequences were not considered to have a CRISPR/Cas system
and were not included in the subsequent analysis. In order to corroborate CRISPRFinder
results, five 20 kbp random sequences derived from the strain 1084 were generated
(see “Methods” section) and analyzed. No CRISPR sequences were
found in any of these sequences. Taken together, these results demonstrate that
the CRISPR/Cas system is not homogeneously distributed among K. pneumoniae strains: it was found in only 6 of the 52 analyzed strains (12%).
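
The prevalence figure quoted above can be reproduced from the counts given in the text. The following is a minimal sketch; the dictionary layout and the function name `prevalence` are illustrative choices and not part of the original analysis.

```python
# Counts taken from the text: genomes analyzed and genomes in which
# both CRISPR arrays and adjacent cas genes were detected.
complete = {"total": 8, "with_crispr_cas": 2}
draft = {"total": 44, "with_crispr_cas": 4}


def prevalence(*groups):
    """Return (positive, total, percent) pooled across genome groups."""
    positive = sum(g["with_crispr_cas"] for g in groups)
    total = sum(g["total"] for g in groups)
    return positive, total, 100 * positive / total


positive, total, pct = prevalence(complete, draft)
print(f"{positive}/{total} genomes = {pct:.1f}%")  # 6/52 genomes = 11.5%
```

The exact value, 11.5%, rounds to the 12% reported in the text.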

Table 1. Klebsiella pneumoniae strains with a CRISPR/Cas system

Genomic context of CRISPR/Cas

In general, the MAUVE alignment (for details refer to the “Methods” section) showed that
the region containing the cas operon falls within a single locally collinear block (LCB) in all genomes.
Therefore, this region seems to be shared and syntenic. In the genomes of the strains
NTUH-K2044 [GenBank: NC_012731], WGLW2 [GenBank: NZ_JH930419], and WGLW5 [GenBank:
NZ_JH930428], this system was found encoded in the complementary strand. In contrast,
in the genomes of strains 1084 [GenBank: NC_018522], JHCK1 [GenBank: NZ_ANGH02000012]
and RYC492 [GenBank: NZ_APGM01000001], it was found encoded in the plus strand. We
also found that the sequences immediately upstream and downstream of the cas operon were variable in all cases, which probably reflects the variability of the CRISPR sequences located there
(Fig. 1).

Fig. 1. Location of CRISPR/Cas system in the genome of diverse strains of Klebsiella pneumoniae. Alignment generated with Progressive MAUVE of the six genomes that contain CRISPR/Cas.
The region was grouped into a single locally collinear block (red). At the ends of the cas operon (empty or blank regions marked by yellow arrows), there is variability, probably due to the presence of CRISPR sequences.

When analyzing the region upstream of the CRISPR/Cas system, we observed that the genes were identical
among strains and encode different subunits of an ABC type transporter (ID: AFQ65464, AFQ65465,
AFQ65466), multiple subunits of a formate dehydrogenase (ID: AFQ65461), malate dehydrogenase
(ID: AFQ65462), and amino acid transporters (ID: AFQ65453, AFQ65454, AFQ65460). Interestingly,
we also found genes that seem to encode proteins involved in antimicrobial resistance, such as
glyoxalase and efflux pumps (MdtM, multidrug efflux system protein) (ID: AFQ65457,
AFQ65459, EMH97621). Similarly to what was observed at the 5′ end,
there was no variability at the 3′ end of the CRISPR/Cas region. This region contains
genes related to antibiotic resistance, such as lactoylglutathione lyase (or glyoxalase,
which confers resistance to bleomycin) (ID: AFQ65478), and genes encoding different
subunits of proteins involved in cell metabolism, such as 2-gluconate dehydrogenase
(ID: AFQ65479, AFQ65480, AFQ65481), heme protein exporters (ID: BAH63785, BAH63784),
and proteins involved in the biogenesis of cytochrome C (ID: AFQ63377, AFQ63378, AFQ65482,
AFQ65483, AFQ65484, AFQ65485, AFQ65486, AFQ65487, AFQ65488) (Fig. 2a). In the draft genomes, most of the genes are annotated as hypothetical; however,
a detailed analysis of this region reveals that the size and sequence of the genes
are similar among all genomes with these systems. Taken together, our analysis demonstrates
that the CRISPR/Cas region is syntenic among the K. pneumoniae strains harboring this system.

Organization of the cas operon

As mentioned before, in CRISPR/Cas systems the CRISPR sequences are always accompanied by associated coding
genes. In K. pneumoniae, in both draft and complete genomes, this system consists of eight cas genes that are syntenic (Fig. 2b). The cas genes identified were, in the 5′ to 3′ direction: cas3, cse1 (also known as casA), cse2 (also known as casB), cse3 (casE), cse4 (casC), cas5e, cas1 and cas2. As a whole, this suggests that the cas operon is conserved in those strains containing CRISPR/Cas systems and probably has
a common evolutionary history in all these Klebsiella strains. This finding suggests that the cenancestor of Klebsiella contained the CRISPR system.
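
The gene order listed above can be encoded as a simple list, which makes it easy to check synteny against another Type I-E operon. The sketch below is illustrative only. The E. coli order is an assumption built from the description given later in this section (same genes, with cse3 downstream of cas5e), and the names `ALIASES`, `normalize` and `relocated_genes` are hypothetical helpers, not part of the original analysis.

```python
# Gene order reported for K. pneumoniae, 5' to 3'.
KPN_ORDER = ["cas3", "cse1", "cse2", "cse3", "cse4", "cas5e", "cas1", "cas2"]

# Assumed E. coli Type I-E order: same genes, cse3 placed after cas5e.
ECO_ORDER = ["cas3", "cse1", "cse2", "cse4", "cas5e", "cse3", "cas1", "cas2"]

# Alternative gene names mentioned in the text.
ALIASES = {"casA": "cse1", "casB": "cse2", "casE": "cse3", "casC": "cse4"}


def normalize(genes):
    """Map alternative names (casA, casB, ...) to the cse nomenclature."""
    return [ALIASES.get(g, g) for g in genes]


def relocated_genes(order_a, order_b):
    """Return genes whose removal makes the two gene orders identical.

    An empty list means the orders already match; a single gene means
    that gene alone changed position between the two operons.
    """
    a, b = normalize(order_a), normalize(order_b)
    if sorted(a) != sorted(b):
        raise ValueError("operons do not contain the same set of genes")
    if a == b:
        return []
    return [
        g for g in a
        if [x for x in a if x != g] == [x for x in b if x != g]
    ]


print(relocated_genes(KPN_ORDER, ECO_ORDER))  # ['cse3']
```

Under these assumptions the two operons differ only in the position of cse3, and all other genes keep the same relative order.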

Fig. 2. Genomic context of the CRISPR/Cas system in diverse strains of Klebsiella pneumoniae. a Genomic context of the cas operon. Enzymes related to bacterial metabolism and some antibiotic resistance genes
are located in the vicinity of the cas operon. b CRISPR/Cas organization. The cas operon consists of eight genes and the CRISPR sequences are located downstream from
cas2 and upstream from cas3 in those genomes containing two CRISPR arrays.

Analysis of the CRISPR sequences

In all genomes containing the CRISPR/Cas system, CRISPR sequences flanked the cas operon: in those genomes with two CRISPR arrays, they were found upstream
of the cas3 gene and downstream of the cas2 gene. Strains RYC492 and WGLW2 presented
only one CRISPR array. In the RYC492 strain, the array was located downstream of cas2 and contained 11 spacers. In the WGLW2 genome, the CRISPR array was upstream of
cas3 and had 3 spacers (Table 1; Fig. 2). Strains NTUH-K2044, 1084 and JHCK1 had two CRISPR arrays: strain NTUH-K2044
contained 22 spacers in the array downstream of cas2 and 3 in the array upstream of cas3; strain 1084 presented 14 and 8 spacers (downstream of cas2 and upstream of cas3, respectively), whereas strain JHCK1 contained 15 and 9 spacers (downstream of cas2 and upstream of cas3, respectively). The average length of the repeats was 29 bp, whereas spacers had an average length
of 33 bp.
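
The spacer counts given in this paragraph can be tabulated per strain as follows. This is a minimal sketch covering only the five strains whose counts are stated above. The length estimate assumes that an array with n spacers contains n + 1 repeats and uses the average lengths from the text, so it is a rough approximation rather than a measured value. The names `SPACERS`, `summarize` and `approx_array_length` are illustrative.

```python
# Spacer counts from the text as (downstream of cas2, upstream of cas3).
# None marks a position where no array was reported for that strain.
SPACERS = {
    "NTUH-K2044": (22, 3),
    "1084": (14, 8),
    "JHCK1": (15, 9),
    "RYC492": (11, None),
    "WGLW2": (None, 3),
}

REPEAT_BP = 29  # average repeat length reported in the text
SPACER_BP = 33  # average spacer length reported in the text


def summarize(spacers):
    """Return the number of arrays and total spacers for each strain."""
    rows = {}
    for strain, counts in spacers.items():
        present = [c for c in counts if c is not None]
        rows[strain] = {"arrays": len(present), "spacers": sum(present)}
    return rows


def approx_array_length(n_spacers):
    """Rough array length in bp, assuming n spacers sit between n + 1 repeats."""
    return (n_spacers + 1) * REPEAT_BP + n_spacers * SPACER_BP


for strain, row in summarize(SPACERS).items():
    print(f"{strain}: {row['arrays']} array(s), {row['spacers']} spacers")

print(approx_array_length(22))  # 1393
```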

Subsequently, by comparing the cas operon of K. pneumoniae with that of E. coli (Type I-E or CASS2), we observed that K. pneumoniae strains have the same number of genes but differ in the location of cse3. That is, in E. coli cse3 is located downstream of cas5e, while in K. pneumoniae it is located between cse2 and cse4 (Fig. 2). Whether this rearrangement influences the formation of the CASCADE complex involved
in the recognition of foreign genetic material in K. pneumoniae is still unknown and a matter for future research.

In order to characterize the direct repeats (DRs) in
each CRISPR array, we performed a detailed analysis by aligning all 10 of the DRs
identified by CRISPRFinder. The consensus sequence
of these DRs showed a conserved GT(C/g)TTCCCC sequence at the 5′ region and a conserved
GGGG(G/a)T(G/a)(T/a) (T/a)(T/c)C at the 3′ region. The main changes were detected
in the middle of the sequence (positions 12 to 15). Our results show that the DR sequence
was symmetrical and partially palindromic (Fig. 3).

Consistent with the immune role of the CRISPR/Cas system, spacer sequences have been observed to derive
from material acquired by horizontal gene transfer (HGT) [28]. In order to define the origin of the spacers in the systems identified in K. pneumoniae, BLASTn searches were performed. This analysis showed that 38 of the 116 spacer sequences
(33%) have significant similarity to plasmids, phages or genome sequences in Klebsiella or other bacteria. The distribution of these sequences was: 13% (15/116) of the spacer
sequences had similarity to genes belonging to phages, 8% (9/116) corresponded to
gene sequences of plasmids, 5% (6/116) to genes of the Klebsiella spp. genome, while 7% (8/116) were similar to genes that belong to genomes of other
bacteria. The remaining 78 sequences (67%) showed no significant similarity to any
other sequence (Fig. 3). In addition, no spacer sequences were shared between strains. These results
show a diverse origin of the CRISPR spacer sequences, indicating that they were probably
acquired during diverse events involving the entry of foreign genetic material.
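
The percentages in the spacer-origin breakdown follow directly from the hit counts reported above. The sketch below recomputes them as a consistency check; the category labels and the function name `distribution` are illustrative.

```python
TOTAL_SPACERS = 116

# Number of spacers with significant BLASTn similarity, by source (from the text).
HITS = {
    "phage": 15,
    "plasmid": 9,
    "Klebsiella genome": 6,
    "other bacterial genome": 8,
}


def distribution(hits, total):
    """Return (matched spacer count, rounded percentage per category)."""
    matched = sum(hits.values())
    table = {name: round(100 * n / total) for name, n in hits.items()}
    table["no significant similarity"] = round(100 * (total - matched) / total)
    return matched, table


matched, table = distribution(HITS, TOTAL_SPACERS)
print(f"matched: {matched}/{TOTAL_SPACERS} ({round(100 * matched / TOTAL_SPACERS)}%)")
for name, pct in table.items():
    print(f"{name}: {pct}%")
```

The four categories sum to 38 matched spacers (33%), leaving 78 spacers (67%) without a significant match, in agreement with the figures in the text.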

Fig. 3. Description of direct repeats and spacer sequences found in Klebsiella pneumoniae genomes. a Logo obtained in WebLogo of the direct repeats consensus sequences of CRISPR arrays.
The sequences are partially palindromic and symmetrical. b Match of spacer sequences with sequences of phages, plasmids and bacterial genomes
deposited in GenBank.