Diversity of CRISPR-Cas immune systems and molecular machines

In general terms, there are two main classes [64] of CRISPR-Cas systems, which encompass five
major types and 16 different subtypes based on cas gene content, cas operon architecture, Cas
protein sequences, and the processes that underlie the aforementioned steps (Fig. 1) [65], [66].
The first class is defined by multiprotein effector complexes (Cascade, Cmr, Csm) and encompasses
types I, III and IV. Type I systems are the most frequent and widespread; they target DNA in a
Cascade-driven and PAM-dependent manner and destroy target nucleic acids using the signature
protein Cas3 [26], [28], [67]–[71] (Fig. 2). Many studies have led to extensive biochemical and
structural characterization of the effector proteins and protein–DNA–RNA complexes implicated in
type I CRISPR-Cas systems [20], [23], [24], [46], [72]–[77]. Likewise, type III systems occur
frequently in archaea and are characterized by the multiprotein Csm [78]–[82] or Cmr [16], [83]–[95]
complexes; they operate in a PAM-independent manner and can cleave DNA or RNA by using the
signature Cas10 protein together with effector nucleases such as Cmr4 (the RNase within the Cmr
complex for type III-B systems) [85], [95] and Csm3 (the RNase within the Csm complex for type
III-A systems) [81], [82]. Interestingly, several recent studies have revealed that type III
CRISPR-Cas systems can target both nucleic acid types, through co-transcriptional RNA and DNA
cleavage [80], [82]. Specifically, distinct active sites within the Cas10–Csm ribonucleoprotein
effector complex drive co-transcriptional RNA-guided DNA cleavage and RNA cleavage [80]. Type IV
systems are rare and remain to be characterized in terms of their distribution and function.

Fig. 2. Diversity of CRISPR-Cas molecular machines. Two main classes of CRISPR-Cas systems
exist, which are defined by the nature of their Cas effector nucleases, either constituted
by multiprotein complexes (class 1), or by a single signature protein (class 2). For
class 1 systems, the main types of CRISPR-Cas systems include type I and type III
systems. Illustrated here as an example, the Escherichia coli K12 type I-E system (upper left) targets
sequences flanked by a 5′-located PAM. Guide RNAs are generated by Cascade in a Cas6-defined manner
and typically contain an eight-nucleotide 5′ handle derived from the CRISPR repeat, a full spacer
sequence, and a 3′ hairpin derived from the CRISPR repeat. Following nicking of the target strand,
the 3′ to 5′ Cas3 exonuclease destroys the target DNA in a directional manner. In the Pyrococcus
furiosus DSM 3638 type III-B system (lower left), a short crRNA guide directs the Cmr complex towards
complementary single-stranded RNA in a PAM-independent manner. For the canonical type II-A
Streptococcus thermophilus LMD-9 system (upper right), a dual crRNA–tracrRNA guide generated by Cas9
and RNase III targets a complementary DNA sequence flanked by a 3′ PAM to generate a precise
double-stranded break using two nickase domains (RuvC and HNH). For the Francisella novicida U112
type V system (lower right), a single guide RNA targets complementary dsDNA flanked by a 5′ PAM using Cpf1,
which generates a staggered dsDNA break. Abbreviations: Cascade, CRISPR-associated complex for antiviral defense; CRISPR, clustered regularly interspaced short palindromic repeat; crRNA, CRISPR RNA; dsDNA, double-stranded DNA; L, leader; nt, nucleotide; PAM, protospacer adjacent motif; ssRNA, single-stranded RNA; tracrRNA, trans-activating CRISPR RNA

By contrast, the second class is defined by single effector proteins and encompasses
types II and V. Type II systems are defined by the widely used Cas9 endonuclease [22], which
relies on dual crRNA–tracrRNA guides [30] that direct the RuvC and HNH nickase domains to
generate precise blunt DNA breaks in target DNA sequences flanked by a 3′ PAM [22], [31]–[34],
[96], [97]. Type V systems are rare and are characterized by the signature Cpf1 nuclease, which
is guided by a single crRNA that directs this RuvC-like endonuclease to make staggered
dsDNA cuts, yielding sticky ends in target DNA sequences flanked by a 5′ PAM [98].
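
The difference in PAM placement between the two class 2 types can be sketched in a few lines of Python. This is a minimal illustration, not a description of any published tool: the function name `find_targets`, the 20-nt protospacer length, and the example motifs `NGG` (3′ PAM) and `TTN` (5′ PAM) are assumptions introduced here and are not given in the text above. Only the supplied strand is scanned.

```python
import re

# Subset of IUPAC nucleotide codes needed for the illustrative motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_targets(dna, pam, pam_side, spacer_len=20):
    """Return (start, protospacer, pam_match) for each PAM-flanked site.

    pam_side="3prime": PAM lies immediately 3' of the protospacer (type II-like).
    pam_side="5prime": PAM lies immediately 5' of the protospacer (type V-like).
    `start` is the 0-based index of the first protospacer base.
    Only the given strand is scanned; a fuller search would also scan
    the reverse complement.
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    dna = dna.upper()
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    spacer_re = "[ACGT]{%d}" % spacer_len
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    if pam_side == "3prime":
        pattern = "(?=(%s)(%s))" % (spacer_re, pam_re)
    else:
        pattern = "(?=(%s)(%s))" % (pam_re, spacer_re)
    hits = []
    for m in re.finditer(pattern, dna):
        first, second = m.group(1), m.group(2)
        if pam_side == "3prime":
            hits.append((m.start(), first, second))
        else:
            # Protospacer begins right after the 5' PAM.
            hits.append((m.start() + len(first), second, first))
    return hits
```

For example, with a shortened 5-nt protospacer, `find_targets("AAAAACCCCCTGG", "NGG", "3prime", spacer_len=5)` reports the single site `CCCCC` at index 5 followed by `TGG`, whereas `find_targets("GGTTACCCCCGG", "TTN", "5prime", spacer_len=5)` reports `CCCCC` at index 5 preceded by `TTA`.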

Recently, several studies have shown that, although CRISPR-Cas systems generally function
in three distinct stages, involving particular molecular processes and various Cas molecular
machines, the adaptation and interference steps can be coupled [48], [99]–[101], which is
consistent with the priming hypothesis [48], [102]–[104]. Specifically, differential binding
determines whether cognate target DNA should be destroyed as part of the interference pathway,
or whether partially complementary sequences should be directed towards the adaptation path [48].
The coupling of the adaptation and interference stages also reflects their co-dependence
on Cas9 and PAM sequences in type II systems [100], [101], [105], and implicates a ‘cut-and-paste’ model rather than a ‘copy-and-paste’ model [100].

Overall, a broad genetic and functional diversity of CRISPR-Cas immune systems occurs
in the genomes of many bacteria and most archaea. Common denominators include DNA-encoded
immunity within CRISPR arrays that yield small guide RNAs, which define sequence-specific
targets for Cas nucleases and subsequent nucleic acid cleavage. The universal cas1 and cas2 genes, implicated in polarized, sequence- and structure-specific integrase-mediated
spacer acquisition during the adaptation stage [106]–[108], are present in all characterized types and subtypes in the two main classes. By
contrast, there is substantial variability between classes, types and subtypes concerning
the nature, sequence and structure of the CRISPR RNAs and Cas proteins involved, the
reliance on and location of PAM sequences, and the nature of the target nucleic acid.
Altogether, this illustrates the extensive multi-dimensional diversity of CRISPR-Cas
systems, their native biological functions, and the relative potential for various
biotechnological and industrial applications.
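
As a compact summary of the variability described in this section, the features of each type can be tabulated in a small data structure. The sketch below is only an illustrative encoding: the class name `CrisprType`, its field names, and the helper functions are hypothetical, and the values are transcribed from the text and the Fig. 2 examples (the 5′ PAM recorded for type I reflects the E. coli type I-E example only, not necessarily every subtype).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CrisprType:
    name: str                      # type label, e.g. "I" or "II"
    crispr_class: int              # 1 = multiprotein effector, 2 = single protein
    signature: Optional[str]       # signature protein; None = not stated in the text
    targets: Tuple[str, ...]       # nucleic acids cleaved; empty = uncharacterized
    pam_dependent: Optional[bool]  # None = not stated in the text
    pam_side: Optional[str]        # "5prime", "3prime" or None


# Values taken from the section text; type IV is left unfilled because
# the text describes it as still to be characterized.
SYSTEMS = (
    CrisprType("I", 1, "Cas3", ("DNA",), True, "5prime"),
    CrisprType("III", 1, "Cas10", ("DNA", "RNA"), False, None),
    CrisprType("IV", 1, None, (), None, None),
    CrisprType("II", 2, "Cas9", ("DNA",), True, "3prime"),
    CrisprType("V", 2, "Cpf1", ("DNA",), True, "5prime"),
)


def types_in_class(crispr_class):
    """Names of the types assigned to a class, in table order."""
    return [s.name for s in SYSTEMS if s.crispr_class == crispr_class]


def types_targeting(nucleic_acid):
    """Names of the types described as cleaving the given nucleic acid."""
    return [s.name for s in SYSTEMS if nucleic_acid in s.targets]
```

Queries such as `types_in_class(2)` or `types_targeting("RNA")` then make the class and target distinctions explicit; the table would need extending to capture subtype-level differences.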

The diversity of CRISPR-Cas systems reflects their various functional roles. Although
the primary established function of CRISPR-Cas systems is adaptive immunity against
invasive genetic elements such as plasmids and viruses, several studies have independently
implicated them in other functions, including endogenous transcriptional control,
as well as resistance to stress, pathogenicity and regulation of biofilm formation
[63], [109]–[114].

Future studies are anticipated to determine the rationale for the distribution biases
in various phylogenetic groups, for the absence of CRISPR-Cas systems in so many bacteria,
and to unravel the functional links between immunity and other key biological processes
such as DNA homeostasis and repair. One intriguing conundrum about CRISPR-Cas systems
is their absence in approximately half of the bacterial genomes sequenced to date,
despite their intuitive evolutionary value. Another important consideration is whether
the observed biases in protospacer sampling during adaptation correlate with efficiency
biases for the interference stage. Specifically, spacer adaptation biases have been
repeatedly observed in type I systems [115], [116] and in type II systems [105], [117], implicating
replication-dependent DNA breaks at replication forks, Chi sites and interplay with the
RecBCD DNA repair machinery. It will therefore be important to determine
whether these also explain spacer efficiency variability during interference.