Global analyses of TetR family transcriptional regulators in mycobacteria indicates conservation across species and diversity in regulated functions

TFTRs are the most abundant type of HTH DNA binding proteins in mycobacterial genomes

The majority of HTH-containing DNA-binding proteins are subdivided into families based on the structure and spatial arrangement of the helices [18]. InterPro [19] was used to identify the total complement of HTH DNA-binding proteins across 10 mycobacterial genomes (see Methods) and to classify them into families. The results, together with the number of ORFs in each species, are given in Table 1.

Table 1. The total number of HTH proteins (including TFTRs) in mycobacterial genomes

We identified a total of 2338 HTH DNA-binding proteins across the 10 mycobacterial genomes, of which 906 are TFTRs. For the mycobacterial species analysed, the number of HTH DNA-binding proteins increases with the number of ORFs. In general, soil-dwelling species such as M. gilvum and M. smegmatis have more ORFs and so might be expected to encode more HTH DNA-binding proteins. Comparing M. gilvum with M. marinum, two mycobacteria of similar genome size, one soil-dwelling and the other adapted for survival in fish and amphibians, reveals fewer HTH DNA-binding proteins in the host-adapted species (272 for M. marinum compared to 328 for M. gilvum), possibly reflecting the reduced diversity of conditions within the intracellular environment.

TFTRs account for 26–48 % of the HTH DNA-binding proteins in each species (Table 1, column 3, in brackets). To determine whether TFTRs were the most abundant type of HTH DNA-binding protein, the entire HTH complement of the 10 mycobacterial species was classified by family using InterPro. A complete list of genes belonging to each HTH family in all 10 genomes is given in Additional file 1: Table S1, and the number of genes in each HTH family in the 10 species is shown in Fig. 1. Within the HTH superclass, TFTRs are by far the most highly represented family in all mycobacterial genomes. The next best represented HTH families are GntR, which is enriched in M. smegmatis (62 members) but has few representatives in the pathogenic mycobacteria, and OmpR, with 14–15 members in all mycobacteria except M. leprae.
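As a quick arithmetic check on these proportions, the sketch below expresses TFTRs as a percentage of all HTH DNA-binding proteins. It uses the pooled totals quoted above (906 of 2338) and, for M. marinum and M. gilvum, combines the HTH totals given in the text with the TFTR counts from Fig. 2; treating those two sources as directly comparable is our assumption, and the helper name `tftr_share` is hypothetical.

```python
# Totals for all 10 genomes, as quoted in the text.
TOTAL_HTH = 2338
TOTAL_TFTR = 906

# HTH totals as quoted in the text; TFTR counts taken from Fig. 2.
# Assumption: both counts were derived from the same gene sets.
SPECIES_COUNTS = {
    "M. marinum": {"hth": 272, "tftr": 124},
    "M. gilvum": {"hth": 328, "tftr": 129},
}


def tftr_share(tftr: int, hth: int) -> float:
    """Return TFTRs as a percentage of all HTH DNA-binding proteins."""
    if hth <= 0:
        raise ValueError("the HTH count must be positive")
    if not 0 <= tftr <= hth:
        raise ValueError("the TFTR count must lie between 0 and the HTH count")
    return 100.0 * tftr / hth


if __name__ == "__main__":
    print(f"All 10 genomes: {tftr_share(TOTAL_TFTR, TOTAL_HTH):.1f} %")
    for name, counts in SPECIES_COUNTS.items():
        print(f"{name}: {tftr_share(counts['tftr'], counts['hth']):.1f} %")
```

With these inputs the pooled share is about 38.8 %, and the two per-species values (about 45.6 % for M. marinum and 39.3 % for M. gilvum) fall inside the 26–48 % range reported in Table 1.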

Fig. 1. Numbers of HTH representatives in selected mycobacterial genomes grouped by family.
The results were obtained by searching the non-redundant proteome of
each species with the InterPro signatures: AraC (IPR018060), RpiR (IPR000281), Lrp/AsnC
(IPR000485), GntR (IPR000524), MerR (IPR000551), Rok (IPR000600), LuxR (IPR000792),
MarR (IPR000835), LacI (IPR000843), LysR (IPR000847), Rrf2 (IPR000944), DeoR (IPR001034),
Xre (IPR001387), TFTR (IPR001647), CrP (IPR001808), ArsR (IPR001845), OmpR (IPR001867),
MetJ (IPR002084), FurR (IPR002481), HrcA (IPR002571), HxlR (IPR002577), PadR (IPR005149),
IclR (IPR005471), LexA (IPR006199), NtrC (IPR010114), CitB (IPR012830), ModE (IPR016462),
ArgR (IPR020900), IdeR (IPR022687), sigma 70 (IPR014284)

M. leprae has a drastically reduced genome, so a reduction in the number of TFTRs is expected. To determine whether the reduction in TFTRs was proportional to genome size, we calculated the number of TFTRs as a percentage of open reading frames (Table 1, column 5). Interestingly, TFTRs make up only 0.6 % of the ORFs in M. leprae, far less than in the other mycobacteria, possibly reflecting a disproportionate loss of this family in this species.

It is difficult to say from this analysis whether mycobacterial genomes are enriched for TetR regulators, but by way of comparison, E. coli encodes 261 DNA-binding transcription factors in its 4.6 Mbp genome, of which only 5 % are TFTRs [1]. Streptococcus pyogenes, another intracellular Gram-positive pathogen, encodes approximately 81 DNA-binding transcription factors in its 1.85 Mbp genome, of which ~5 % are TFTRs. Soil-dwelling bacteria are known to have large numbers of TFTRs, so the high numbers in the pathogenic mycobacteria may reflect their evolution from a soil-dwelling ancestor [1].

Conservation of TFTRs among the mycobacteria indicates a role in survival for both
the environmental and pathogenic species

The advantage of assessing conservation at the genus level is that it may help to distinguish TFTRs involved in shared processes from those required for more adaptive functions. This is particularly important for mycobacteria, where different species have different hosts and there are also environmental representatives. Conservation was assessed as described in the materials and methods, and the results are given in Additional file 2: Table S2.

When M. leprae is included in the analysis, five TFTRs are conserved across all mycobacteria analysed. These are shaded in blue in Additional file 2: Table S2 (Rv0238, Rv0472c, Rv3050c, Rv3208 and Rv3855 (ethR)). The conservation of these regulators across all mycobacterial genomes, including the drastically reduced M. leprae genome, suggests that their functions are required for survival in both host-adapted and environmental niches. However, the M. leprae gene ML2457 is divergently oriented to a pseudogene and may not have a physiological role in this species. This group of regulators includes ethR, a TFTR involved in antibiotic resistance that represses genes required for the activation of the antibiotic ethionamide; mutations in this regulator cause resistance [15]. Its conservation in M. smegmatis and M. gilvum suggests that it might be useful in these species as a defence against antibiotic producers competing for resources in the soil.

Given that M. leprae has a much reduced genome and our previous analysis suggested a disproportionate loss of TFTRs, we reassessed conservation across mycobacterial genomes, this time excluding M. leprae. TFTRs conserved across all mycobacteria except M. leprae are shaded in green in Additional file 2: Table S2. The TFTRs present in M. tuberculosis are, in general, well conserved, with 22 of the 52 regulators having orthologs in all species included in the analysis. This group includes kstR (Rv3574) and kstR2 (Rv3557c), which are involved in cholesterol catabolism [10], [11], [20]. Their conservation in both pathogenic and environmental species suggests that sterols are likely to be encountered in the environment (phytosterols and ergosterol) as well as in the host (cholesterol). The conservation of the KstR regulators in M. avium subspecies paratuberculosis suggests that cholesterol catabolism is also important for this intestinal pathogen. This is supported by the recent observation that cholesterol is a carbon source for M. avium subspecies paratuberculosis in the bovine intestine [21].
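Conceptually, "conserved across all species" is an intersection over a presence/absence table of orthologs, and excluding M. leprae simply drops one column before intersecting. The sketch below illustrates only that set logic; ortholog assignment itself follows the Methods and is not reproduced here. The `PRESENCE` table is a hypothetical, heavily reduced subset chosen to be consistent with the groupings described above, not the contents of Additional file 2: Table S2, and the function name is our own.

```python
# Hypothetical presence/absence data: for each species, the set of M. tuberculosis
# TFTRs that have an ortholog in that species. Illustrative subset only.
PRESENCE = {
    "M. tuberculosis": {"Rv0238", "Rv3855", "Rv3574", "Rv3557c", "Rv0078"},
    "M. marinum": {"Rv0238", "Rv3855", "Rv3574", "Rv3557c", "Rv0078"},
    "M. leprae": {"Rv0238", "Rv3855", "Rv0078"},
    "M. smegmatis": {"Rv0238", "Rv3855", "Rv3574", "Rv3557c"},
    "M. gilvum": {"Rv0238", "Rv3855", "Rv3574", "Rv3557c"},
}


def conserved(presence, exclude=frozenset()):
    """Return the TFTRs with an ortholog in every species not listed in `exclude`."""
    kept = [genes for species, genes in presence.items() if species not in exclude]
    if not kept:
        return set()
    # Intersect the ortholog sets of all retained species.
    return set.intersection(*kept)


if __name__ == "__main__":
    print("Conserved in all species:", sorted(conserved(PRESENCE)))
    print("Conserved excluding M. leprae:",
          sorted(conserved(PRESENCE, exclude={"M. leprae"})))
```

On this toy table, Rv0238 and Rv3855 come out as conserved in every species, while kstR (Rv3574) and kstR2 (Rv3557c) join them only once M. leprae is excluded.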

Conservation analysis identifies those TFTRs that are only present in the pathogenic
representatives

In order to identify those TFTRs that might be uniquely involved in pathogenic processes
(i.e. conserved in the pathogens but not conserved in the environmental species) we
identified those TFTRs that were missing from both M. smegmatis and M. gilvum but present in the pathogenic species (Additional file 2: Table S2).

Only one regulator (Rv0078, shaded purple in Additional file 2: Table S2) was present in all pathogens, including M. leprae. However, the ortholog in M. leprae (ML2677) is divergently oriented to a pseudogene, so it is possible that it does not have
a physiological role in M. leprae. Excluding the disproportionately reduced M. leprae from the analysis, three TFTRs (Rv0653c, Rv1167c and Rv1556) are conserved in the pathogenic species only and these are shaded in orange in the
Additional file 2: Table S2. These candidates might control functions uniquely important for survival
in the host.
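The "pathogen-only" criterion used above (present in every pathogen, absent from both M. smegmatis and M. gilvum) is a set difference. The sketch below assumes a hypothetical presence/absence table, reduced to a few genes for illustration, and a hypothetical helper `pathogen_only`; the optional `exclude` argument mirrors the second pass in which M. leprae is left out.

```python
ENVIRONMENTAL = {"M. smegmatis", "M. gilvum"}

# Hypothetical presence/absence data (illustrative subset only).
PRESENCE = {
    "M. tuberculosis": {"Rv0078", "Rv1167c", "Rv3855"},
    "M. marinum": {"Rv0078", "Rv1167c", "Rv3855"},
    "M. leprae": {"Rv0078", "Rv3855"},
    "M. smegmatis": {"Rv3855"},
    "M. gilvum": {"Rv3855"},
}


def pathogen_only(presence, environmental, exclude=frozenset()):
    """TFTRs present in every pathogen (minus `exclude`) and absent from all environmental species."""
    pathogens = [genes for species, genes in presence.items()
                 if species not in environmental and species not in exclude]
    in_all_pathogens = set.intersection(*pathogens) if pathogens else set()
    # Anything with an ortholog in at least one environmental species is removed.
    in_any_environmental = set().union(
        *(genes for species, genes in presence.items() if species in environmental))
    return in_all_pathogens - in_any_environmental


if __name__ == "__main__":
    print(sorted(pathogen_only(PRESENCE, ENVIRONMENTAL)))
    print(sorted(pathogen_only(PRESENCE, ENVIRONMENTAL, exclude={"M. leprae"})))
```

Dropping M. leprae can only enlarge the pathogen-only set, which is why the second pass recovers additional candidates.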

Six genes were found only in the species that cause tuberculosis (Rv0302, Rv0328, Rv0330c, Rv1534, Rv2160A and Rv3160c); these are shaded in red in Additional file 2: Table S2. With the exception of Rv3160c and Rv2160A, there is currently no experimental evidence for the functions these TFTRs might control. A frameshift mutation in Rv2160A renders it non-functional in M. tuberculosis. Rv2160A lies in a likely operon with the upstream and downstream genes Rv2159c and Rv2161c, respectively. These flanking genes show higher expression in M. tuberculosis, and this differential expression might have an impact on host preference [22]. Rv2159c is annotated as an alkyl hydroperoxidase, whereas Rv2161c is a conserved hypothetical protein; their roles in the physiology of the bacterium are unknown. Rv3160c and the neighbouring genes Rv3161c (a dioxygenase) and Rv3162c (a membrane protein) are induced upon exposure to antibiotics, but the precise physiological functions of these genes remain unknown [23].

The TFTR regulator Rv1255c is present in M. tuberculosis but missing from M. bovis and the vaccine strain M. bovis BCG Pasteur

The sequences of the M. bovis and M. tuberculosis genomes are 99.95 % similar, and it has often been hypothesised that the very different host preferences of these species reflect changes in gene expression rather than in gene content. Aside from Rv1255c, the complement of TFTRs in M. bovis and M. tuberculosis is identical.

Rv1255c lies in the RD10 region, one of a series of deletions that occurred in the “ancestral” (TbD1+) species of the lineage comprising Mycobacterium africanum, Mycobacterium microti and M. bovis. The RD10 deletion is present in strains with wide host and geographical diversity, such as those from humans in Africa, voles in the UK, seals in Argentina, goats in Spain, and cattle and badgers in the UK [24], [25]. This regulator is in a putative two-gene operon with the cytochrome P450 gene cyp130 (Rv1256c), also within the RD10 region. Studies of the function and regulation of CYP130 in the “modern” (TbD1-) strains of human-adapted M. tuberculosis might provide further insight into the biochemical differences between “modern” M. tuberculosis and the “ancestral” and animal-adapted species.

Similarly, some strains of M. bovis BCG carry deletions of TFTRs that might influence the efficacy of the vaccine. For example, Rv3405c lies in the RD16 region, which is deleted from M. bovis BCG Moreau, but whether this deletion affects vaccine efficacy is unknown [26].

Most mycobacterial TFTR regulators are divergently oriented to an adjacent gene

Ahn et al. recently reported that examining the genome context of TFTRs can be a useful tool for predicting the genes they regulate [16]. This study, which focused on Streptomyces, showed that TFTRs that are divergently oriented to their neighbouring genes and separated from them by 200 bp or less can be reliably predicted to control the neighbouring gene. The analysis also showed that the functions of the neighbouring gene(s) were more diverse than just drug efflux.

To examine the situation in mycobacteria, we analysed 663 TFTRs from M. tuberculosis, M. avium paratuberculosis, M. marinum, M. ulcerans, M. gilvum and M. smegmatis for orientation, length of intergenic region and function of adjacent genes. The regulators were classified into groups (A–C) according to the criteria laid down by Ahn et al.: (A) divergent orientation with respect to a neighbour; (B) likely to be co-transcribed with the upstream or downstream gene, as they are in the same orientation and the intergenic DNA separating them is < 35 bp; and (C) neither (A) nor (B). The results are shown in Fig. 2.
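The three-way rule can be written as a small decision function. This is a sketch under stated assumptions, not the pipeline used in this study: the `Gene` record and the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, each TFTR is assumed to be judged against its immediate left and right neighbours, and a TFTR that satisfies both (A) and (B) is assumed to be placed in (A).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    start: int   # lower genome coordinate (1-based, inclusive)
    end: int     # upper genome coordinate (1-based, inclusive)
    strand: str  # "+" or "-"


def intergenic_length(left: Gene, right: Gene) -> int:
    """Bases between two genes ordered by coordinate (negative if they overlap)."""
    return right.start - left.end - 1


def classify(tftr: Gene, left: Optional[Gene], right: Optional[Gene],
             max_cotranscribed_gap: int = 35) -> str:
    """Assign a TFTR to group A, B or C from its two flanking genes."""
    # (A) Divergent: the TFTR and a neighbour are transcribed away from each other,
    # so both start codons face the shared intergenic region.
    if left is not None and left.strand == "-" and tftr.strand == "+":
        return "A"
    if right is not None and right.strand == "+" and tftr.strand == "-":
        return "A"
    # (B) Same orientation as a neighbour and fewer than max_cotranscribed_gap bases apart.
    if (left is not None and left.strand == tftr.strand
            and intergenic_length(left, tftr) < max_cotranscribed_gap):
        return "B"
    if (right is not None and right.strand == tftr.strand
            and intergenic_length(tftr, right) < max_cotranscribed_gap):
        return "B"
    # (C) Neither arrangement applies.
    return "C"


if __name__ == "__main__":
    tftr = Gene(1000, 1600, "-")
    print(classify(tftr, None, Gene(1700, 2500, "+")))
```

The example call prints `A`, because the TFTR on the minus strand and its right-hand neighbour on the plus strand are transcribed away from each other.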

Fig. 2. Classification of TFTRs according to relative orientation. 663 TFTRs from M. tuberculosis (MTB, 52 TFTRs), M. avium subspecies paratuberculosis (MAP, 110 TFTRs), M. marinum (MM, 124 TFTRs), M. ulcerans (MUL, 88 TFTRs), M. gilvum (MGIL, 129 TFTRs) and M. smegmatis (MSM, 160 TFTRs) were divided into three groups according to their genome context.
a 422 TFTRs (33 in MTB, 64 in MAP, 77 in MM, 51 in MUL, 87 in MGIL and 110 in MSM)
are encoded divergently to their neighbours. Here, the TFTR-encoding gene is located
on the left side, but the positions of this gene and its divergent neighbour are interchangeable.
b 146 TFTRs (13 in MTB, 31 in MAP, 26 in MM, 18 in MUL, 26 in MGIL and 32 in MSM) are
likely co-transcribed with their upstream or downstream genes as the intergenic DNAs
separating them are less than 35 bp. c 95 TFTRs (6 in MTB, 15 in MAP, 21 in MM, 19 in MUL, 16 in MGIL and 18 in MSM) show
neither of the two aforementioned orientations

In all six species, approximately 60 % of the TFTRs are divergently oriented with respect to their neighbour, which is similar to the figure reported by Ahn et al. for Streptomyces species. Overall, the next most common arrangement is co-transcription with neighbouring genes, followed by the ambiguous arrangement.

Of the divergently transcribed regulators, the majority are separated from their divergent partners by 200 bp or less (Fig. 3). For M. tuberculosis, 25 of the 33 divergently oriented genes are separated by 200 bp or less (76 %), and similarly high frequencies are observed in the other mycobacteria (53/64 for M. avium paratuberculosis (83 %), 58/77 for M. marinum (75 %), 34/51 for M. ulcerans (67 %), 74/87 for M. gilvum (85 %) and 96/110 for M. smegmatis (87 %)). These analyses suggest that most divergently oriented TFTRs can be predicted to regulate the adjacent gene.
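The per-species fractions above can be recomputed from the reported counts. The sketch below takes the counts as given in the text and applies the 200 bp rule as inclusive ("200 bp or less"); the names `DIVERGENT_COUNTS`, `within_cutoff` and `percent` are our own.

```python
# (within 200 bp, total divergent) for each species, as reported in the text.
DIVERGENT_COUNTS = {
    "M. tuberculosis": (25, 33),
    "M. avium paratuberculosis": (53, 64),
    "M. marinum": (58, 77),
    "M. ulcerans": (34, 51),
    "M. gilvum": (74, 87),
    "M. smegmatis": (96, 110),
}
CUTOFF_BP = 200


def within_cutoff(intergenic_bp: int, cutoff: int = CUTOFF_BP) -> bool:
    """True if a divergent pair is separated by the cut-off distance or less."""
    return intergenic_bp <= cutoff


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


if __name__ == "__main__":
    for species, (close, total) in DIVERGENT_COUNTS.items():
        print(f"{species}: {close}/{total} ({percent(close, total)} %)")
    close_all = sum(close for close, _ in DIVERGENT_COUNTS.values())
    total_all = sum(total for _, total in DIVERGENT_COUNTS.values())
    print(f"All six species: {close_all}/{total_all} ({percent(close_all, total_all)} %)")
```

The pooled total, 340 of 422 (about 81 %), matches both the 422 divergently oriented regulators plotted in Fig. 3 and the 340 neighbouring genes whose functions are analysed in the next section.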

Fig. 3. Lengths of the intergenic regions of the divergently oriented mycobacterial TFTR regulators.
The intergenic regions of the 422 divergently oriented regulators from M. tuberculosis (MTB), M. avium paratuberculosis (MAP), M. marinum (MM), M. ulcerans (MUL), M. gilvum (MGIL) and M. smegmatis (MSM) were analysed for length. Each dot represents an intergenic region, and its length is given on the y-axis. Each gene was assigned a number: 1–33 for MTB, 34–97 for MAP, 98–174 for MM, 175–225 for MUL, 226–312 for MGIL and 313–422 for MSM. Numbers were assigned in gene-number order within each organism (e.g. 1 = Rv0067c, 2 = Rv0078, 3 = Rv0135c, etc.) and are given on the x-axis. The line marks a cut-off intergenic region size of 200 bp. The graph shows that the majority of divergently oriented genes are separated from their neighbour by 200 bp or less

Functional analysis of divergently oriented adjacent genes reveals that TFTRs control
a diverse range of metabolic functions not limited to efflux

We examined the functions of the genes divergent to the TFTRs in the six mycobacterial genomes to determine the possible functions regulated. We included only genes separated from their divergent TFTRs by 200 bp or less. In total, 340 genes from six genomes (M. tuberculosis, M. avium paratuberculosis, M. marinum, M. ulcerans, M. gilvum and M. smegmatis) were analysed. The results are shown in Fig. 4.

Fig. 4. Functional classification of the products encoded by the divergent neighbouring genes.
Genes that were divergently oriented to TFTRs with an intergenic region of 200 bp
or less in M. tuberculosis, M. marinum, M. avium paratuberculosis and M. smegmatis were analysed as described in the materials and methods. Gene products that were
enzymes were classified according to class (EC 1 to EC 6). Non-enzymatic products
were classified into membrane proteins, other proteins (e.g. transcriptional regulators),
and proteins of unassigned function

Fifty-eight percent of the divergently oriented genes encode enzymes. The predicted enzymes were classified by Enzyme Commission (EC) class according to the reactions they were predicted to catalyse and the presence of domains associated with that enzyme class. The largest group of enzymes (40 %) are oxidoreductases (EC 1), suggesting that in mycobacteria many TetR regulators control the expression of enzymes involved in energy and cellular metabolism, which may be crucial for metabolic adaptation.
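Grouping enzymes by EC class uses only the first field of the EC number. The sketch below shows that mapping and the resulting class proportions; the EC numbers in the usage example are hypothetical, and the function rejects EC 7 (translocases, introduced in 2018) because the classification here uses EC 1 to EC 6. The names `ec_class` and `class_shares` are our own.

```python
from collections import Counter

# Top-level Enzyme Commission classes used in Fig. 4 (EC 1 to EC 6).
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}


def ec_class(ec_number: str) -> str:
    """Map an EC number such as '1.14.13.-' to its top-level class name."""
    top = int(ec_number.split(".")[0])
    if top not in EC_CLASSES:
        raise ValueError(f"EC class {top} is outside the EC 1 to EC 6 scheme used here")
    return EC_CLASSES[top]


def class_shares(ec_numbers):
    """Percentage of enzymes falling in each top-level EC class."""
    counts = Counter(ec_class(ec) for ec in ec_numbers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical EC assignments for five divergent neighbours.
    example = ["1.14.13.-", "1.1.1.1", "2.3.1.-", "3.1.1.-", "4.2.1.17"]
    print(class_shares(example))
```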

Membrane proteins account for only 10 % of the divergently oriented genes, and we attempted to classify them further using Pfam (http://pfam.xfam.org/) and Superfamily (http://supfam.cs.bris.ac.uk/SUPERFAMILY/). Twenty-two of the 35 membrane proteins gave either no hits or contain a conserved domain of unknown function (pfam04286). Of the remainder, 5 belong to the major facilitator superfamily of transporters (cl18950), 2 are PPE family proteins (pfam00823), 1 contains a mycobacterial membrane protein domain (pfam05423), 1 is a membrane-bound histidine kinase (pfam00672), 1 is a chloride channel protein (pfam00654), 1 belongs to the sodium:dicarboxylate symporter family (pfam00375), 1 belongs to the ABC transporter family (pfam01061) and 1 is an amino acid permease (pfam13906).
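The membrane-protein breakdown can be checked for internal consistency with a simple tally. The counts below are the ones listed in the text; using the 340 analysed neighbouring genes as the denominator for the 10 % figure is our assumption, and the variable names are our own.

```python
# Pfam/CDD assignments for the membrane proteins, as listed in the text.
MEMBRANE_ASSIGNMENTS = {
    "no hit or DUF (pfam04286)": 22,
    "major facilitator superfamily (cl18950)": 5,
    "PPE family (pfam00823)": 2,
    "mycobacterial membrane protein (pfam05423)": 1,
    "membrane-bound histidine kinase (pfam00672)": 1,
    "chloride channel (pfam00654)": 1,
    "sodium:dicarboxylate symporter (pfam00375)": 1,
    "ABC transporter (pfam01061)": 1,
    "amino acid permease (pfam13906)": 1,
}
# Assumption: the 10 % figure is relative to all divergent neighbours analysed.
DIVERGENT_GENES_ANALYSED = 340

membrane_total = sum(MEMBRANE_ASSIGNMENTS.values())
membrane_share = 100 * membrane_total / DIVERGENT_GENES_ANALYSED
# Proteins that received a specific family assignment.
annotated = membrane_total - MEMBRANE_ASSIGNMENTS["no hit or DUF (pfam04286)"]

if __name__ == "__main__":
    print(f"{membrane_total} membrane proteins, {membrane_share:.1f} % of "
          f"{DIVERGENT_GENES_ANALYSED}; {annotated} with a specific family")
```

The listed categories sum to 35 proteins, about 10.3 % of 340, of which 13 received a specific family assignment.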

These results agree with the findings of Ahn et al. [16] and further support the view that TFTRs do not just regulate efflux pumps. Our analyses suggest that TFTRs regulate a diverse range of as yet uncharacterised metabolic functions.

Analysis of the upstream region of divergent TFTRs identifies 11 novel putative binding
motifs

TFTRs typically bind to palindromic operators. The model TetR from E. coli binds as a dimer to a 15 bp palindrome, while QacR from S. aureus binds as a tetramer to a 28 bp palindrome [3]. A number of TFTRs from M. tuberculosis (Rv3066, KstR, KstR2, BkaR) also bind to palindromic motifs [10], [11], [13], [14]. Motifs for Mce3R and EthR have also been described, but these are larger, more complex, present in multiple copies and do not conform to the classical structure of a palindromic sequence separated by a small number of bases [15], [27].
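The classical operator structure described above, two inverted half-sites around a short central spacer, can be checked by comparing one arm with the reverse complement of the other. The sketch below counts mismatches between the arms; the helper names and the 15 bp example sequences are hypothetical and are not real TetR or QacR operators.

```python
# Complement table for upper- and lower-case DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def arm_mismatches(site: str, arm_length: int) -> int:
    """Count mismatches between the left arm and the reverse complement of the right arm.

    A perfect inverted repeat scores 0; any central spacer is ignored.
    """
    site = site.upper()
    if arm_length <= 0 or 2 * arm_length > len(site):
        raise ValueError("arm_length must be positive and both arms must fit in the site")
    left_arm = site[:arm_length]
    right_arm_rc = reverse_complement(site[-arm_length:])
    return sum(a != b for a, b in zip(left_arm, right_arm_rc))


if __name__ == "__main__":
    # Hypothetical 15 bp site: 6 bp arm, 3 bp spacer, 6 bp inverted arm.
    print(arm_mismatches("TTAGCAGACTGCTAA", 6))
```

The example prints 0; changing the final base to T gives one mismatch, which is the kind of imperfect symmetry often seen in natural operators.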

We used the programmes MEME and MAST [17] to identify regulatory motifs in the intergenic regions of regulators that were conserved across a number of species and divergently oriented to the neighbouring gene. A total of 30 TFTRs were examined, including those with previously experimentally verified motifs. The results are given in Table 2.

Table 2. Motif analysis of the intergenic regions of conserved divergently oriented TFTRs

The experimentally verified motifs show E-values of ≤ E-20; we therefore classified the motifs as highly significant (E-value ≤ E-20) or less significant (E-value > E-20). Of the 30 motifs analysed, 11 passed the cut-off. These represent a set of probable TetR binding motifs for the regulators listed.
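The significance rule amounts to a single threshold on the motif E-value. The sketch below reads "E-20" as 1 × 10^-20 and treats the boundary value as highly significant; the regulator names and E-values in the usage example are hypothetical placeholders, not values from Table 2.

```python
from __future__ import annotations

E_VALUE_CUTOFF = 1e-20  # "E-20" threshold, read as 1 x 10^-20


def significance(e_value: float, cutoff: float = E_VALUE_CUTOFF) -> str:
    """Label a motif E-value using the cut-off set by the verified motifs."""
    if e_value < 0:
        raise ValueError("E-values cannot be negative")
    return "highly significant" if e_value <= cutoff else "less significant"


def passing(motifs: dict[str, float], cutoff: float = E_VALUE_CUTOFF) -> list[str]:
    """Names of regulators whose motif E-value passes the cut-off."""
    return sorted(name for name, e_value in motifs.items() if e_value <= cutoff)


if __name__ == "__main__":
    # Hypothetical regulators and E-values, for illustration only.
    motifs = {"regulator_A": 3.2e-25, "regulator_B": 1e-20, "regulator_C": 4.7e-12}
    for name, e_value in motifs.items():
        print(name, significance(e_value))
    print("Passed:", passing(motifs))
```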

Conservation analysis of the C-terminal domain of TFTRs in M. tuberculosis

Although the work presented here and elsewhere supports the idea that it is straightforward to predict at least one direct target gene for a previously unstudied TFTR, the real challenge lies in determining the small-molecule ligands that bind to the C-terminal domain of TFTRs. Identifying the ligand is one means of elucidating the biochemical function(s) of the target genes, as many of the ligands bound by TFTRs are related to the biochemical functions of the target gene [28].

Phylogenetic analysis has previously been used to subdivide the GntR family of regulators in M. tuberculosis into functional clades based on the amino acid sequence similarity of their effector domains [29]. Phylogenetic approaches have also been used to make general functional predictions for the AraC family of transcription factors [30]. A similar analysis could potentially be applied to the C-terminal ligand-binding domain of TFTRs to subdivide the family into groups. Larger pan-genomic studies have previously grouped TFTRs, including those with known ligands, based on amino acid sequence and used this information to predict a ligand for a TFTR from Streptomyces, a prediction that was experimentally verified [28].

We aligned the C-terminal domains of all TFTRs in M. tuberculosis and attempted to establish phylogenetic groupings using widely employed methods such as parsimony, maximum likelihood and neighbour joining. The alignments were poor owing to very low sequence similarity (approximately 7 % average amino acid identity). Consequently, the phylogenetic trees obtained (data not shown) showed weak groupings overall and no evident relationships. A previous study of TFTRs reached a similar conclusion on the phylogeny of the C-terminal domain and found that the average identity of the effector domain is only 9 % between TFTRs of known structure [31]. In contrast, an alignment of the N-terminal domains of the same TFTRs showed an average identity of 27 %. This reinforces the notion of a more conserved N-terminal (DNA-binding) domain compared with a variable C-terminal domain.
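Average identity figures like those above depend on how identity is defined for an alignment. The sketch below assumes one common definition (identical residues divided by the columns where neither sequence has a gap) and averages it over all sequence pairs; other definitions, such as dividing by the full alignment length, give different numbers, and the function names and toy sequences are our own.

```python
from __future__ import annotations

from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def mean_pairwise_identity(alignment: list[str]) -> float:
    """Average percent identity over every pair of sequences in an alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments, not real TFTR sequences.
    toy_alignment = ["ACDE", "ACDF", "GCDF"]
    print(f"{mean_pairwise_identity(toy_alignment):.1f} %")
```

For the toy alignment the pairwise identities are 75 %, 50 % and 75 %, giving a mean of about 66.7 %; real C-terminal alignments at roughly 7–9 % identity sit close to the level expected for unrelated sequences.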

Although amino acid sequences vary considerably, secondary structure prediction of the C-terminal ligand-binding domain reveals conserved features. We predicted the secondary structure of each TFTR in M. tuberculosis using JPred 3 and found a common architecture [32]. Most of the 52 regulators have six α-helices in the C-terminal ligand-binding domain (α4 to α9) (Additional file 3: Figure S3). A few deletions appear to have occurred, as in the case of α8 in one of the Mce3R heterodimers and in Rv3066. Some insertions also occur, in Rv1353c (after α6) and Rv0330c (after α7). Although the helices are conserved in number, conservation of amino acid residues within the same helix across different regulators is extremely poor, with the exception of the first helix, α4, which aligns notably better than the others. This might be expected, as α4 interacts directly with the conserved HTH motif within the N-terminal domain and is part of the tetra-helical arrangement of the DNA-binding region of TFTRs [18].

Meta-analysis of published essentiality and expression studies triages those for further
study and indicates infection relevant physiological functions for a selection of
TFTRs

To determine which TFTRs might have a role during infection in M. tuberculosis, we performed a meta-analysis of selected published microarray studies to identify TFTRs that are either essential or show expression changes in infection models or under in vitro conditions that mimic aspects of infection. The results of the analysis are shown in Additional file 4: Table S4.

Twenty-four TFTRs showed expression changes in at least one of the experimental conditions, while seven regulators were essential in at least one condition. This analysis helps to prioritise TFTRs for further study of the regulatory mechanisms involved in the survival of M. tuberculosis.

Four regulators are essential for infection in the mouse model (Rv2912c, Rv3050c, Rv3574 (kstR) and Rv3855 (ethR)). The physiological role of kstR is in the catabolism of cholesterol as a carbon source during infection [10], [20], [33], but the physiological roles of the other essential TFTRs are unknown. The role of EthR in the control of ethA, which encodes an enzyme required for activation of the anti-tuberculosis drug ethionamide, is well documented, but its physiological role remains unknown [15], [34]–[36]. Interestingly, EthR is also induced under hypoxia and in dendritic cells. This analysis suggests an infection-relevant physiological function for this regulator.