Elucidation of the recognition mechanisms for hemicellulose and pectin in Clostridium cellulovorans using intracellular quantitative proteome analysis

To identify the substrate recognition systems of C. cellulovorans, we carried out a quantitative proteome analysis of cells grown anaerobically on
media containing glucose, xylan, galactomannan (LBG), or pectin as the carbon source. Using
a quantitative proteome approach based on isobaric tagging, we obtained protein profiles
from cells grown on each carbon source. The workflow of the “intracellular” proteome
analysis is illustrated in Figure 1.

Figure 1. Experimental procedure of C. cellulovorans intracellular quantitative proteome analysis. Proteins prepared from cell lysates
of C. cellulovorans grown in the presence of glucose, xylan, galactomannan (LBG), or pectin were individually
reduced, alkylated, and digested with trypsin. The tryptic peptides were labeled with
tandem mass tags (TMTs). The labeled peptides were mixed and injected into an LC–MS/MS
system equipped with a long monolithic silica capillary column for mass measurement, and the collected
data were used for protein quantification.

Growth confirmation

To confirm the growth of C. cellulovorans on each substrate, we estimated bacterial protein content (Figure 2). As shown previously, C. cellulovorans can grow on xylan, pectin, or galactomannan as the sole carbon source (Sleat et al.
1984), whereas other cellulosome-producing Clostridia, such as C. thermocellum and C. cellulolyticum, cannot grow on pectin and galactomannan (Petitdemange et al. 1984; Prawitwong et al. 2013). Growth on glucose, xylan, and pectin was slower than that on galactomannan, but
cells were collected from all cultures at similar growth phases. From the growth analysis,
we selected a culture time of 36 h, as cells were in the late-logarithmic phase, which
was appropriate for proteome analysis in terms of growth phase and protein concentration.

Figure 2. Confirmation of growth of C. cellulovorans cultured with four different substrates. Growth of C. cellulovorans was measured by estimating protein in cell lysates to determine an appropriate
culture time (Raman et al. 2009). At 36 h, C. cellulovorans appears to be in the late-logarithmic phase on all four substrates (glucose, xylan,
galactomannan, pectin). These conditions were used for the proteome analysis. Error bars indicate SD (n = 3).

Qualitative and quantitative proteome analysis

For intracellular quantitative proteome analysis, we employed an LC–MS/MS system equipped with a long monolithic silica capillary
column, together with tandem mass tag (TMT) 6-plex isobaric labeling. C. cellulovorans has 4,254 protein-encoding genes in its genome (Tamaru et al. 2010). For protein identification, we constructed a protein database from the genome
of C. cellulovorans. In total, we identified 734 proteins from all samples within our cutoff criteria
(Additional file 1: Table S1). To correct for variations in the amount of TMT-labeled peptides infused
into the mass spectrometer, we normalized all quantitative data to the median value
of that analysis. For all data analysis, we used these normalized relative quantification
values. We checked the reproducibility of the quantitative protein profiles among three
biological replicates for each substrate by hierarchical clustering analysis (HCA; Figure 3). The arrays clustered by substrate, with the biological replicates of each substrate grouping together. These results
indicate that each sample was quantified reproducibly at both the analytical and biological
levels. Furthermore, we separately carried out a qualitative
proteome analysis for comparison with the results of the quantitative proteome analysis (Additional
file 2: Table S2).
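
The median normalization described above can be illustrated with a minimal sketch. It assumes that normalization means dividing every value in one TMT channel by the median of that channel; the text does not state whether a ratio or a log-scale shift was used, so this choice, the function name `median_normalize`, and the protein labels and intensities are all hypothetical.

```python
from statistics import median


def median_normalize(channel_intensities):
    """Divide each value in one channel by that channel's median (assumed scheme)."""
    med = median(channel_intensities.values())
    if med <= 0:
        raise ValueError("median intensity must be positive")
    return {protein: value / med for protein, value in channel_intensities.items()}


# Hypothetical reporter-ion intensities: channel_b received twice as much peptide
channel_a = {"protein_A": 100.0, "protein_B": 200.0, "protein_C": 400.0}
channel_b = {"protein_A": 200.0, "protein_B": 400.0, "protein_C": 800.0}

norm_a = median_normalize(channel_a)
norm_b = median_normalize(channel_b)
```

After normalization the two hypothetical channels give identical relative values, which is the effect the correction is meant to have when the only difference between channels is the amount of peptide infused.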

Figure 3. Hierarchical clustering analysis of the proteome profiles of C. cellulovorans grown on four different substrates. To standardize the data, the quantitative data were
normalized using the median for each condition. The quantitative proteome data from
three biological replicates of each substrate were used for the hierarchical clustering
analysis. The arrays clustered according to substrate, and the biological
replicates of each substrate grouped together, indicating that the proteome analysis was
biologically reproducible. The color bar indicates changes in protein abundance. Increased and decreased protein levels are
shown in yellow and blue, respectively.
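
As an illustration of how replicate grouping can be checked by hierarchical clustering, the following sketch merges samples by average linkage on Euclidean distance. The linkage method and distance metric are assumptions, since the text does not specify them, and the sample names and profile values are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def cluster_distance(a, b):
        # Average pairwise Euclidean distance between members of two clusters
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical normalized profiles (three proteins) for two replicates of two substrates
profiles = {
    "glucose_1": (0.0, 0.1, -0.1),
    "glucose_2": (0.1, 0.0, 0.0),
    "xylan_1": (2.0, 1.9, -1.0),
    "xylan_2": (2.1, 2.0, -1.1),
}
groups = average_linkage(profiles, n_clusters=2)
```

With these hypothetical values the replicates of each substrate end up in the same group, which is the pattern interpreted in the caption as evidence of reproducibility.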

Substrate-specific proteins

To discover substrate-specific proteins (those that showed a significant change between
different growth substrates), we carried out an empirical Bayes moderated t test (Smyth 2004). P values were adjusted with the Benjamini–Hochberg method to account for multiple
testing. The thresholds that we adopted were an FDR-adjusted p value of 0.05 and a fold change in protein ratio of 2.0 relative to glucose. Proteins
for which the levels significantly changed were defined by comparison between glucose
and each polysaccharide (xylan, galactomannan, and pectin) at these thresholds. All
substrate-specific proteins detected are shown in Table 1. Using KEGG analysis and cluster analysis based on genome analysis, we focused mainly
on metabolism-related and substrate recognition-related proteins for further analysis.
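
The selection step can be sketched as follows, under stated assumptions. The sketch takes raw p values as given (it does not reproduce the moderated t test), applies the Benjamini–Hochberg adjustment, and then applies the two thresholds. Treating the thresholds as inclusive and testing only for increased levels are assumptions, and the protein labels, fold changes, and p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotonic
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_substrate_specific(fold_change, adj_p, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Apply both thresholds (inclusive bounds are an assumption)."""
    return fold_change >= fc_cutoff and adj_p <= fdr_cutoff


# Hypothetical fold changes (polysaccharide vs. glucose) and raw p values
names = ["protein_A", "protein_B", "protein_C", "protein_D"]
fold_changes = [3.2, 1.4, 2.5, 4.0]
raw_p = [0.01, 0.04, 0.03, 0.005]

adj_p = benjamini_hochberg(raw_p)
specific = [
    name
    for name, fc, q in zip(names, fold_changes, adj_p)
    if is_substrate_specific(fc, q)
]
```

In this hypothetical example, protein_B passes the adjusted p value threshold but not the fold-change threshold, so it is excluded; both criteria must hold for a protein to be called substrate-specific.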

Table 1. Substrate-specific proteins

Profiles of metabolism-related proteins

First, we focused on profiles of metabolism-related proteins, such as enzymes involved
in substrate degradation and metabolism, and other characteristic metabolic pathways.
C. cellulovorans is known from exoproteome analyses to change its production of carbohydrate-related enzymes secreted into the medium
(Esaka et al. 2015; Matsui et al. 2013), and the production of proteins in different metabolic pathways is also predicted to change depending on which
substrates are available in culture.

Degradation and metabolism of each substrate

We constructed a substrate degradation pathway from KEGG pathway maps, and presented
the fold change of each protein (Figure 4). For xylan degradation- and metabolism-related proteins (Figure 4a), the levels of three proteins (Clocel_0590, 0592, and 2595) were significantly
elevated in the presence of xylan (Table 1), but the level of Clocel_2900 (endo-1,4-beta-xylanase) was not specifically elevated
(Additional file 1: Table S1).

Figure 4. Degradation and metabolism pathways for each substrate, constructed from KEGG analysis.
Each substrate degradation and metabolism pathway is shown: a xylan, b galactomannan (LBG), c pectin. For each protein, the fold change compared to glucose is shown. Asterisks denote proteins whose levels are significantly elevated at the adopted thresholds (fold change
2.0 and FDR-adjusted p value 0.05). ND, protein not detected in this analysis; NA, gene commonly assigned to the pathway but not annotated. Fold change in glucose-grown
cells is shown with a gray bar (=1); fold change in xylan-grown cells is shown with a pink bar; fold change in galactomannan-grown cells is shown with a green bar; fold change in pectin-grown cells is shown with an orange bar. XI, xylose isomerase; PMI, phosphomannose isomerase; GALT, galactose-1-phosphate uridyltransferase; PGM/PMM, phosphoglucomutase/phosphomannomutase alpha/beta/alpha domain I; 1P, 1-phosphate; 6P, 6-phosphate.

For galactomannan degradation- and metabolism-related proteins (Figure 4b), the levels of 9 proteins (Clocel_2259, 2800, 3194, 3196, 3198, 3205, 4087, 4088,
and 4089) were significantly elevated in the presence of galactomannan (Table 1). By contrast, the levels of the mannanases (Clocel_1134 and 4119) were not significantly
elevated (Additional file 1: Table S1).

For pectin degradation- and metabolism-related proteins (Figure 4c), levels of 8 proteins (Clocel_2250, 2251, 2254, 2256, 2259, 2262, 2263, and 3380)
were significantly elevated in the presence of pectin (Table 1). For the degradation and metabolism of pectin, both the “hydrolase/isomerase pathway”
and the “lyase/5-dehydro-4-deoxy-gluconate pathway” were found (Richard and Hilditch
2009). Clocel_2254 is a member of the glycoside hydrolase (GH) 28 family (Cantarel et
al. 2009), which is known to include endogalacturonase and exogalacturonase activities. Additionally,
Clocel_3380 is a member of the polysaccharide lyase (PL) 9 family. BLAST analysis
indicates that this protein has exogalacturonate lyase activity.

Genome and cluster analyses

To identify candidates related to substrate recognition systems, we performed genomic
analysis based on the two-component system (TCS)-related gene clusters of C. cellulolyticum (Xu et al. 2013). To identify TCS-regulated clusters, we applied two criteria.
First, a cluster must contain more than two components of a TCS pathway: an integral
membrane histidine kinase (sensor histidine kinase), a transcriptional regulator (response
regulator), and an extracellular solute-binding protein (sugar binding protein). Second,
degradation-, metabolism-, or transport-related gene loci must be located within 2
of the TCS genes identified above. Using these criteria, we identified 14 candidate
clusters related to TCS.
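
The two criteria can be expressed as a small filter, shown here as a sketch only. The text gives the distance limit in the second criterion as "2" without a unit, so the sketch takes the limit as a `window` parameter and counts it in gene positions purely for illustration. The role labels, the function name `is_candidate_cluster`, and the example loci are hypothetical and do not correspond to annotated genes.

```python
TCS_ROLES = {"histidine_kinase", "response_regulator", "solute_binding_protein"}
TARGET_ROLES = {"degradation", "metabolism", "transport"}


def is_candidate_cluster(genes, window):
    """genes: list of (position, role) tuples for one locus.

    window: maximum allowed distance between a target gene and a TCS gene;
    its unit is an assumption (gene positions), not taken from the text.
    """
    tcs_positions = [pos for pos, role in genes if role in TCS_ROLES]
    tcs_roles = {role for _, role in genes if role in TCS_ROLES}
    # Criterion 1: more than two distinct TCS components are present
    if len(tcs_roles) <= 2:
        return False
    # Criterion 2: a degradation, metabolism, or transport gene lies near a TCS gene
    return any(
        role in TARGET_ROLES and any(abs(pos - t) <= window for t in tcs_positions)
        for pos, role in genes
    )


# Hypothetical loci
complete_locus = [
    (1, "response_regulator"),
    (2, "histidine_kinase"),
    (3, "solute_binding_protein"),
    (4, "transport"),
]
missing_component = [(1, "response_regulator"), (2, "histidine_kinase"), (3, "transport")]
distant_target = [
    (1, "response_regulator"),
    (2, "histidine_kinase"),
    (3, "solute_binding_protein"),
    (10, "metabolism"),
]

results = [
    is_candidate_cluster(locus, window=2)
    for locus in (complete_locus, missing_component, distant_target)
]
```

Only the first hypothetical locus satisfies both criteria; the second lacks a TCS component and the third has no target gene within the window.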

Combining the results of our genome analysis with the substrate-specific proteins,
we identified xylan- and galactomannan-specific clusters that include TCS-related
genes. The xylan-specific
cluster included Clocel_2592, 2595, 2596, 2597, and 2598 (Table 1), and Clocel_2593 and Clocel_2594 (Additional file 1: Table S2). The galactomannan-specific cluster included Clocel_3194, 3196, 3198,
3200, 3201, and 3205 (Table 1), and Clocel_3195, 3197, 3199, 3202, 3203, and 3204 (Figure 5; Additional file 2: Table S2). Each cluster includes three components of a TCS: a transcriptional regulator
(AraC), an integral membrane histidine kinase, and an extracellular solute-binding protein.
We suggest that these genes are common components of a hemicellulose recognition system.

Figure 5. Candidates for gene clusters related to substrate recognition. We carried out genome
analysis and identified candidate gene clusters involved in TCS. Combining these
results with the substrate-specific proteins, we found substrate-specific gene clusters
related to recognition of each substrate. Green arrows indicate metabolism-related proteins, pink arrows indicate signal transduction-related proteins, and blue arrows indicate transport-related proteins. White arrows indicate pseudogenes.

For pectin, we found increased levels of proteins Clocel_2250, 2251, 2253, 2254,
2255, 2256, 2259, 2262, and 2263 (Table 1) and Clocel_2249, 2257, 2258, and 2260 (Additional file 2: Table S2). We suggest that this is a large cluster for the degradation and metabolism
of pectin. Two pathways are known for the degradation and metabolism of pectin: the hydrolase/isomerase pathway and the lyase/5-dehydro-4-deoxy-gluconate
pathway. The pectin-specific cluster that we identified contains genes
related to both of these pathways, as well as genes for pectin transport.

We also identified a xylose-metabolic cluster containing proteins Clocel_0589, 0590,
0591, and 0592 (Table 1) that had increased levels specifically in the presence of xylan. A galactose-metabolic
cluster containing proteins Clocel_4087, 4088, and 4089 (Table 1) and Clocel_4090 (Additional file 2: Table S2) had elevated levels specifically in the presence of galactomannan. The
xylose-metabolic cluster contained two xylose metabolism-related proteins, Clocel_0590
and 0592 (Table 1; Figure 4a), transaldolase (Clocel_0591; Table 1), which is related to the pentose phosphate pathway, and alpha-fucosidase (Clocel_0589;
Table 1). In the galactose-metabolic cluster, galactose metabolism-related genes are present
(Figure 4b) and increased levels of proteins Clocel_4087, 4088, and 4089 (Table 1) and Clocel_4090 (Additional file 2: Table S2) were found.