Co-regulation of translation in protein complexes

Translational parameters and PPIs

The following translational parameters for E. coli, S. cerevisiae and H. sapiens genes were downloaded from the Transimulation website [15] and summarized in Table 1: L, coding sequence length in codons; x, average number of transcripts in a cell; g, ribosome density, i.e. the number of ribosomes attached to a transcript per 100 codons;
w, the absolute number of ribosomes on a transcript; m, estimated mean lifetime of a transcript; I, mean time required for translation initiation; E, mean time required for translation elongation; e, mean elongation time of one codon of a transcript; and b, average number of proteins produced from one molecule of transcript during its lifespan.
The average total number of proteins produced from a gene was calculated as the product
of b and x and denoted B. However, since B does not take protein degradation rates into account, it cannot be treated as an
estimate of protein abundance. Protein abundances were therefore obtained from several
high-throughput studies [22]-[24] and are referred to as parameter A (in protein molecules per cell). Additionally, a new parameter R was introduced, indicating the average number of proteins produced from a gene per
second (for derivation, see Methods).

Table 1. The summary of translational parameters calculated in the model and used in this research

For each organism, the sets of its binary and co-complex protein-protein interactions
(PPIs) were taken from the references listed in Table 2. Due to the lack of other data on co-complex interactions in E. coli, we analyzed the putative complexes identified previously by clustering of physical
interaction networks [25]. As the results for this set did not agree well with those obtained for co-complex
interactions from other species (see below), we repeated the analysis using E. coli operons. Even though operon proteins need not interact physically [1],[26], they are linked functionally, and thus are expected to be translationally co-regulated.
This set, referred to as intra-operon proteins, may also serve as a positive control,
as operon genes in prokaryotes are typically transcribed as a polycistronic mRNA [1],[27], and thus should have well-concerted transcript abundances x and lifetimes m. As a negative control we used a set of 3000 random interactions, generated for each
species separately with the exclusion of its binary and co-complex PPIs (or intra-operon
proteins). However, such random interactions are known to form networks dissimilar
to those observed in biological systems. In particular, the degree distribution of
random networks is often binomial (depending on how the network was constructed),
while biological networks are usually scale-free [28]. To eliminate any interference stemming from this fact, we used an additional negative
control – a set of interactions obtained by shuffling the nodes of the existing co-complex
interactomes. The networks created in this way have characteristics similar to those of their corresponding
co-complex interactomes, but the connections within them are random.
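
The two negative controls can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study: the function names (normalize, random_pairs, shuffle_node_labels) and the seed handling are hypothetical, and node shuffling is interpreted here as a random permutation of protein labels on a fixed network topology, which preserves the degree distribution by construction.

```python
import random


def normalize(a, b):
    """Return an unordered pair in canonical (sorted) form."""
    return (a, b) if a <= b else (b, a)


def random_pairs(proteins, known_pairs, n, seed=0):
    """Draw n distinct random protein pairs, excluding known interactions."""
    proteins = sorted(set(proteins))
    in_pool = set(proteins)
    known = {normalize(a, b) for a, b in known_pairs}
    # Count known pairs that could be drawn, to avoid an endless loop below.
    blocked = sum(1 for a, b in known
                  if a != b and a in in_pool and b in in_pool)
    available = len(proteins) * (len(proteins) - 1) // 2 - blocked
    if n > available:
        raise ValueError("not enough non-interacting pairs to sample from")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n:
        pair = normalize(*rng.sample(proteins, 2))
        if pair not in known:
            chosen.add(pair)
    return sorted(chosen)


def shuffle_node_labels(pairs, seed=0):
    """Permute protein labels while keeping the network topology fixed."""
    rng = random.Random(seed)
    nodes = sorted({p for pair in pairs for p in pair})
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(nodes, shuffled))
    return [(relabel[a], relabel[b]) for a, b in pairs]
```

Note that a label permutation may, by chance, reproduce some of the original interactions; whether such pairs were filtered out in the study is not stated in this section, so the sketch leaves them in.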

Table 2. The summary of PPIs sets used in the analysis

Correlations of translational parameters among interacting proteins

For each translational parameter we calculated from a given set of PPIs the correlation
in its value between interacting partners. As the order of proteins in interacting
pairs is arbitrary, in about half of the cases the value for the first partner is
smaller than for the second, which results in uniform dispersion of data points below
and above the 45-degree line (Additional file 1). The obtained correlations reflect the noisiness of this linear relationship and
are typically strongest (i.e. least noisy) for co-complex interactions. The only
exceptions are the co-complex PPIs for E. coli, as they were obtained from putative complexes determined by network clustering rather
than by direct experiment. If they are replaced by intra-operon proteins, the correlation strength
becomes similar to that observed in yeast or humans. Correlations for binary interactions
are almost always positive, but weaker than the corresponding ones for co-complex
PPIs. In some cases, though, their sign cannot be determined, or the effect may be
minuscule or indistinguishable from correlation sizes observed for controls. The 95%
confidence intervals (CI) and sample sizes for all calculated correlations are presented
in Figure 1.
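
As an illustration of how such an interval can be obtained, the sketch below computes the Spearman coefficient between the two partners of each pair and a percentile bootstrap 95% CI. The bootstrap is an assumption made for this example only; the interval method actually used is described in Methods, which are not reproduced here. The function names and default settings (n_boot, seed) are hypothetical.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, i.e. the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bootstrap_ci(pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the Spearman correlation of value pairs."""
    xs, ys = zip(*pairs)
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        raise ValueError("correlation is undefined for constant data")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        sx, sy = zip(*sample)
        # Skip degenerate resamples in which one variable is constant.
        if len(set(sx)) > 1 and len(set(sy)) > 1:
            stats.append(spearman(sx, sy))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Here each element of pairs is a tuple holding the parameter value of the first and of the second partner, and pairs must be a list. Because the order of partners within a pair is arbitrary, a more careful version might symmetrize the data (for example by including each pair in both orders) before computing the coefficient.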

Figure 1. Correlation of translational parameters’ values among interacting proteins. The plots
show 95% CIs for Spearman correlation coefficient calculated separately for each translational
parameter between the first and second partners from a given set of PPIs. For each
species, four sets of PPIs were analyzed: co-complex, binary, random, and extracted
from a shuffled co-complex network. For E. coli the analysis was repeated using intra-operon proteins instead of co-complex PPIs
(bottom right panel). In most cases, the strongest correlations are observed for
co-complex PPIs (or intra-operon proteins for E. coli), especially for the protein production rate R, number of proteins produced from a gene B, transcript abundance x, and in the case of yeast also mean codon elongation time e; n – sample size.

Generally, the best agreement within co-complex interactions was obtained for the
values of parameters R, B (also b), A and x. For instance, the CIs for correlation of protein production rate R are 0.60–0.63 in yeast, 0.37–0.40 in humans, and 0.49–0.57 in E. coli operons. Similar CIs, all lying above 0.37, were observed for the number of proteins
produced from a gene – B, and transcript abundance – x. For protein abundance A, the obtained CIs are close to those of B, with the exception of E. coli, for which the sample was too small (n = 20) to guarantee sufficiently narrow intervals.
The results for parameters related to translation time, I, E and e, do not allow definite conclusions. Although mean translation initiation time I is moderately correlated among yeast co-complex PPIs and E. coli operons (CI: 0.36–0.40 and 0.33–0.42, respectively), its correlation in humans is
much weaker (CI: 0.14–0.17), yet still larger than in the control (e.g. CI: -0.01–0.06
for random PPIs). Similarly, although the correlation of mean codon elongation
time e in yeast seems quite strong (CI: 0.54–0.58), the results are much weaker for E. coli (CI: 0.10–0.21), and close to the control for humans (CI: 0.02–0.06, while the largest
absolute value of the control confidence limit is 0.05). The remaining parameters
are either moderately or weakly correlated and may exhibit noticeable inter-species
differences. An interesting example is the mean transcript lifetime
m, for which the strongest correlation was found for E. coli operons (CI: 0.37–0.48), while for yeast and human PPIs it never exceeds 0.25.
This may be explained by the fact that operon genes often share a common, polycistronic
mRNA: the moment it undergoes degradation, all operon ORFs should become dysfunctional,
which is reflected by similar values of m. Further, more detailed analyses were performed only for protein production rates
R in yeast.

Regulation of protein production rate: global picture

Our next step was to study in detail the co-regulation of translation by analyzing
the differences in protein production rates R among interacting and random protein pairs. First, we calculated and compared the
medians of R fold change for four sets of protein pairs previously used: co-complex PPIs, binary
PPIs, random pairs and random pairs obtained from the shuffled co-complex interactome.
For each analyzed protein pair the R ratio was calculated. To facilitate interpretation, the larger R value was always placed in the numerator, so that all obtained ratios were ≥ 1. Such a procedure
enables comparison of central values (means or medians) between sets, which otherwise would
all be close to one. It is also justified by the fact that the analyzed protein pairs
are symmetrical and only the distance between the R values of both partners is of interest, while their order is random and irrelevant.
As expected, all distributions of thus computed R fold change are positively skewed (Figure 2), but the medians for co-complex and binary PPIs are always smaller than for random
protein pairs. In particular, for a typical co-complex PPI the protein production
rate R for one protein is 1.77–1.85 times higher than for its partner (the numbers are median
95% CI limits), while for a typical binary PPI the fold change in R is higher and equals 2.66–2.95. In contrast, for random and shuffled control sets,
the fold change in R is 3.67–4.07 and 3.42–3.69, respectively (see Figure 2A). This indicates that for random protein pairs the median R fold change is greater by at least 1.59 than for co-complex interactions, and by
at least 0.55 than for binary interactions (see Figure 2B). For comparison, the differences in R fold change medians between the two control sets range from -0.56 to 0.55.
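
The fold change computation described above can be sketched in a few lines of Python. The sketch only illustrates the definition (larger R in the numerator, then the median over all pairs); the function names and the protein identifiers in the usage example are hypothetical, and the CI estimation for the medians is not shown.

```python
from statistics import median


def fold_change(r1, r2):
    """Ratio of two production rates with the larger one in the numerator."""
    if r1 <= 0 or r2 <= 0:
        raise ValueError("production rates must be positive")
    return max(r1, r2) / min(r1, r2)


def median_fold_change(pairs, rates):
    """Median R fold change over a set of protein pairs."""
    return median(fold_change(rates[a], rates[b]) for a, b in pairs)


# Hypothetical rates (proteins per second) for three made-up proteins.
rates = {"P1": 0.5, "P2": 1.0, "P3": 4.0}
pairs = [("P1", "P2"), ("P2", "P3"), ("P3", "P1")]
print(median_fold_change(pairs, rates))  # fold changes are 2, 4 and 8
```

Since the ratio is invariant to the order of the partners, the result does not depend on how each pair happens to be listed in the interaction data.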

Figure 2. Protein production rate fold change distributions and comparison of medians. The main
plot shows distributions of R fold change for four sets of protein pairs, as used previously. For each pair, the
R ratio was calculated with the larger value always in the numerator; dashed lines
mark median point estimators. Panel A: 95% CIs for distribution medians; n – sample size. Panel B: 95% CIs for differences in medians, with the compared medians indicated by arrows.
As both control sets are equivalent, the difference in their medians was calculated
twice (random PPIs median minus shuffled PPIs median, and conversely). For a typical
co-complex PPI the protein production rate of one protein is 1.77–1.85 times higher
than for its partner, while for random protein pairs this ratio is higher by at least
1.59 and equals about 3.40–4.00.

Apart from the medians, the distributions of R fold changes for PPIs and random protein pairs differ in standard deviation (sd).
To quantify this difference, for each pair the logarithm of the fold change in R was calculated, which guarantees a symmetrical distribution of values around a mean of 0,
as shown in Additional file 2 (no restriction on the choice of numerator is needed this time). For each data set
the 95% CI for sd was calculated (see Additional file 2, panel A). Not surprisingly, the two control sets have higher sd, with plausible values
of 2.17–2.31 and 2.31–2.41 for the random and shuffled data sets, respectively.
In contrast, the sd of the log fold change distributions for real PPIs amounts to about 1.47–1.57
and 1.72–1.86 for co-complex and binary PPIs, respectively. The difference in sd between
real PPIs and random protein pairs is at least 0.64 for co-complex, and 0.35 for binary
interactions (Additional file 2, panel B). For comparison, the same difference calculated between the two control sets
does not exceed 0.21.
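
A minimal sketch of the spread measure is given below. Two choices in it are assumptions rather than details taken from the text: the logarithm base (base 2 is used here) and the use of the root mean square of the log ratios about 0, which equals the sd of the distribution symmetrized around 0 and does not depend on which partner is placed in the numerator. The function name and the protein identifiers in the test data are hypothetical.

```python
import math


def log_ratio_sd(pairs, rates):
    """Spread of log2 R ratios about 0 (root mean square of the log ratios)."""
    logs = []
    for a, b in pairs:
        if rates[a] <= 0 or rates[b] <= 0:
            raise ValueError("production rates must be positive")
        logs.append(math.log2(rates[a] / rates[b]))
    # Squaring removes the sign, so swapping partners changes nothing.
    return math.sqrt(sum(v * v for v in logs) / len(logs))
```

A tighter co-regulation of production rates within a set of pairs shows up as a smaller value of this measure.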

Regulation of protein production rate: case study

Although the obtained correlations for the key translational parameters seem reasonably
good, the agreement of parameter values for many interacting protein pairs is not
always perfect. Such discrepancies may be explained by the noise in biological data
or deficiencies in the computational model used to calculate values of the parameters.
However, it is also possible that some of them reflect the biological functions of
the proteins and their role in the complex or interactome. To illustrate this, we
analyzed the details of protein production rates R of the components of several well-known complexes in yeast: two general transcription
factors (GTFs) and a proteasome.

Transcription factors TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH constitute basal
transcription factors that bind to specific sites on DNA to activate transcription,
by forming an RNA polymerase II preinitiation complex. They are involved in, among other things,
proper positioning of the polymerase, assembly of complex components, coordination of transcription
initiation and channeling of regulatory signals. The subunits of all GTFs and their
production rates are presented in Figure 3. As may be seen, they do not vary much: the R values of all GTF subunits lie between 0.03 and 0.58, whereas across the entire genome they range
from 0.0003 to 109. Nevertheless, some of the observed differences may be explained
by the functions of individual proteins, as best seen for the largest complexes, TFIID
and TFIIH.

Figure 3. Protein production rates of general transcription factors for RNA polymerase II. Top:
schematic structure of the transcription initiation complex in yeast. Bottom: composition
of basal transcription factors (after KEGG [41], accessed May 2014), along with the values of the protein production rate R for each subunit. R values correspond to the color intensity. Subunits forming other complexes (according
to SGD [29], accessed May 2014) are marked by blue symbols explained on the left. Protein production
rates are similar for the majority of TFIID and TFIIH components. Most of the observed
discrepancies in R may be explained by an additional biological function of a subunit in other complexes.

For instance, TFIID consists of a TATA binding protein (TBP) and several subunits
of TBP-associated factors (TAFs). Proper formation of the complex requires that all
subunits are present in stoichiometric proportions at the right moment, which may
be guaranteed by similar production rates of its protein components. As shown in Figure
3, such is the case for 7 out of the 15 subunits of known R, whose R values are of the same order of magnitude, ranging from 0.03 to 0.09. According
to the Saccharomyces Genome Database [29], all these subunits are known to participate only in the TFIID complex, and thus
their R level may be treated as the baseline for the entire complex. The remaining subunits,
with the exception of TAF4, are not TFIID specific. Five of them (TAF5, TAF6, TAF9,
TAF10 and TAF12), with R levels ranging from 0.11 to 0.27, may also be found in the SAGA chromatin remodeling
complex, while the TAF14 subunit, with R = 0.50, is an important component of the SWI/SNF, NuA3 and INO80 chromatin remodeling
complexes. The highest R value is observed for TBP, which is not surprising as this is the only subunit that
participates in the formation of transcription initiation complexes specific to all
three types of polymerases. The second transcription factor, TFIIH, consists of ten
subunits, nine of which share similar R values, ranging from 0.05 to 0.10, while the tenth (TTDA) has a slightly higher
value of 0.20. All subunits of the TFIIH core (see Figure 3), plus MAT1, also form the nucleotide excision repair factor 3 (NEF3) complex; however,
the remaining kinase CDK7 and its associated cyclin CCNH are not dissimilar in R values.

Another example, the proteasome, is a cylindrical protein complex which degrades unneeded
or damaged proteins by proteolysis. Its core consists of two inner and two outer rings,
each composed of seven individual β and α subunits, respectively. The proteolytic activity of the core is controlled by binding
of the regulatory particle, built of a base and a lid of 9 subunits each, or by binding
of other regulatory factors, which recognize polyubiquitin tags and initiate the degradation
process. As shown in Additional file 3, protein production rates for most α and β subunits and most of the 18 subunits of the cap are remarkably similar, ranging
from 0.50 to 0.76. A few subunits, ?4, Rpt3, Rpn2, and Rpn3, show slightly lower R values of 0.47, 0.42, 0.35, and 0.40, respectively, whereas subunits ?5, Rpn13, and Rpn15 show elevated rates of 1.04, 0.85, and 1.49, respectively. In the case
of the Rpn15 subunit, this may be explained by the fact that it also forms the TREX-2
complex involved in mRNA export from the nucleus. In contrast, an important proteasome
activator, PA200, which should be needed in smaller amounts than the proteasome itself,
has a protein production rate one order of magnitude lower (R = 0.05).

Translation regulation in party and date hubs

We also calculated and compared the agreement in values of translational parameters
between party/date hubs and their interacting partners. For each of the 91 date hubs
and 108 party hubs identified previously in yeast [13], we extracted its PPIs from the co-complex and binary interactomes. As a control
we used the sets of random interactions of party and date hubs, prepared as described
in Methods. Next, for each translational parameter its agreement within protein pairs
was calculated as previously, i.e. by calculating 95% CIs for the Spearman correlation
coefficient. The results for five parameters, x, e, B, A and R, which had a correlation of at least 0.39 in the general analysis of co-complex associations,
are presented in the left panel of Figure 4, while the right panel shows 95% CIs for correlation differences between party and
date hubs for co-complex PPIs (i), for binary PPIs (ii), and for random PPIs (iii).

Figure 4. Correlations of translational parameters’ values within interactions of party and
date hubs in yeast. Left: 95% CIs for Spearman correlation coefficient calculated
for translational parameters x, e, B, A, and R between the first and second partners of the given set of PPIs. The PPIs sets were
prepared by extracting all interactions for party (dark colors) and date hubs (light
colors) from co-complex PPIs network (green), and binary PPIs network (magenta). Random
PPIs (gray) were prepared as described in Methods; n indicates the number of protein
pairs in each subset. Right: 95% CIs for difference in correlation coefficients between
party and date hubs; for each translational parameter the difference was calculated
separately for each type of PPIs. For all parameters, except protein abundance A, correlations are the strongest for co-complex PPIs of party hubs. The results for
the remaining parameters are shown in Additional file 4.

In the case of four parameters, e, x, R and B, party hubs show stronger agreement than date hubs, but only within co-complex interactions.
The largest differences are observed for the protein production rate R and transcript abundance x, with CIs of 0.20–0.37 and 0.18–0.37, respectively. For comparison, the correlation
differences between the random PPIs of party and date hubs are not larger than 0.10 and 0.12
for R and x, respectively; moreover, their signs cannot be determined. For the mean codon elongation
time e this difference may be somewhat smaller, as its CI for co-complex PPIs ranges from 0.13
to 0.32, and is only a little above the upper confidence limit of the control correlation
difference. Furthermore, for the number of produced proteins B the party-date correlation difference lies within 0.03–0.26 (for comparison, the control
CI is -0.18–0.07), and we cannot exclude the possibility that it is negligible. For
protein abundance A both party and date hubs exhibit moderate, yet very similar correlations, within
0.31–0.46 and 0.25–0.48, respectively; hence the sign of the difference between them
cannot be determined (CI: -0.11–0.16).
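
One way to see how a CI for a difference between two correlations follows from the two individual CIs is the interval-combination (MOVER-type) approach proposed by Zou for independent correlations, sketched below. This is an illustration only: the method actually used is given in Methods, the party and date hub samples are treated here as independent, and the function name is hypothetical.

```python
import math


def correlation_difference_ci(r1, ci1, r2, ci2):
    """CI for r1 - r2, combined from the CIs of two independent correlations."""
    l1, u1 = ci1
    l2, u2 = ci2
    diff = r1 - r2
    lower = diff - math.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = diff + math.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper


# Protein abundance A: party hubs 0.31-0.46, date hubs 0.25-0.48.
# The point estimates are not reported in the text, so the interval
# midpoints are used as stand-ins (an assumption of this example).
low, high = correlation_difference_ci(0.385, (0.31, 0.46), 0.365, (0.25, 0.48))
print(round(low, 2), round(high, 2))
```

With these stand-in values the combined interval is roughly -0.12 to 0.16 and includes zero, in line with the reported CI of -0.11–0.16 and with the conclusion that the sign of the difference cannot be determined. The similarity does not imply that the same method was used in the study.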

In the case of binary interactions, the sample sizes are several times smaller, which
leads to wider CIs and may hide underlying effects. For instance, for all parameters
the observed correlations between hubs and their partners cannot even be claimed to be
positive or negative. Nevertheless, taking into account their upper CI limits, as
well as the results presented in Figure 1, one should not expect effects larger than for co-complex PPIs. The results
for the remaining translational parameters L, g, w, m, I, E, and b are also not informative – the sign of the obtained correlation difference cannot be determined,
or, in the case of w, its size is very difficult to interpret due to statistical uncertainties (see Additional
file 4).