Population whole-genome bisulfite sequencing across two tissues highlights the environment as the principal source of human methylome variation


Study cohort, and data generation and processing

We performed WGBS on 34 adipose (seven MZ pairs, six DZ pairs, and eight singletons)
and 27 blood (seven MZ pairs, six DZ pairs, and one singleton) DNA samples derived
from a total of 43 female twins belonging to the MuTHER cohort 22], 23] (see “Methods” and Additional file 1: Table S1). We generated 11.5 billion 100 base pair (bp) paired-end reads covering
2.3 tera base pairs (Tbp) of sequence. We applied standard alignment methods and filters
to obtain a mean genome coverage of 6.3-fold (range 1.0-fold to 12.9-fold) for adipose
and 8.7-fold (range 0.7-fold to 29.0-fold) for blood (Additional file 1: Table S1). We compared the mean genome coverage to the number of overall detected
CpGs for each sample, observing that CpG-discovery was saturated at ~6-fold coverage,
detecting ~27 million sites (Additional file 2: Figure S1). The median bisulfite conversion efficiency of CpHs was determined to
be 99.4 % (range 97.4–99.8 %; Additional file 1: Table S1).

We have previously profiled the adipose samples on the Illumina 450 K array 14], which we used here for comparison. We observed sample-based correlations of Illumina
450 K and WGBS to saturate at 10-fold to 12-fold average genome sequencing coverage
(Pearson’s R ~0.94) (Fig. 1a), and at 12-fold per CpG coverage (Pearson’s R?~?0.94) for CpG-based correlations
(Fig. 1b). Additionally, we observed correlation to be highly dependent on strand-concordance
of methylation, i.e., we observed low correlation of CpGs displaying discordant methylation
between forward and reverse strands (Fig. 1c). We also noted low correlation across the two approaches for CpGs displaying abnormal
high coverage (see “Methods”). These CpGs were shown to be enriched in “blacklisted”
regions defined by the ENCODE project (http://hgwdev.cse.ucsc.edu/cgi-bin/hgFileUi?db=hg19g=wgEncodeMapability), which include genomic regions with artifactual high read counts across tissues
and cell lines in DNaseI, formaldehyde-assisted isolation of regulatory elements (FAIRE),
and ChIP-seq experiments 24]. Taken together, in all subsequent analysis we excluded CpGs not covered by at least
two reads per strand with an absolute strand difference in methylation of ? 20 %,
CpGs located within blacklisted regions defined by ENCODE or us (see “Methods”), and
CpGs not located on autosomes, leaving on average 7.9?×?10
6
unique and high-confidence CpGs per sample. The number of CpGs passing all filters
correlated well with overall sequencing depth with a ratio of approximately 1 million
detected CpGs per 1-fold mean coverage, saturating at ~20-fold coverage (Additional
file 2: Figure S1).

Fig. 1. Comparison of sequencing-derived and array-derived methylation data. 450 K methylation
array data were available for adipose tissue, and methylation values for CpG-sites
jointly interrogated by both whole-genome bisulfite sequencing and the array were
extracted for each individual. Shown are Pearson’s correlation coefficients derived
from comparing array to sequencing methylation data for jointly interrogated CpG-sites
for (a) samples at indicated mean genome coverage (black diamonds), with blue crosses indicating the number of jointly detected sites per sample, (b) CpG-sites at indicated coverage across all samples, (c) CpG-sites displaying 0 to 20 % (blue crosses), 20 to 40 % (red diamonds), 40 to 60 % (green points), or 60–100 % (purple triangles) methylation difference between forward and reverse strands at indicated sequencing
coverage

Global CpG methylation patterns

First, we aimed to characterize the global CpG methylation landscape per tissue and
thus combined all datasets. In total, we covered 23.6?×?10
6
and 25.0?×?10
6
CpGs in adipose and blood, respectively, with a mean methylation of 80 % irrespective
of tissue (Additional file 2: Figure S2A, B). Because private sequence variants mapping to CpGs (i.e., introducing
or removing) might bias methylation measures, we overlapped the two datasets with
dbSNP137 25] and noted that as much as 27 % of our detected CpGs in fact overlapped an annotated
single nucleotide polymorphism (SNP). Taking this into account, we covered 17.2?×?10
6
and 18.4?×?10
6
CpGs (Additional file 1: Table S2) in the combined variant-removed adipose and blood datasets, respectively.
We noted that removing SNPs only marginally impacted mean and median methylation levels
(Additional file 2: Figure S2C, D). Of these CpGs, 19 % and 22 % mapped to CGI-associated regions in
adipose and blood, respectively, leaving the vast majority mapping to regions with
lower CpG density. Focusing on the genic context we observed ~53 % of CpGs per tissue
to locate within gene regions (Additional file 1: Table S2; for region annotation see “Methods”).

Previous studies have identified areas with low CpG methylation (50 %) to comprise
active regulatory regions 17], 26], 27], which we aimed to study in more detail here using the combined datasets. To be conservative
we only kept CpGs with a minimum coverage of 12-fold (14.7?×?10
6
CpGs in adipose, 17.8?×?10
6
CpGs in blood). We identified on average 63,000 low-methylated regions per tissue
with ~30 % being unmethylated regions and the remainder being low-methylated regions
(Additional file 1: Table S3A). Irrespective of tissue, unmethylated regions displayed stable methylation
across regions nearing 0 % and spanning an average genomic size of ~2400 bp and on
average 122 CpGs per region. In contrast, low-methylated regions displayed methylation
levels between 5 % and 45 % and a size of ~750 bp (Additional file 1: Table S3B, Additional file 2: Figure S3A, B) with only on average 11 CpGs. We then compared these different unmethylated
and low-methylated regions across tissues and found that as much as 90 % of the detected
unmethylated regions were identified in both tissues, compared to the low-methylated
regions that displayed more tissue-specificity with an overlap across tissues of only
~45 % (Fig. 2a, b) 15]–17].

Fig. 2. DNA methylation footprint in adipose tissue and blood. Datasets were merged within
tissues, keeping only sites covered ?12-fold, and the MethylseekR software 26] was employed to identify unmethylated and low-methylated footprints in adipose and
blood. Shown are Venn diagrams describing tissue-specificity of identified (a) unmethylated regions (UMR) and (b) low-methylated regions (LMR). For overlapping areas, numbers indicate percentage of overlap based on adipose
tissue and blood as indicated. c Histograms display the number of tissue-shared and adipose-specific UMRs overlapping
with ranked H3K4me3 bins (see text; top 5 % are shown). d Adipose LMRs and adipose-specific UMRs overlapped with ranked H3K4me1 bins (top 5 %
are shown)

To further investigate the characteristics of the identified adipose unmethylated
and low-methylated regions, we used ChIP-seq data from the NIH Roadmap consortium
28] for the active promoter-associated H3K4me3 29] and the active enhancer-associated H3K4me1 30] marks in human adipocytes. We generated a normalized signal intensity ranked list
for these histone marks using a background-subtracted binning approach (see “Methods”
and 14]) and overlapped the top 1 % bins with the unmethylated and low-methylated adipose
regions identified in the combined dataset. In total, 78 % of the adipose unmethylated
regions overlapped an the H3K4me3 bin and this co-localization was further strengthened
when restricting to similar regions also identified in blood, corresponding to 84 %
(Fig. 2c). This pattern was further pronounced if we restricted to H3K4me3 bins located 1 kb
upstream of a transcription start site (TSS) where 91 % of the adipose unmethylated
regions overlapped. We also looked specifically at the unmethylated regions detected
only in adipose (N?=?2,320) and noted here that only 35 % overlapped a H3K4me3 bin
(Fig. 2c), but instead the majority (56 %) of these overlapped with a top 1 % H3K4me1 bin,
indicating these regions to be more enhancer-like (Fig. 2d). Focusing on the low-methylated regions in adipose instead, we observed enrichment
in the top-ranking H3K4me1 bins (Fig. 2d), with 25 % of shared and 26 % of adipose-specific regions overlapping with the top
1 % of H3K4me1 bins. However, this overall lower enrichment of low-methylated regions
in H3K4me1 bins may lie in the lower specificity of the enhancer-associated mark being
more generally associated with gene activity and also localizing, e.g., to promoters
30]. Indeed, we observed low methylated regions to be enriched in the top ranking H3K4me1
bins when compared to the percentage of overlap in bins ranking lower than the top
1 % (Additional file 2: Figure S3C).

The invariable CpG landscape

Our recent survey of methylation variation in the complete MuTHER adipose cohort (N?=?648)
using the Illumina 450 K array indicated that the majority of targeted CpGs show low
variability in methylation levels 14]. However, whether this low variability included also purely invariable methylation
states of CpGs was left unexplained owing to the technical limitations of array-based
methods. We thus aimed here to study this in greater detail using our variant-removed
datasets to look for constitutively unmethylated (0 %) or fully methylated (100 %)
CpGs. We defined population invariable CpGs as being detected in two or more individuals
per tissue and displaying methylation value standard deviation (SD) of zero across
individuals. We determined invariable CpGs for each tissue and distinguished between
tissue-specific (methylation SD?=?0 in one and SD??0 in the other tissue) and shared
(methylation SD?=?0 in both tissues). We detected 26 % and 6 % of CpGs to be invariable
in adipose and blood, respectively, highlighting that invariability in DNA methylation
is more frequent in adipose than in blood. This ratio changed only slightly at higher
CpG sequence coverage (Additional file 1: Table S4), excluding the possibility that lower average genomic coverage in adipose
leads to the discovery of fewer variant sites and indicating that higher cellular
heterogeneity of blood drives variance. Most invariable CpGs in adipose were tissue-specific
(10 % were shared with blood), whereas the opposite was seen in blood (i.e., 80 %
of invariable CpGs in blood were also invariable in adipose). Fully methylated CpGs
(100 %) comprised 80 % of all invariable CpGs within adipose, 35 % within blood, 85 %
of adipose-specific, 65 % of blood-specific, and 36 % of shared total invariable CpGs
(Additional file 2: Figure S4A), identifying tissue-specific invariable CpGs to be mainly methylated
whereas shared CpGs are predominantly unmethylated. The lower frequency of methylated
CpGs in blood could be a reflection of greater cellular diversity compared to adipose
tissue.

We also investigated the distribution of unmethylated (0 %) and fully methylated (100 %)
invariable CpGs within genomic features by overlapping with CGI features, genic regions
(see “Methods”), and with NIH Roadmap H3K4me1 and H3K4me3 ChIP-seq data (available
for adipose only). Compared to background (all CpGs detected in two or more individuals),
unmethylated CpGs in adipose displayed a 12-fold to 14-fold significant (Fisher’s
exact test p??1.0?×?10
-16
) enrichment in promoter-associated regions (CGIs, TSS200, exon 1, H3K4me3), and slighter
3.5-fold significant enrichment (p??1.0?×?10
-16
) in enhancer-associated regions (H3K4me1). Vice versa, these CpGs were significantly
depleted in low CpG-context and intergenic regions. Adipose-specific and blood-specific
unmethylated CpGs showed very similar genome feature association trends, whereas shared
CpGs showed an even stronger enrichment in promoter regions, ranging between 15-fold
and 17-fold (p??1.0?×?10
-16
; Additional file 2: Figure S4B).

Genome features of fully methylated CpGs displayed an opposing pattern: invariable
CpGs in adipose showed a ~3-fold significant (p??1.0?×?10
-16
) depletion in CGIs, TSS200, and exon1, and 43-fold significant (p??1.0?×?10
-16
) depletion in regions covered by H3K4me3. Additionally, a 2.6-fold depletion (p??1.0?×?10
-16
) was detected in the enhancer-associated H3K4me1 histone mark. Similar trends were
observed for methylated adipose-specific, blood, and shared invariable CpGs (Additional
file 2: Figure S4C).

When selecting invariable CpGs in a more stringent approach, requiring CpGs to be
detected and invariable in five or more, or 10 or more individuals, we observed the
same trends as described above with even more pronounced enrichment of unmethylated
CpGs in promoter-associated and enhancer-associated regions (Additional file 2: Figures S5 and S6).

Population differentially methylated CpGs

To identify CpGs displaying differential methylation in a healthy population, we estimated
the difference in CpG methylation of overlapping covered CpGs between any two samples
(Fisher’s exact test) to methylation levels detected at these CpGs to determine the
significance in sequence read distributions. We identified these CpGs as differentially
methylated CpGs in population (pDMCs). Using the sequence variant-containing datasets,
we detected median differential methylation to be 55 % in adipose and 46 % in blood
(Additional file 2: Figure S7A) with extreme differential methylation (60 % difference) detected at
5.0 % and 3.7 % of total CpGs in adipose and blood, respectively.

We then overlapped the significant pDMCs with the variant-removed dataset (i.e., dbSNP137-annotated
variants excluded) and found that ~25 % of CpGs displaying 10 % differential methylation
overlapped with a SNP, this being equivalent to the amount of overall CpGs overlapping
a sequence variant (see “Global CpG methylation patterns” above). However, with increasing
differential methylation we observed higher confounding variant bias, with pDMCs displaying
90 % of differential methylation being ~95 % confounded by an SNP (Fig. 3a). These findings were shared 31] and highlight the importance of variant filtering for accurate interpretation of
differential methylation analyses in unrelated individuals, as we recently showed
in public datasets 31]. Removing SNPs among the significant pDMCs reduced the detected extreme differential
methylation levels (60 % difference) to 3.3 % and 1.9 % in adipose and blood, respectively.
Concomitant with this we observed a drop in median methylation levels in both tissues
(median
Adipose
?=?50 %; median
Blood
?=?41 %; Additional file 2: Figure S7A). Overall, we determined the proportion of pDMCs on total CpGs on autosomes
to be 14 % in adipose (40 % of these were shared with blood) and 22 % in blood (26 %
of these were shared with adipose) after variant filtering (Fig. 3b). Compared to adipose, we observed lower pDMC methylation levels (Additional file
2: Figure S7B) and a higher percentage of pDMCs on total detected CpGs in blood, which
might be explained by the higher cellular heterogeneity of blood, which presumably
blunts methylation differences of individual cell populations and increases the number
of potentially detectable pDMCs, respectively. Similar differential methylation distribution
trends were observed for tissue-specific and shared pDMCs in both tissues (Additional
file 2: Figure S7C, D).

Fig. 3. Impact of confounding sequence variants on differential methylation and percentage
of population differentially methylated CpGs (pDMCs) on total CpGs. a Significant DMCs (Fisher’s exact test p??0.05) determined in filtered blacklisted
region-removed but SNP-containing datasets were overlapped with dbSNP137-annoated
variants for all samples. Shown is the percentage of DMCs overlapping with annotated
SNPs in indicated differential methylation increments. b The bar plot displays the percentage of pDMCs on total detected CpGs for adipose
and blood in variant-removed datasets. Shared pDMCs are indicated in purple

We next sought to determine the proportion of significant pDMCs that are due to different
cellular heterogeneity across the 27 individuals. We thus correlated methylation levels
of each pDMC with proportions of specific blood cell types (i.e., neutrophils, lymphocytes,
monocytes, and eosinophils) where at least 10 individuals were covered. We found that
17.5 % of the blood pDMCs were strongly correlated with different proportions of blood
cell types (Pearson’s R??0.5, p??0.05) and this relationship was stronger with increased variation in methylation
levels. More specifically, when we restricted the analysis to the top 1,000 pDMCs
based on SD, we noted that as much as 24.1 % of the pDMCs correlated with blood cell
type proportions (Pearson’s R??0.5, p??0.05).

We also investigated the genomic features of our significant pDMC (Fisher’s p??0.05; detected in one or more comparison) by overlapping their location with the
above introduced genomic regions (Additional file 2: Figure S8): we noted that compared to background (all CpGs detected in two or more
individuals) adipose pDMCs were significantly depleted in all CGI-associated and genic
regions (Fisher p??1.0?×?10
-16
), and vice versa enriched in low CpG contexts and intergenic regions (p??1.0?×?10
-16
). Strongest depletions were detected in gene promoter regions (3-fold to 5-fold in
TSS200, exon 1, H3K4me3; 22.7-fold in CGIs). Blood pDMCs displayed a similar feature
association although less pronounced significant depletions (p??1.0?×?10
-16
) were observed in CGI-associated regions, e.g., only 4.5-fold in CGIs. Tissue-specific
pDMC feature associations were similar to the ones obtained for the corresponding
“complete” adipose or blood, respectively, whereas shared pDMCs associations were
similar to the ones observed in adipose (Additional file 2: Figure S8).

Population differentially methylated regions

Given that our WGBS data were at a relatively low depth at an individual level, we
next took advantage of the complete population tissue cohorts to investigate regions
of clustered methylation variability, or pDMRs. We determined pDMRs by weighting the
variance of CpG methylation across individuals as well as the consistency of CpGs
within the region (see “Methods”). For this analysis we considered CpGs covered in
three or more individuals, leaving 14.7?×?10
6
CpGs in adipose and 17.8?×?10
6
CpGs in blood. We selected the top 10 % of pDMRs covering 23.8?×?10
5
regions in adipose and 26.2?×?10
5
regions in blood for further analyses (Additional file 1: Table S5). Of these top pDMRs, we noted that 34–37 % were shared across tissues
(Additional file 1: Table S5). Similar to pDMCs, we found that 24.5 % of CpGs mapping to pDMRs were
significantly associated with blood cell type proportions (Pearson’s R??0.5, p??0.05).

Genome feature association results identified pDMRs to be significantly (Fisher’s
p??0.05) depleted in gene promoter regions and enriched in enhancer regions irrespective
of pDMR category. For instance, we observed adipose pDMRs to be 3.4-fold depleted
in CGIs (p??1.0?×?10
-16
), and 3.8-fold depleted in H3K4me3-occupied regions (p??1.0?×?10
-16
) but 3.7-fold enriched in H3K4me1-occupied regions (p??1.0?×?10
-16
) (Additional file 2: Figure S9A). This observation is consistent with previous results showing enhancer
regions to be more variable than promoter regions 15], 16], 32]. Overall, pDMC and pDMR genome feature association trends are similar, with the notable
exception of adipose pDMCs (within tissue and tissue-specific) not showing enrichment
in H3K4me1-occupied regions.

We then focused on adipose pDMRs by separating regions based on methylation level:
15 % of pDMRs were identified to be lowly methylated (50 %) and 85 % of pDMRs were
highly methylated (?50 %). Compared to all CpGs detected in three or more individuals,
low-pDMRs were 12.5-fold enriched in enhancer-associated H3K4me1 regions (p??1.0?×?10
-16
); high-pDMRs displayed a 10.9-fold depletion in gene promoter-associated H3K4me3-occupied
regions (p??1.0?×?10
-16
) and, compared to low-pDMRs, only slight but significant enrichment in the enhancer-associated
H3K4me1-occupied regions (p?=?3.7?×?10
-9
) was noted (Fig. 4a). Observed low-pDMRs displayed strong enrichment in enhancer regions; to identify
TFs binding in these potentially open chromatin regions and associated functions,
we carried out a TF binding site (TFBS) motif analysis within these low-pDMRs mapping
to the enhancer regions against all remaining low-pDMRs (i.e., those not overlapping
such regulatory element) 33]. We found that the binding motif of estrogen-related receptor alpha (ERRA)—a TF known to be involved in adipogenesis, energy metabolism, and lipid synthesis
34], 35]—to be the most significantly (p?=?1.0?×?10
-97
) enriched TFBS in these regions. Additional significantly enriched motifs included
TFBSs for peroxisome proliferator-activated receptor ? (PPARG; p?=?1.0?×?10
-84
), retinoid X receptor (p?=?1.0?×?10
-65
), uclear factor 1/CAAT-binding transcription factor (p?=?1.0?×?10
-78
and p?=?1.0?×?10
-67
), and activator protein 1 (p?=?1.0?×?10
-53
), which are all TFs known to be involved in adipocyte differentiation 36]–38] (Fig. 4b, Additional file 1: Table S6).

Fig. 4. Genomic feature association of adipose-specific population differentially methylated
regions (pDMRs), and transcription factor binding site (TFBS) motif analysis for adipose-specific low-methylated pDMRs. We determined pDMRs using
a novel algorithm that weights the variance of CpGs methylation across individuals
as well as the consistency of CpGs within the region (see “Methods”). a We associated the top 10 % of adipose-specific pDMRs with genomic features. The fold
change of pDMR versus background (all CpGs detected in three or more individuals)
is shown for each genomic feature. The red line demarks the border between enrichment (relative change??1) and depletion (relative
change??1). b TFBS analysis was carried out for adipose-specific low-methylated pDMRs using the
Homer software 33]. Shown are selected TFBS motifs including overall rank in the Homer analysis, p-value, forward consensus binding sequence, associated function, and references. The
complete TFBS motif analysis result is available in Additional file 1: Table S6

Finally, focusing on adipose high-pDMRs we carried out a pathway analysis (Ingenuity)
for regions close to the TSS of genes and identified lipid metabolism as the most
significantly associated function (p?=?1.5?×?10
-5
). In addition to this stringent threshold, we also identified pDMRs with less stringent
thresholds, selecting the top 20 % and 25 % of CpGs per tissue (Additional file 1: Table S5), and observed genome feature associations similar but less pronounced
to the ones for stringently (top 10 %) selected pDMRs (Additional file 2: Figure S9B-E).

Differentially methylated CpGs of genetic versus environmental origin

Inter-individual methylation variation can be attributed to genetic and non-genetic
effects. Non-genetic effects can be further divided into familial (i.e., shared environmental)
and non-familiar (i.e., unique/non-shared environmental or stochastic) effects. Twin
studies provide the ideal model to calculate the proportion of methylation variation
attributable to these different factors. Here, we calculated intra-class correlations
(ICC) of the identified adipose pDMCs that were covered in at least five MZ and five
DZ adipose samples (N?=?4,798) to estimate additive genetic effects, shared environmental
effects, and non-shared environmental effects. We found that for 37 % of the pDMCs,
additive genetic effects accounted for 30 % of the total variance in methylation.
In contrast, shared environment seem to have a minor role; only 3 % of the pDMCs had
shared environment contributing 30 % to adipose methylation variation, indicating
that the remaining proportion of the non-genetic variance was due to non-shared environment
and/or stochastic factors. In fact, 60 % of the pDMCs were estimated to have non-shared
environment accounting for 90 % of the variance. Interestingly, these environmentally
driven pDMCs were shown to be depleted in promoters (p?=?2.4?×?10
-13
) but enriched in coding regions (p?=?3.9?×?10
-4
) when compared to all pDMCs detected in adipose tissue (Additional file 2: Figure S10).

We sought to characterize these abundant non-shared environmental DMCs in more detail;
for simplicity, we refer to these CpGs as “environmental” in origin or eDMC. For this
purpose we estimated DMCs within two MZ pairs per tissue sequenced at moderate coverage
(MZ2, MZ3; ?10-fold in adipose, ?19-fold in blood; Additional file 1: Table S1) and compared these with a similarly powered analysis in unrelated individuals.
First, at p??0.05 (Fisher’s exact test) we identified on average 1.13-times (adipose) and 1.3-times
(blood) more DMCs in unrelated samples versus MZ twins, confirming genetic effects
on CpG methylation variation when overlapped SNPs are accounted for. Second, as shown
in the heritability analysis, a large proportion of DMCs seemed to be of non-shared
environmental origin (i.e., eDMCs) and we estimated non-shared environmental effects
to account for over 75 % of methylation variation in blood (Additional file 2: Figure S11A). To study this pattern further we analyzed the identical set of blood
samples (MZ2 and MZ3) using a targeted capture-based approach for bisulfite sequencing
of functional elements, MCC-seq, that we recently introduced 27]. Carrying out the same analysis in this independent higher coverage dataset (MZ2:
22-fold; MZ3: 25-fold) we estimated eDMCs to account for 64 % of all detected DMCs.
This lower proportion of eDMCs in targeted MCC-seq versus WGBS was in fact not due
to differential coverage but rather due to the predominance of enhancer/promoter regions
targeted in the former method. More specifically, when we filtered the WGBS-derived
DMCs to only cover overlapping MCC-seq functional elements, we found the same proportion
(65 %) of total DMCs (Additional file 2: Figure S11A). These findings are in agreement with genome feature annotation of
purely environmentally driven adipose pDMCs from the twin analysis, showing genic
regions to be influenced to a larger extent by unique environmental or stochastic
factors (Additional file 2: Figure S11B).

Functional characterization of differentially methylated CpGs of non-shared environmental
origin

We next investigated whether eDMCs cluster in regions and can be linked to potential
functional relevance. We screened 500 bp regions upstream and downstream of each eDMC
identified at nominal significance (p??0.05), and defined regions containing three or more eDMCs displaying unidirectional
differential methylation as eDMRs, with overlapping eDMRs being merged. In adipose,
we identified 54 and 75 eDMRs, comprising 0.27 % and 0.32 % of eDMCs in the MZ2 and
MZ3 pairs, respectively (false discovery rate [FDR]???0.1 % based on permutation test).
In blood, we observed 923 and 386 eDMRs, comprising 1.66 % and 0.82 % of eDMCs in
the MZ2 and MZ3 pairs, respectively (FDR???0.1 % based on permutation test). All identified
eDMRs were tissue-specific and not shared among MZ pairs. We then sought to test for
potential tissue-specific regulation of eDMR-associated genes and thus overlapped
eDMRs with RefSeq-annotated genes. RNA-seq-derived gene expression data were available
for subcutaneous adipocytes as well as for normal primary B-cells, T-cells, and monocytes
(all samples from an unrelated study cohort) 27]. First, we compared the expression level for eDMR-associated genes (eDMRs were mapped
to genic regions including 10 kb downstream of the TSS) between adipocytes and hematopoietic
cell types. eDMR-associated genes were significantly enriched in genes with higher
expression in adipose tissue than in blood, with enrichment increasing from 1.34-fold
for genes solely over-expressed (z-score derived p??0.001) in adipose tissue to 1.8-fold in genes 16-fold over-expressed (p??0.001) in adipose (Fig. 5a), suggesting that adipose eDMRs map to dynamic genes specific to adipose tissue.
For blood-associated eDMRs we did not observe similar enrichment in genes with elevated
expression in hematopoietic cells (Fig. 5b). However, pathway analysis of blood eDMR-associated genes identified hematological
system development and function (p?=?1.1?×?10
-3
) as well as humoral immune response (p?=?1.1?×?10
-3
) as the top associated bio functions.

Fig. 5. Characterization of differentially methylated CpGs of non-shared environmental origin
(eDMR). a eDMRs identified in adipose were associated with genes by overlapping with RefSeq-annotated
genes (defined as genic regions including 10 kb downstream of the transcription start
site); DeSeq normalized total stranded RNA-seq data were used to determine fold expression
changes for RefSeq genes between subcutaneous adipose tissue and T-cells, monocytes,
and B-cells. The number of genes significantly (DeSeq-adjusted p??0.05) overexpressed in adipose versus each hematopoietic cell type (1-fold, 4-fold,
16-fold higher expression in adipose) was determined for eDMR-associated and all
RefSeq genes; the latter to identify dynamic adipose-specific genes. Shown is the
ratio of eDMR-associated versus RefSeq genes averaged across the three hematopoietic
cell types corrected for background at indicated fold overexpression levels in adipose.
Stars indicate z-score-derived p-values 0.001. b Same as (a) but comparing eDMR-associated genes overexpressed in hematopoietic cells
to adipose. c (top) UCSC-derived scheme of the HSPA12B RefSeq gene locus and CGIs in the gene region (hg19). (bottom) Zoom into HSPA12B gene region overlapping CGI CpG: 121. For each twin within MZ2, methylation levels
are shown for each detected CpG site. The trendl ine was determined using a moving
average (period?=?2). Methylation values highlighted in black in co-twin 1 indicate significant eDMCs (Fisher’s exact test p??0.05), the associated eDMR is highlighted in rose. d (top) UCSC-derived scheme of the FAM171A2 RefSeq gene locus and CGIs in the gene region (hg19). (bottom) Zoom into FAM171A2 gene region overlapping CGI CpG: 136. Illustration as in (c)

These findings indicate that inter-twin eDMR count differences may arise from differences
in blood heterogeneity. To address this we performed a similar analysis as previously,
correlating pDMC methylation levels of each CpG in the defined eDMRs with proportions
of specific blood cell types (i.e., neutrophils, lymphocytes, monocytes, and eosinophils)
where at least 10 individuals were covered. We found that eDMCs were to a lesser extent
confounded by different cell heterogeneity than all DMCs in the population (12.6 %
versus 24.5 %). Because we observed a striking difference in the number of blood eDMRs
between the two twin pairs (i.e., N
eDMR_MZ2
?=?923 versus N
eDMR_MZ3
?=?386), we also compared the confounding effect of cell type across the two pairs.
We found only a slightly higher number of CpGs impacted by cell type proportions in
MZ2 than in MZ3 (13.3 % versus 11.9 %, p?=?0.42), so the difference in eDMRs between the two twin pairs seem to be only to
a minor extent affected by differences in cell counts. The difference may rather be
explained by different environmental exposure impacting CpG methylation, such as tobacco
smoking, which represents the most commonly studied and proven environmental factor
impacting methylation 39]–42]. Indeed, both MZ2 siblings have been smoking for the past 40 years, whereas the MZ3
siblings had not consumed cigarettes within the last 20 years. To further investigate
the direct impact of smoking on the blood methylome in our study cohort, we compared
MZ2 and MZ3 DMCs to replicate array-derived loci differentially methylated in smokers
versus never-smokers 41], 42], based on the Zeilinger et al. report that methylation levels of former smokers regain
levels observed in never smokers 41]. We observe significant (Fisher’s p??0.05) enrichment of significant DMCs overlapping
known smoking loci compared to all jointly detected CpGs for three out of four comparisons,
and all observed smoking-associated DMC methylation trends match previously reported
ones (Additional file 2: Table S7). Interestingly, within MZ2 siblings we identified a blood eDMR comprising
37 significant eDMCs within a CpG island overlapping the HSPA12B locus (Fig. 5c) reported to be involved in asthma with interaction of environmental tobacco smoke
43].

Finally, further investigation of blood eDMRs identified one region comprising 21
eDMCs overlapping a CpG island in FAM171A2 (Fig. 5d). Interestingly, FAM171A2 was linked to platelet count in a recent genome-wide association study 44]; comparing with all available MZs, we in fact observed a differential platelet count
of MZ2 to be in the highest third. Because this region was not covered in MZ3, this
finding can only be considered an indication that FAM171A2 differential methylation is associated with platelet count.

Inter-tissue epigenetic drift

Variation in CpG methylation within a tissue over time is often referred to as aging
epigenetic drift. Similarly, methylation variation across tissues could then be considered
inter-tissue epigenetic drift. We recently showed that a large proportion of CpGs
associated with common genetic variants are stable across tissues 14]. This, together with our finding that the majority of pDMC are purely of non-shared
environmental origin, may indicate that genetic factors have the ability to limit
inter-tissue epigenetic drift. To test this hypothesis, we first focused on DMCs that
were identified to be shared across tissues (i.e., adipose and blood). We again utilized
the two MZ pairs per tissue sequenced at moderate coverage (MZ2 and MZ3). We selected
all tissue-shared pDMCs (Fisher’s p??0.05) covered in all four adipose MZ2 and MZ3 samples (N?=?155,172). We then enriched
this set of tissue-shared pDMCs to reflect those impacted by genetic variation by
selecting the top 50 % (N?=?77,586) of CpGs displaying th highest “across twin adipose
methylation variance/within twin adipose methylation variance” ratio. We further restricted
our analysis to CpGs also covered in the four blood MZ2 and MZ3 samples and excluded
adipose and blood eDMCs, resulting in 31,167 CpGs. Reads for these CpGs were then
merged within each pair and tissue, followed by an MZ2 versus MZ3 pairwise comparison
that revealed 378 tissue-shared DMCs (Fisher’s p??0.05). Of note, 70 % (N?=?265) of these CpGs showed the same direction of effect
across tissues, which is a significant deviation from the expected distribution under
the null hypothesis (uniform random distribution expected; exact binomial p?=?3.4?×?10
-15
) and most likely reflects genetic effects. We then undertook a similar approach for
eDMCs and selected significant eDMCs (Fisher’s p??0.05) that were shared across tissue (MZ2: N?=?1,098; MZ3: N?=?1,228) and then
merged the reads across tissues. We performed new pairwise analyses within each pair
and noted that only ~50 % of the eDMCs remained significant per pair (MZ2: N?=?555;
MZ3: N?=?640); thus there was no significant deviation from expected effects under
the null hypothesis (uniform random distribution expected; MZ2 exact binomial test
p?=?0.74, MZ3 p?=?0.15).

Second, we used the ICC estimates of the 4,798 adipose pDMC included in the analysis
of the estimation of additive genetic, shared, and non-shared environmental effects
but divided those into adipose-specific and tissue-shared. We then compared the heritability
and non-shared environmental proportions across the two groups. We found the proportions
of heritable CpGs to be significantly larger in the tissue-shared group than in the
adipose-specific group (Fisher’s p?=?0.004-0.04, Additional file 1: Table S8). In contrast, purely environmentally driven pDMCs were more tissue-specific
(Fisher’s p?=?0.02, Additional file 1: Table S8).

Taken together, genetic variation seems to significantly constrain inter-tissue epigenetic
drift as opposed to non-shared environmental factors, which impact the methylome in
a tissue-specific manner.

The non-CpG methylation landscape

Earlier genome-wide investigations employing multiple tissues but few replicates per
tissue have identified that non-CpG and CpH methylation comprise up to 25 % of all
cytosine methylation in embryonic stem cells 2] and induce pluripotent cells 45], decreases upon differentiation, and are near absent in investigated differentiated
cells and tissues including whole blood 2], 45], with exceptional somatic CpH methylation observed in brain tissue and placenta 46]. Notably, the genome-wide CpH methylation in adipose tissue is unknown. Consequently,
we investigated CpH methylation from our adipose and blood WGBS effort. We restricted
our analysis to sites covered four or more reads applying SNP/blacklist filtering
(removal of 4.2 % of Cs). In total we detected 9.63?×?10
8
and 1.00?×?10
9
CpHs in adipose and blood, respectively, with even distribution of CpH coverage on
the forward and reverse strands (Additional file 1: Table S9). CpH methylation was then determined by applying a stringent detection
cut-off of 50 % methylation per CpH in two or more individuals to exclude random
bisulfite conversion artifacts biasing our analysis. Comparing to CpG methylation
of 50 % (in the filtered and variant removed datasets) for each individual, we determined
CpH methylation to comprise on average 9.3 % (SD?=?2.2 %) and 8.6 % (SD?=?3.2 %) of
total cytosine methylation in adipose and blood, respectively. Across the population
we identified CpH methylation to impact in total 1.7?×?10
6
(0.18 % of all CpHs) and 2.3?×?10
6
(0.23 %) CpHs in adipose and blood, respectively. We observed a dinucleotide methylation
frequency of CpA??CpT??CpC—a trend that is consistent with previous WGBS studies
2], 3], 18], 19], and roughly one third CpHpG and two thirds CpHpH methylation, with even overall
strand distribution (Additional file 2: Figure S12A). Genome feature association of CpH methylation compared to CpH background
(all CpH detected in two or more individuals) revealed in adipose and blood a 1.6-fold
to 2.9-fold depletion (Fisher’s p??1.0?×?10
-16
) in regions displaying no or low CpG methylation (CGI, TSS200, H3K4me3, H3K4me1)
(Additional file 2: Figure S12B). This indicates that CpH methylation in these two tissues occurs mainly
in the presence of CpG methylation, and might occur by potentially “erroneous/unspecific”
methylation through the CpG methylation machinery.

To study this in more detail we investigated the relationship between CpH methylation
and methylation of nearby CpGs and found for 95 % of methylated CpHs in adipose and
97 % in blood that the closest CpGs displayed ?50 % methylation, with an average distance
between CpH and CpG of ~150 bp. When restricted to the closest detected CpG within
10 bp only, these numbers increased to 99.7 % in adipose and 99.6 % in blood, underlining
the possibility of potential concomitant CpH methylation in close proximity to CpG
methylation. Next, we sought to characterize the remaining 5.2 % and 3.2 % of CpHs
in adipose and blood, respectively, that were ?50 % methylated but in proximity to
CpGs displaying 50 % methylation (highCpH-lowCpG). We found that 40 % of adipose
and 59 % blood highCpH-lowCpG methylation was shared between tissues. To investigate
whether highCpH-lowCpG methylation is clustered, we screened for concomitant methylation
within 500 bp around each site within individuals. Considering ?3 CpHs as a cluster
we detected 62 % of these sites to be clustered per tissue. As expected, genomic feature
association identified adipose and blood highCpH-lowCpG methylation as enriched in
promoter-associated and enhancer-associated regions (Fisher’s p??1.0?×?10
-16
, Additional file 2: Figure S12C). However, in both tissues we also observe as 2.5-fold enrichment in
CGI shores (Fisher’s p??1.0?×?10
-16
, Additional file 2: Figure S12C), suggesting that highCpH-lowCpG methylation might be observed in areas
of CpG methylation transition, i.e., be located between unmethylated and methylated
CpGs and be detected as highCpH-lowCpG as it is in closer proximity to unmethylated
CpGs.