An integrative systems genetics approach reveals potential causal genes and pathways related to obesity

Association of gene expression with degree of obesity

The detection of DE genes can lead to a better understanding of genetic and biological
differences between two different conditions and to the detection of predictive biomarkers
51], 52]. In this study, we have used the level of obesity as a continuous variable, whereby
we correct for the effect of sex. In total, we found 458 DE genes (FDR??0.05), with
a ? j,OI
ranging from ?0.42 to 0.48. All DE results are presented in Additional file 1.

The heatmap of the top 100 DE genes (Fig. 2) shows that mostly those from the lean and obese animals cluster together within
each group. However, those from intermediate animals cluster with both the obese and
lean groups. This is possibly due to the OI values of the intermediate group being
borderline between the two groups. The genes partly clustered based on the direction
of regulation (up or down). The top 10 DE genes include TAS1R3, CSGALNACT1, MAML3, ROM1, LRRC16B, EML5, RPS12 (all downregulated in obese animals), and ADAM9 (upregulated in obese animals). We discuss here only selected genes from this group.
The TAS1R3 gene encodes a taste receptor, which has been associated with sweetness responsiveness
in mice 53]. Taste receptors influence the perception of food 54], and therefore affect eating behavior 55], which might be associated with obesity 56], 57]. The CSGALNACT1 (chondroitin sulfate N-acetylgalactosaminyltransferase 1) gene plays a role in the
process of glucuronidation: the addition of glucuronic acid to a substrate. This gene
has been associated with Bell’s palsy (dysfunctioning of the facial nerve, resulting
in facial paralysis) and Morquio Syndrome b (a metabolic disease, characterized by
the inability to breakdown glycosaminoglycans). However, this gene also has an important
function in the development of osteoarthritis 58], a bone remodeling disease that has been associated with obesity. The 40S ribosomal
protein S12 is encoded by RPS12, which is associated with diabetic nephropathy in African Americans 59]. The final gene, ADAM9, encodes a transmembrane glycoprotein that functions in integrin binding and SH3
domain binding. The ADAM9 gene is involved in the formation of multinuclear cells and, consequently, with osteoclast
fusion 60]; its downregulation might result in reduced bone mass. Bone mineral density is closely
related to obesity: it has been suggested that the increased body load on joints resulting
from obesity might stimulate the formation of bone mass, even though data from studies
remain controversial 61]. However, body adipose tissue is also closely associated with the immune system and
bone remodeling (decreased bone mineral density), by the secretion of adipocytes 62]. In a previous study, we identified several genes that might play a role in these
associations 28]. The downregulation of these genes in obese animals might correspond to the decreased
bone mass via the secretion of adipocytes, but does not correspond to an increased
obesity-induced bone mass.

Fig. 2. Heatmap of the top 100 differentially expressed genes. The intensity of the yellow/red
colors represents the expression level. The rows and columns have been sorted according to the gene clustering tree. The left-hand bar is color-coded according to the upregulation or downregulation in obese animals.
The upper bars below the animal cluster represents the obesity status and the sex of the animals.
DE differentially expressed

Out of the 458 DE genes, 249 were downregulated and 209 were upregulated in obese
animals. Functional annotation analysis was performed for the downregulated and upregulated
genes separately. Surprisingly, almost all over-represented pathways were the result
of the presence of upregulated genes. The over-represented pathways were mostly associated
with the immune system (e.g., Ribosome, P?=?4.54E
?12
and leukocyte trans-endothelial migration, P?=?4.32E
?4
), except for starch and sucrose metabolism (P?=?6.67E
?4
) and osteoclast differentiation (P?=?6.67E
?4
). However, osteoclasts are derived from macrophages and therefore are also related
to the immune system 63], which was shown previously 28]. Within the GO terms, DE genes showed a striking relationship, mainly among the upregulated
genes, with the immune system, (e.g., immune system process, P?=?2.79E
?20
and immune response, P?=?7.92E
?18
). The downregulated genes were mainly involved with functions related to ribosomes
and the translational process (e.g., translational termination, P?=?6.56E
?17
and cytosolic ribosome, P?=?1.33E
?15
), which cannot be linked directly to obesity itself.

The location and annotation of eQTLs

The eQTL mapping led to the detection of 1,070 eQTLs: 987 cis-eQTLs and 73 trans-eQTLs (FDR??0.05). All detected eQTLs are presented in Additional file 2. SNPs used for eQTL mapping were filtered based on a high MAF (0.05). To ensure
that this did not affect results owing to the low number of animals, we investigated
the MAF of the eSNPs. From those results (not presented), it was evident that most
SNPs that were detected as eQTLs had a high MAF. More specifically, only 14 SNPs of
the detected eQTLs had a MAF between 0.05 and 0.10. Based on those results and the
statistical models used to estimate the eQTL effects, followed by appropriate significance
testing, we believe that the results are reliable.

The detected eQTLs were annotated as cis when the distance between the eSNP and gene was within 1 Mb. This was confirmed by
the results, which show that most of the cis-eQTLs were located close to the transcription start site, as expected, because this
is the region where most transcriptional regulation occurs (Fig. 3a). Further investigation of the effect of the eQTLs (Fig. 3b) showed that most were located in intergenic (46 %) and intronic regions (37 %).
In comparison with the complete set of SNPs that passed quality control, we found
that the eQTLs were located more often in intronic regions (8 %) and less often in
intergenic regions (10 %); the frequencies in all other regions were not altered.
These percentages agree with the results of previous studies, which showed that disease-associated
SNPs are mainly located within non-coding regions (~90 %) 64]–66]. We found comparable frequencies of intergenic and intronic SNPs between cis-eQTLs and trans-eQTLs, where SNPs from cis-eQTLs were also located in exonic regions (3? untranslated region [UTR], 5?UTR, mis-sense,
and splice regions). These intergenic and intronic SNPs might be in linkage disequilibrium
with a causative SNP, but might also have a regulatory function on the gene expression
of the eQTL. Of the detected trans-eQTLs, 10 SNPs were located on a different chromosome than the target gene. We further
investigated the distance between the SNP and target gene of the trans-eQTLs that were located on a similar chromosome, and found that a large part of the
trans-eQTL would be detected as cis by increasing the window size by several megabases (Fig. 4a). Furthermore, we investigated the number of target genes per detected trans-eQTL and found that most SNPs were targeting up to five genes (Fig. 4b).

Fig. 3. a Histogram depicting the distance from the expression single nucleotide polymorphism
(eSNP) to the affected transcript of all cis-expression quantitative trait loci, with the transcription start site as a base (distance?=?0 kb).
b Visualization of the variant effects of the eSNPs (both cis and trans) as a percentage. Kb kilobase, UTR untranslated region

Fig. 4. Visualization of the detected trans-expression quantitative trait loci (e-QTLs). a The distance between the single nucleotide polymorphism and target gene of the trans-eQTLs. b The number of target genes per trans-eQTL

Functional annotation of eQTLs

Functional annotation analysis using all detected cis-eQTLs (987) revealed many metabolism-related GO terms and pathways (Table 2). We also found that several adipose tissues (e.g., adipocytes and subcutaneous adipose
tissue) were highly significantly associated with the cis-eQTLs.

Table 2. GeneNetwork results over all cis-eQTLs

In order to identify potential causal genes, we limited the number of eQTL association
tests by confining the eQTL mapping to DE genes (458 genes) and, secondly, to obesity-associated
SNPs resulting from the previously conducted GWAS on the OI (366 SNPs) 27]. Using the restriction of DE genes, we found a total of 36 eQTLs, among which GO
terms and pathways related to cholesterol transport and other lipid process were represented,
for example, protein–lipid complex (P?=?2E
?4
), and lipid digestion, mobilization, and transport (P?=?2E
?5
). Restriction to GWAS SNPs resulted in the detection of 24 eQTLs. Functional annotation
showed that these genes were mainly expressed in adipose tissue (subcutaneous fat,
abdominal; P?=?4E
?3
) and associated phenotypes also showed a link with obesity-related characteristics
(e.g., abnormal triglyceride level, P?=?1E
?3
). Other GO terms and pathways were related to transcription (e.g., viral transcription,
P?=?3E
?4
), metabolism (tyrosine metabolism, P?=?1E
?3
), or immunity (e.g., influenza infection, P?=?2E
?4
). Of those 24 eQTLs, the expression of one target gene was significantly associated
(P??0.05) with the OI: C15orf26. However, this gene does not seem to have any previously discovered association with
obesity or obesity-related diseases. Two other genes tended toward significance: RAB11A and USP36 (P??0.1) RAB11A has been shown to be an element in the GLUT4 trafficking machinery 67] and has been associated with glucose metabolism 68]. To our knowledge, USP36 has not been previously associated with obesity or obesity-related diseases. Unfortunately,
we did not identify any overlap of eQTLs between the restricted analyses of the DE
subset and GWAS subset.

From the 36 eQTLs detected among the subset of DE genes, 35 were also present in the
eQTLs using no restrictions. One of these eQTLs involves the ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1) gene (Fig. 5a) that encodes a type II transmembrane glycoprotein (P?=?1.43E
?4
). This glycoprotein plays a role in the regulation of pyrophosphate levels, and several
other functions are known: for example, bone mineralization and the regulation of
purinergic signaling. More importantly, this gene is highly expressed in adipose tissue
and has been associated with obesity, type-2 diabetes, and insulin resistance 69], 70] and also with eating behavior in a pig population 57]. Moreover, expression of this gene inhibits insulin signaling via reduced insulin-receptor
tyrosine kinase activity 71]. The data here showed a large difference between the AA and GG phenotypes, whereby
the AA animals were lean and GG animals (with an increased expression of ENPP1) were obese, which agrees with data in the literature. For example, the AA animals
(n?=?8) weighed 0.695 kg (standard deviation [SD]?=?0.13 kg) at birth and showed a
mean gain of 0.37 kg/day (SD?=?0.10 kg/day), compared with GG animals (n?=?10), which
weighed 0.923 kg (SD?=?0.12 kg) at birth and gained 0.535 kg/day (SD?=?0.12 kg/day).
At slaughter, the AA animals contained 24.88 mm of backfat (SD?=?7.59 mm) whereas
the GG animals had 39.25 mm of backfat (SD?=?7.19 mm). However, we found no difference
in the fasting glucose levels between the genotypes.

Fig. 5. Barplots of the selected expression quantitative trait loci (eQTLs) with one selected
phenotype per eQTL (with P-value representing the level of significance for difference in phenotype between
the two homozygous genotypes). a–ccis-acting eQTLs. dtrans-acting eQTL. * significantly differentially expressed genes, ** significantly associated
genome-wide with obesity, DXA fat fat estimated using a dual-energy X-ray absorptiometry scan

Another eQTL that has been associated with obesity and insulin resistance is CTSL (cathepsin L), a lysosomal cysteine proteinase (Fig. 5b, P?=?5.55E
?6
). Several studies have investigated the role of CTSL and have shown, for example, that inhibition of CTSL results in limited adipogenesis or lipid accumulation 72] by reducing the levels of pivotal transcriptional mediators of adipogenesis. Moreover,
the pharmacological inhibition of CTSL resulted in reduced body weight gain, and levels of CTSL were elevated in patients who were obese and diabetic 72]. Our results confirm these findings; for example, the AA animals (n?=?11) weighed
0.93 kg (SD?=?0.16 kg) at birth and showed a mean weight gain of 0.44 kg/day (SD?=?0.10 kg/day),
whereas the GG animals (n?=?2) weighed 0.64 kg (SD?=?0.15 kg) at birth and gained
0.32 kg/day (SD?=?0.04 kg/day). Furthermore, the AA animals contained 2.73 kg (SD?=?0.898 kg)
of fat at dual-energy X-ray absorptiometry (DXA) scanning and 2.48 kg (SD?=?1.30 kg)
of leaf fat at slaughter, whereas the GG animals had 1.60 kg (SD?=?0.694 kg) of fat
at DXA scanning and 2.20 kg (SD?=?1.63) of leaf fat at slaughter. The gene function
prediction of GeneNetwork also showed a clear role of CTSL in the regulation of cholesterol and lipids, for example, regulation of plasma lipoprotein
particle levels (P?=?2.98E
?16
) and lipid storage (P?=?3.80E
?14
).

Another detected eQTL was CIDE-C (cell death-inducing DFFA-like effector c), also called fat specific protein 27 (FSP27), which was more highly expressed in the TT genotype than the CC genotype (Fig. 5c, P?=?2.23E
?4
). This gene promotes triglyceride (lipid droplet) formation and has a negative regulatory
effect on adipocyte apoptosis 73], 74]. A CIDE-C knockout model in mice resulted in smaller lipid droplets 75]. Furthermore, CIDE-C is regulated by insulin via the Akt1/2-dependent and JNK2-dependent pathways in adipocytes
76]. Animals with the TT genotype (n?=?8) were heavier (8.29 versus 17.91 kg at DXA scanning)
and showed a higher mean daily gain (0.31 versus 0.51 kg/day) than CC animals (n?=?8)
and had a considerably higher amount of fat than in CC animals: 1.56 versus 3.20 kg
estimated by DXA scanning and 2.01 versus 3.01 kg at slaughter (weight of leaf fat).
These results differ from findings in other studies, where CIDE-C mRNA levels were lower in obese subjects and were negatively correlated with body
mass index and percentage fat mass, but increased in obese patients after weight loss
77]. However, the GEO database also contains studies that show a lower CIDE-C expression for high weight gainers versus low weight gainers [GEO: GDS2319] and a
higher expression in the adipose stem cells of morbidly obese individuals versus non-obese
individuals [GEO: GDS5056]. GeneNetwork identified many adipose-related GO terms and
pathways for CIDE-C using the predicted function, for example, the GO Biological Process triglyceride
metabolic process (P?=?1.29E
?76
) and GO Cellular Component lipid particle (P?=?1.27E
?88
).

In addition to the cis-eQTLs, we detected 73 trans-eQTLs and the functional annotation of this group of genes using GeneNetwork resulted
in the detection of (subcutaneous) adipose tissue as associated over-represented tissue
(P?=?6E
?4
). Furthermore, a wide variety of significant GO terms and pathways were detected,
which were not all directly linked to obesity, for example, excitatory synapse (P?=?10E
?5
). In general, trans-eQTLs provide a fundamental understanding of potential gene-to-gene regulatory architecture
of complex traits and diseases and can also be used to predict transcription factor
binding sites 78]. For the trans-eQTLs, genes and SNPs were restricted to DE genes and GWAS SNPs. Among the trans-eQTLs, only two genes overlapped between all trans-eQTLs and those among the DE genes (GFR?3 and MYH3), and only one gene overlapped between all trans-eQTLs and the trans-eQTLs among the GWAS-significant SNPs (ABHD12B). The GFR?3 gene encodes the artemin receptor, which is a neurotrophin with various functions,
such as nerve regeneration and tumor-cell migration 79], although no direct link has previously been found between GFR?3 and obesity. Similarly, no direct link has been found between MYH3 (encoding myosin, heavy chain 3, skeletal muscle, embryonic) and obesity. Myosin
converts chemical energy into mechanical energy via ATP hydrolysis. Growth characteristics
in cattle and pigs have been associated with MYH3 expression, in addition to a difference in muscle growth between and lean and obese
pigs, which suggests an association between MYH3 and adiposity. However, we observed no difference in obesity-related phenotypes according
to MYH3 expression. The third gene, ABHD12B (?/?-hydrolase domain containing 12B), has also not been previously associated with
obesity (Fig. 5d, P?=?2.36E
?9
). However, it plays a role in lysophosphatidylserine (LPS) metabolism, and ABHD12B knockout mice have a deregulated accumulation of proinflammatory lipids 80]. Furthermore, it has been shown that LPS stimulates glucose transport in adipocytes
81]. In this study, animals with the AA genotype on ALGA0006476 (n?=?12) were more obese
than animals with the CC genotype (n?=?8) and showed a mean daily gain of 0.47 kg/day
(SD?=?0.11 kg/day) compared to 0.38 kg/day (SD?=?0.12 kg/day), and a weight of leaf
fat at slaughter of 3.06 kg (SD?=?1.36 kg) compared to 1.81 kg (SD?=?1.23 kg). The
expression of ABHD12B was higher in CC, suggesting that upregulation or activation of this gene results
in leaner animals. To our knowledge, no other studies have shown significant effects
for the expression of ABHD12B.

Integration of eQTL results with gene co-expression network analysis

Previously, we investigated the RNA-Seq data from this study using a gene co-expression
network approach (WGCNA) 28] and we detected five modules that were potentially biologically associated with obesity-related
characteristics. We hypothesized that modules containing more eQTL genes than expected
by chance would pinpoint modules that are more likely to be causal for the trait under
study. Therefore, we investigated how many eQTLs (out of the 987 detected cis-eQTLs) were present in each of the five modules detected using WGCNA. We found five
eQTLs in the Green-yellow Module (47 genes), five eQTLs in the Brown Module (86 genes),
two eQTLs in the Blue Module (69 genes), and one eQTL in the Black Module (36 genes),
with no eQTLs in the Light-yellow Module. This represents 10.64, 5.81, 2.90, 2.78,
and 0 % of the number of genes in that particular module, respectively, which is unfortunately
not higher than expected by chance (hypergeometric test). The eQTL in the Black Module
is a novel gene (uncharacterized protein), with no known orthologs.

The Green-yellow Module was strongly associated with obesity-related phenotypes, but
in our previous study, the functional annotation did not identify a relationship with
obesity. We have now found five eQTLs in this module: ALDH1L2, GGTA1, KRR1, ME3, and OPTN. The OPTN gene encodes optineurin, a protein that has been investigated intensively in the
neuroscience field, and is associated with primary open-angle glaucoma and amyotrophic
lateral sclerosis 82]. Notably, it also plays a role in adipogenesis, and modulates the developmental switch
into brown preadipocytes 83]. GeneNetwork predicts the adipocytokine signaling pathway (P?=?1.5E
?7
) as the most likely associated KEGG pathway. For the other genes, no obvious association
with obesity or obesity-related phenotypes was found.

In the Brown Module, we found five eQTLs, representing four unique genes: ARF6, PMVK, MSRB2, and two eQTLs in RIN2. The ARF6 gene encodes a small GTP-binding protein that regulates vesicular trafficking actin
cytoskeletal dynamics 84]. The PMVK gene encodes an enzyme that functions in the cholesterol biosynthesis pathway, which
converts mevalonic acid-5P to mevalonic acid 5-pyrophosphate. Furthermore, it has
been shown to be critical in the regulation of the secretion of insulin in pancreatic
? cells 85]. The other genes have not been related to obesity or obesity-related phenotypes.

In our previous study, the Blue Module revealed a potential genetic association between
obesity, the immune system, and bone remodeling (osteoporosis). Therefore, we would
expect that the eQTLs in this module (LAT2 and IGSF6) have a more causal role in this genetic association. Both genes play a role in the
immune system, but have not been previously shown to be directly related to obesity.

Supervised gene co-expression network and integration with protein–protein interactions

Both DE genes and cis-eQTL genes were used as input for sWGCNA, with the focus on potential causal genes
for obesity. The Pearson’s correlations among 1,408 unique genes (987 cis-eQTLs and 458 DE genes) were calculated and raised to a power ?, of three, reaching
a scale-free topology index (R 2
) of 0.93. Using the TOM, we detected eight modules of at least 25 genes per module,
three of which showed strong significant correlations with the OI (MTR
OI
) and other obesity-related traits: the YellowsWGCNA
Module (MTR
OI
?=??0.74), BluesWGCNA
Module (MTR
OI
?=??0.71), and TurquoisesWGCNA
Module (MTR
OI
?=?0.69). The functional annotation of those modules showed strongly significant GO
terms and pathways for only the TurquoisesWGCNA
Module, which were related to obesity and the biological processes lipid localization
(Padj
?=?5.35E
?12
) and lipid transport (Padj
?=?2.39E
?10
) were most significantly over-represented. Based solely on the connectivity of the
genes in this module, ITGB2 (?2-Integrin) is the hub gene (highest connectivity) of this module; however, it
is not included in the module after gene selection based on the gene-trait correlation
(correlation of 0.59 with the OI). According to the gene network prediction of GeneNetwork,
ITGB2 is co-expressed with many other genes, which all possess functions within the immune
system (e.g., activation of immune response). This gene has also previously been associated
with obesity: mutations in ITGB2 have been associated with obesity in mice 86], and fat oxidation and insulin metabolism were impaired in knockout mice 87]. The ITGB2 gene has also been associated with obesity in humans: a polymorphism in this gene
is associated with obesity among Japanese individuals living in the US (westernized
diet) 88]. Based on measures other than connectivity, such as intra-modular connectivity and
gene-trait correlation, other important genes were detected within this module: NCKAP1L, S100A10, VSIG4, and CD68, which have all been associated with immune-related processes. The NCKAP1L gene is also strongly co-expressed with many other genes but is not expressed in
adipose tissue, and functions in immune-related processes. S100A10 encodes the protein p11, which functions in cellular processes such as exocytosis
and endocytosis, and has been associated with serotonergic signaling and, consequently,
with depression 89]. VSIG4 encodes a B7 family-related protein and negatively regulates T cell activation and
IL-2 production 90]. CD68 encodes a glycoprotein that is mainly expressed by monocytes and tissue macrophages
91] and binds to oxidized low-density lipoproteins on the cell surface, and might consequently
play a role in atherosclerotic lesions 92]. We further analyzed and visualized this module in Cytoscape and merged the turquoise
network with the known PPIs from IntAct. The resulting network consisted of 419 nodes
and 3,015 edges (Fig. 6a). We calculated the network statistics and clustered the genes and proteins within
Cytoscape (Fig. 6b). The largest cluster contained 137 proteins that interacted with a single gene:
CALCOCO2 (calcium binding and coiled–coil domain 2). This gene is significantly upregulated
in adipose tissue (P?=?0.0003) according to HumanMine (http://humanmine.org) and has been shown to be negatively correlated with the level of triglycerides in
muscle tissue in a mouse model for human obesity 93]. Two out of the four genes that were detected as potentially important or hub genes
(VSIG4 and CD68) were located within second largest module. The third gene (NCKAP1L) formed a cluster together with five other genes. The fourth gene (S100A10) formed a cluster by itself, with ten other proteins.

Fig. 6. Visualization of the TurquoisesWGCNA
Module in Cytoscape. a The complete network and (b) clustering of the network based on GLay community clustering algorithm. Genes in
the Turquoise Module are turquoise and interacting proteins are orange. The size of the nodes is dependent on the betweenness centrality of the node. Edges
are green to represent a gene–gene interaction (the darkness depends on the weight of the correlation)
and gray represents protein–protein and gene–protein interactions