Genome- and epigenome-wide association study of hypertriglyceridemic waist in Mexican American families


Study participants

Participants in this study were from the San Antonio Family Heart Study 28], 29], an ongoing prospective evaluation of Mexican American families living in San Antonio.
A total of 850 participants from 39 families were included in the analysis. The recruitment
and ascertainment of participants and their phenotyping and genotyping have been extensively
described elsewhere 29], 75]. This data was collected in the third wave of ascertainment (2002–2006). Peripheral
blood samples were collected following an overnight fast and extensive anthropometric
phenotyping was conducted.

Ethics, consent, and permissions

Written consent was obtained for all individuals in this study. This study was approved
by the Institutional Review Board at The University of Texas Health Science Center
at San Antonio.

Phenotypes

Our main phenotype of interest was HTGW, which does not presently have a uniform definition
across studies 5], 6], 39], 44], 45], 76]. In our study, we defined HTGW as high waist circumference (?90 cm in males and ?85 cm
in females) combined with high serum triglyceride concentration (?2.0 mmol/L in males
and ?1.5 mmol/L in females) 5], 33]–36]. Other phenotypes included in this study were: age, sex, systolic and diastolic blood
pressure, use of anti-lipid, anti-hypertensive and anti-diabetic medications, presence
of type 2 diabetes (diagnosed using the ADA criteria 77], fasting glucose ?7 mmol/L), triglycerides, waist circumference, and presence of
obesity (body mass index ?30 kg/m
2
). Methods used to measure these and other phenotypes have been described previously
29], 75].

Genotyping

Study participants were previously genotyped for approximately one million single
nucleotide polymorphism markers using several Illumina genotyping arrays, including
the HumanHap550v3, HumanExon510Sv1, Human1Mv1, and Human1M-Duov3. The Infinium Whole-Genome
Genotyping Assay was employed according to manufacturers’ instructions. Details of
the data cleaning and imputing steps for this genotypic data have been described previously
20]. Out of a total of 995,320 SNPS genotyped, we incorporated 759,809 SNPs into the
association analyses, which had ?97 % call rate, a minor allele frequency ?5 % and
a Hardy-Weinberg significance value ?0.001.

DNA methylation assays and data preprocessing

DNA samples (500 ng) obtained from peripheral blood cells were bisulfite-converted
using the EZ-96 DNA Methylationâ„¢ Kit (Zymo Research, Irvine, CA) and were subjected
to methylation profiling using the Illumina Infinium HumanMethylation450 BeadChip
assay (Illumina, San Diego, CA). The array interrogated 485,577 CpG sites across the
genome and incorporated both Infinium I and Infinium II bead types. The Illumina iScan
was used to scan the BeadChips and raw data was imported into GenomeStudio (V2011.1)
to extract image intensities, following background subtraction and normalization to
inbuilt controls on the arrays.

Methylation at each CpG site was quantified on a scale from zero (representing fully
unmethylated) to one (representing fully methylated) as a methylation score (?). Probes
that were located on the sex chromosomes (?=?11,648), that were non-CpG loci (?=?2,994) or that analyzed SNPs (?=?65) were excluded. To correct for differences due to Infinium I and Infinium II
probe designs, we used the BMIQ method implemented in the R-based software, BMIQ 78].

We included only those probes for which heritability analyses could be successfully
completed without convergence failures. For 12,154 (2.5 %) probes SOLAR was unable
to achieve convergence, leaving a total of 458,716 CpG sites available for analysis.
Of these, 1385 probes had detection p values 0.01 in 5 % of the samples and therefore we excluded these probes leaving
a total of 457,331 CpG sites that were finally included in this study. To minimize
loss of informative associations, we did not exclude SNP-containing or cross-reactive
probes but rather investigated whether the significantly associated CpG sites could
have been confounded due to these characteristics.

Pyrosequencing

For validation of the Illumina microarray data, we performed pyrosequencing on our
most significant association (cg00574958 in the CPT1A). For each sample, 500 ng of genomic DNA was bisulfite converted, PCR-amplified,
and subjected to pyrosequencing with the PyroMark96 MD (Qiagen, Valencia, CA). Percent
DNA methylation was determined using PyroMark CpG software 1.0.11.14. The PCR was
carried out at 95 °C for 5 min, followed by 40 cycles of 95 °C for 1 min, 56.6 °C
for 1 min and 72 °C for 1 min, and a final extension at 72 °C for 7 min. Pyrosequencing
was carried out according to the manufacturers’ instructions, and PCR and sequence
primers (designed using the Pyromark Assay Design 2.0 software) were designated as

Forward primer: GTTTTTGGTATTGAGGTAAAATTAA

Reverse primer (biotinylated): AACCTTTCCAAATTCTTTAAAAC

Sequence primer: TTTTTGGTATTGAGGTAAAATTAAT

Statistical analysis

To ensure compatibility with the variance component framework and that the observed
associations were unaffected by any undetected skew, we used an inverse normalization
preprocessing step for the BMIQ-normalized methylation ? values to circumvent any distributional aberrations. This step included ranking the
raw values, generating cumulative density functions, and determining z-values based
on the cumulative densities. All the transformed values were thus distributed as N(0,1)
and could be represented on a common, comparable metric of z-values.

Family studies such as the present one have an added advantage that they can shed
genetic light on the phenotypes under study. We first estimated the heritability (defined
as the proportion of variability explained by genetic similarity among individuals)
of HTGW using a polygenic regression model as follows:

where l(HTGW) is the liability function of HTGW, ? is the overall mean liability, b is the regression coefficient vector corresponding to the covariate matrix a , g i
is the polygenic effect, and e i
is the measurement error. Heritability was then estimated as the ratio of variance
due to genetic similarity (modeled as g i
in the equation above) and the total phenotypic variability, ?. The covariates used for the estimation of heritability of HTGW were: age, age
2
, sex, age × sex interaction, age
2
× sex interaction, use of anti-lipid, anti-hypertensive and anti-diabetic medications,
systolic blood pressure, diastolic blood pressure, and presence of type 2 diabetes
and obesity (body mass index ?30 kg/m
2
). Statistical significance of heritability (Ho
: heritability?=?0) was tested by constraining the heritability to 0 and comparing
the likelihood ratio statistics of the constrained and unconstrained models.

For genome-wide association analyses, we used polygenic regression models that accounted
for age, age
2
, sex, age × sex interaction, and age
2
× sex interaction, the top four principal components to quantify ancestry-based population
admixture, and use of anti-lipid, anti-hypertensive, and anti-diabetic medications.

Our methylation studies used blood which contains a mixture of cell types. Reinius
et al. 79] and Houseman et al. 80] have demonstrated the influence of differential cell proportions on DNA methylation
signatures using different array platforms. Jaffe et al. 81] have extended this procedure to the Illumina Infinium HumanMethylation450 array.
We estimated the proportion of CD4+ T cells, CD8+ T cells, B cells, natural killer
cells, and granulocytes in each sample using the procedure described by Jaffe et al.
81] and adjusted all the polygenic regression models for these estimated cell counts
as covariates. To test the association of DNA methylation at each CpG site with the
liability function of HTGW, we ran polygenic regression models for each CpG site.
In each model, we used age, age
2
, sex, age × sex interaction, and age
2
× sex interaction, Illumina Sentrix® ID, Sentrix® position, estimated cell counts,
and use of anti-lipid, anti-hypertensive, and anti-diabetic medications as covariates.
Additional covariates were used for specific analyses and are described in the Results
section. In particular, where warranted, we accounted for comorbidities (systolic
blood pressure, diastolic blood pressure and presence of type 2 diabetes and obesity)
that might influence the HTGW phenotype. Statistical significance for association
was tested by constraining the regression coefficient to 0 and comparing the likelihood
ratio statistics of the constrained and unconstrained model.

For heritability analyses as well as association analyses, we first estimated the
genomic inflation factor (?median
) which was defined as the median ?2LL
/invchi(0.5,1) where invchi(p,d) is the inverse ?2
function for probability (p) and degrees of freedom (d), and corrected the nominal significance values for the estimated ?median
. Additionally, we used the Benjamini-Hochberg procedure of false discovery rate (FDR)
control for multiple testing correction. Significance was assessed at a global type
I error rate of 0.05. The odds ratio (OR) for the association between HTGW and inverse-normalized
methylation score was determined as where ? represents the polygenic regression coefficient. The association of DNA methylation
with type 2 diabetes that was mediated through HTGW was estimated using Sobel’s parameter
32]. For this, we ran two regression models for each site—the first model contained HTGW
as the outcome and methylation (along with other covariates) as a predictor while
the second model used T2D as the outcome and HTGW, methylation and other covariates
as predictors. The regression coefficients from these two models were then multiplied
to derive a quantified measure of mediation. Standard errors for this parameter were
estimated as described by Sobel 32].

To parse out the genetic and environmental covariance, we used the methods entailed
under bivariate trait analyses 82]–84]. Under this analytical framework, the phenotypic covariance (? P2
) is regarded as a function composed of the additive genetic (? G2
) and environmental (? E2
) covariances between two traits (denoted below as i and j).

These parameters are estimated using an estimation-maximization algorithm by jointly
utilizing all available pedigree information with a multivariate normal model for
continuous traits and liability threshold model for discrete traits 85]–87].

Other statistical methods used were Spearman’s correlation scatter plots and Bland-Altman
plots to test for the agreement between different methods of measuring DNA methylation.
Heritability, association, and bivariate trait analyses were conducted using the SOLAR
software package 88], and all other statistical analyses were conducted using the Stata 12.0 (Stata Corp,
College Station, TX) package.