A novel approach for multi-SNP GWAS and its application in Alzheimer’s disease

Until recently, linkage studies were the best approach for identifying genes responsible for genetic diseases. However, linkage studies are most successful for monogenic disorders with highly penetrant variants, and they were usually feasible only in families. Unfortunately, the majority of genetic diseases are complex, so their genetic architectures could not be studied with linkage studies. Following the sequencing of the human genome and the development of accurate SNP arrays, the International HapMap Project [1] cataloged the majority of common variants in the human genome. This catalog of SNPs made genome-wide association studies (GWAS) possible [2]. In a GWAS, the co-occurrence of each SNP with a phenotype is assessed: SNPs that are present (or absent) significantly more often in individuals with a particular phenotype, or with a more extreme phenotype, are reported as disease markers (i.e., genomic variants correlated with a trait, but not necessarily causal). GWAS can be used to study both quantitative and binary phenotypes, and they were the first effective approach for studying the genetics of complex traits.
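To make the single-SNP logic concrete, the sketch below tests each SNP on its own with an allelic 2x2 chi-square test (allele counts in cases versus controls) and reports SNPs below the conventional genome-wide threshold of 5e-8. This is an illustrative stand-in under stated assumptions, not the procedure used in any cited study: production GWAS typically fit regression models with covariates (e.g., in PLINK). The function names, SNP IDs (`snp_A`, `snp_B`), and genotype counts are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic 2x2 chi-square test for one biallelic SNP (hypothetical helper).

    Each argument is a tuple of genotype counts (n_AA, n_Aa, n_aa).
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    def alleles(counts):
        n_AA, n_Aa, n_aa = counts
        return (2 * n_AA + n_Aa, 2 * n_aa + n_Aa)  # (A alleles, a alleles)

    table = [alleles(case_genotypes), alleles(control_genotypes)]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def single_snp_scan(snps):
    """Test every SNP independently; return IDs passing the genome-wide threshold."""
    hits = []
    for snp_id, (cases, controls) in snps.items():
        _, p = allelic_chi_square(cases, controls)
        if p < GENOME_WIDE_ALPHA:
            hits.append(snp_id)
    return hits

# Hypothetical genotype counts for 1,000 cases and 1,000 controls.
example = {
    "snp_A": ((300, 500, 200), (200, 500, 300)),  # A allele enriched in cases
    "snp_B": ((250, 500, 250), (250, 500, 250)),  # identical distributions
}
print(single_snp_scan(example))  # ['snp_A']
```

Each SNP is evaluated in isolation, which is exactly the property that limits standard GWAS when risk depends on combinations of variants.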

In 2005, Haines et al. [3] conducted the first GWAS, testing for association between individual SNPs and age-related macular degeneration. In the decade since, this technique has been used successfully to identify genetic factors associated with dozens of traits (e.g., coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn’s disease, bipolar disorder, hypertension, Alzheimer’s disease, and others [4]). The GWAS Catalog, co-curated by the National Human Genome Research Institute and the European Bioinformatics Institute, contains reported associations from thousands of GWAS [5].

Unfortunately, GWAS have several limitations. First, most GWAS markers are thought to be non-functional; although a marker may point to a region of the genome that is important for a particular phenotype, unless the marker is itself functional it typically reveals little about the specific biological mechanisms driving disease. Second, most GWAS markers have only a modest effect on disease risk. Third, despite thousands of GWAS performed on progressively larger datasets, GWAS SNPs collectively explain only a portion, often a small portion, of the total estimated genetic variance for a given trait [6].

Several explanations have been proposed for this unexplained genetic variance. One possibility is that gene-gene interactions (i.e., epistasis) are a feature of the genetic architecture of these traits. GWAS test each SNP under an additive model (i.e., each SNP confers disease risk independently of other SNPs) and therefore cannot detect epistatic interactions. Numerous approaches have been attempted to identify epistasis, including multifactor dimensionality reduction, regression (i.e., GWAS with interaction terms), and others [7], each with its own strengths and weaknesses. In this manuscript we present a novel multi-SNP GWAS approach for identifying epistatic interactions and demonstrate its utility in Alzheimer’s disease (AD).
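Why single-SNP tests can miss epistasis is easiest to see with a toy model. The sketch below assumes two independent SNPs, each carried by half the population, with a hypothetical XOR-like penetrance: risk is 0.20 when exactly one SNP is carried and 0.10 otherwise. Collapsing over either SNP yields identical case/control proportions, so each marginal test finds nothing, while a joint test over the four genotype combinations does. All names and counts are illustrative assumptions, and this is not the multi-SNP method presented in this manuscript.

```python
import math

def chi_square_rx2(rows):
    """Pearson chi-square statistic for an r x 2 table of (cases, controls) rows."""
    col = [sum(r[0] for r in rows), sum(r[1] for r in rows)]
    total = col[0] + col[1]
    chi2 = 0.0
    for r in rows:
        n_row = r[0] + r[1]
        for j in range(2):
            expected = n_row * col[j] / total
            chi2 += (r[j] - expected) ** 2 / expected
    return chi2

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 3 only."""
    q1 = math.erfc(math.sqrt(x / 2))
    if df == 1:
        return q1
    if df == 3:
        return q1 + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df=1 and df=3 are implemented in this sketch")

# Hypothetical model: carrier status 0/1 at each SNP, 2,500 people per
# genotype combination. Cases = penetrance x 2,500 (0.10 -> 250, 0.20 -> 500).
N_PER_CELL = 2500
CASES = {(0, 0): 250, (0, 1): 500, (1, 0): 500, (1, 1): 250}
CONTROLS = {g: N_PER_CELL - c for g, c in CASES.items()}

def marginal_table(snp):
    """Collapse over the other SNP: rows are carrier status 0/1 at `snp`."""
    rows = []
    for status in (0, 1):
        cases = sum(c for g, c in CASES.items() if g[snp] == status)
        controls = sum(c for g, c in CONTROLS.items() if g[snp] == status)
        rows.append((cases, controls))
    return rows

# Joint table: one row per two-SNP genotype combination (3 degrees of freedom).
joint_table = [(CASES[g], CONTROLS[g]) for g in sorted(CASES)]

for snp in (0, 1):
    chi2 = chi_square_rx2(marginal_table(snp))
    print(f"SNP{snp + 1} alone: chi2 = {chi2:.2f}, p = {chi2_sf(chi2, 1):.3g}")
chi2 = chi_square_rx2(joint_table)
print(f"SNP1 x SNP2 jointly: chi2 = {chi2:.2f}, p = {chi2_sf(chi2, 3):.3g}")
```

Both marginal statistics are exactly 0 (p = 1), while the joint statistic is about 196. Because neither SNP has a main effect in this model, the entire joint signal comes from the interaction.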

AD is a leading cause of death for which there is no effective treatment, and its incidence is increasing rapidly worldwide [8]. AD is also an ideal phenotype for demonstrating the utility of our approach, for two reasons: 1) epistasis plays a role in the genetic architecture of AD [7, 9, 10], and 2) despite very large GWAS (Table 1) and the identification of several rare SNPs [11–13], a substantial portion of the genetic variance remains unexplained [14].

Table 1

Genes most highly associated with Alzheimer’s disease