Genome-wide association study of footrot in Texel sheep

Scoring method and traits

A 5-point scoring method for hoof lesions that follow the progression of footrot was
described elsewhere 27], and more recently was validated for use in the UK 18]. This method differentiates clean, unaffected hooves (score 0) from those with mild
inter-digital inflammation, (score 1), to inter-digital necrosis (score 2), to some
under-running of the sole of the hoof (score 3) and fully under-run to the abaxial
wall of the hoof (score 4). Each separate hoof (left and right hind, left and right
fore) was screened and hence each animal had a potential maximum total footrot score
of 16 per scoring event, with 16 being the worst score (SUM_FR). This latter trait
was the phenotype used to derive estimated breeding values (EBV) for the genomic association
analyses. Data were collected by two trained technical staff in 2006 and 2007 from
17 commercial farms across the UK on Texel sheep that belong to the national performance
recording scheme, Sheepbreeder28]. All farms were requested not to treat their sheep for footrot in the 4-week period
before the farms were visited.

Dataset, records and traits

A total of 3573 records were obtained from 2229 animals i.e. 2875 ewes of mixed age
and 698 lambs that were born between 1998 and 2006. The sheep were scored between
one and three times within an 18 month period averaging 1.60 records per animal. On
average, there were between 84 and 127 animals per farm for each scoring event. The
average age of the lambs at the time of footrot scoring was 158 days. A total of 11
048 individuals were included in the pedigree file with 2723 sires, 6642 dams and
2165 founders. The maximum pedigree depth was 22 generations with a mean of nine generations.
Heritability of footrot in the analysed population was 0.18 (unpublished results).

The first scoring event was carried out during late summer 2006, with scoring dates
between 10th July and 12th September, 2006, the second scoring event took place between 1st and 24th May, 2007 when the ewes were in mid-lactation and the third scoring event was between
18th July and 10th September, 2007. The data were generated as part of a collaborative project with
the Texel sheep society and could be made available for further analyses subject to
the conditions of the original collaboration agreement.

Genotyping

Blood samples from 336 animals were collected in order to extract DNA. Animals were
selected for extreme phenotypes (the highest scores) irrespective of the farm of origin
and these were matched with animals with a 0 score from the same farms. The minimum
number of genotyped animals per farm was 6 with the majority of farms with more than
10 genotyped animals. Animals with a 0 score were sampled from 15 farms with a minimum
of 3 and a maximum of 21 animals per farm. Animals were genotyped commercially with
the Illumina ovine SNP50 BeadChip at Ark Genomics (Edinburgh, UK). Animals with extreme
phenotypes i.e. with no footrot detected on any of the scoring events (179 animals)
or with a score of 8 and above on at least one scoring event (25 animals) were selected
for genotyping. This made up the core of the genotyped population consisting of 204
animals which were subsequently combined with 132 animals with intermediate scores
(between 1 and 7). The genotyped animals originated from 118 sires and 293 dams. They
included 22 full-sib families with two offspring, and 117 half-sib families with an
average of 3.2 animals per family.

SNPs were removed from further analyses if they were not in Hardy-Weinberg equilibrium,
had a minor allele frequency less than 0.05, were monomorphic, had a call rate less
than 0.95 or if the GC score was less than 0.6. Missing genotypes were imputed as
homozygous for the major allele.

Calculation of de-regressed breeding values

Basic statistics describing the data were estimated with R package 29]. After an initial investigation of fixed effects and co-variables, an appropriate
statistical model for footrot was determined by stepwise elimination of non-significant
interactions and main effects. Breeding values were estimated using all available
records (3573) from 2229 animals using the software package MIX99 30] applying the following model:

where y is the vector of footrot scores, b is the vector of fixed effects consisting of birth year (years, 9 levels), type (adult
ewe, lamb), farm (17 levels), and scoring event (3 levels), a is the vector of random animal effects, and e is the vector of random residuals.

Random effects were assumed to be normally distributed with zero means and the following
covariance structure:

where A is the pedigree-based relationship matrix, is the genetic variance, and is the residual variance.

The software package MIX99 was also used for de-regression, using a full animal pedigree
with effective offspring contributions (EOC) as weighting factors. EOC were calculated
as:

where reli is the reliability of EBV for animal i and h2 is the heritability of footrot.

The de-regressing procedure was performed to eliminate bias in the EBV from animals
with different numbers of offspring. De-regressed EBV were used as pseudo-phenotype
in the subsequent GWAS analysis in order to maximise the use of available information
since 100 of the genotyped animals had progeny with footrot records, and half of the
genotyped animals had repeated measurements of footrot (scored two or three times).

Genome-wide association analysis

GWAS was performed using the Multi-Locus Mixed Model (MLMM) algorithm 31] implemented in SNP Variation Suite v7.7.8 (Golden Helix Inc., Bozeman, MT). The
following model with a random polygenic effect and the genotypes at single SNPs as
fixed effects was used:

where y is the vector of de-regressed EBV for footrot, ? is a vector of coefficients for the SNP effects, a is the vector of random animal effects, e is the vector of random residual effects, and X and Z are incidence matrices relating observations to fixed and random animal effects,
respectively. Random effects were assumed to be normally distributed with zero means
and the following covariance structure:

where G is the genomic relationship matrix 32] calculated as:

where S is a centred incidence matrix of SNP genotypes, N is the number of SNPs, and pi is allele frequency of marker i.

Three different genetic models were used for this study: (1) an additive genetic model
where the major homozygous genotype was recoded to 0, the heterozygous genotype to
1, and the minor homozygous genotype to 2; (2) a dominant model, where the minor homozygous
and heterozygous genotypes were coded as 1 and the major homozygous genotype was coded
as 0; and (3) a recessive model, where the minor homozygous genotype was coded as
1 and the heterozygous and major homozygous genotypes were coded as 0. SNP positions
were determined using the map Ovis aries OAR 3.1 33].

Quantile-quantile plots

Quantile-quantile (Q-Q) plots were used to analyse the extent to which the observed
distribution of the test statistic followed the expected (null) distribution. This
was done to assess potential systematic bias due to population structure or analytical
approach.

Significance threshold

Bonferroni correction was applied to obtain significance thresholds. A SNP was significant
at the genome-wise level when the –log10(p-value) was greater than -log10 (0.05/N),
where N is the total number of markers. Chromosome-wise significant SNP had an associated
–log10(p-value) above –log10(0.05/n), where n is the number of markers on a given
chromosome.

Proportion of variance explained

Proportion of variance explained by SNP (pve) was calculated as:

where mrssh0 is the Mahalonobis Root Sum of Squares (mrss) for the null hypothesis and mrssk is the same for marker k.

Linkage disequilibrium

Linkage disequilibrium (LD) was measured as r2, which is the squared correlation of the alleles at two loci 34]:

where f(AB), f(A), f(a), f(B), f(b) are observed frequencies of haplotype AB and of alleles A, a, B and b, respectively.
LD was calculated for all syntenic SNP pairs (SNPs on the same chromosome). SNPs that
could not be mapped to any chromosome were excluded from these analyses. Average LD
was calculated as an arithmetic mean of r2 values for SNP pairs in 1 kb windows from all chromosomes. LD based on the marker
data was compared with an approximate expectation of r235]:

where Ne is the effective population size, c is the recombination distance in Morgans (we
assumed 100 Mb?=?1 Morgan) between SNPs. Assuming that LD at short distances depends
on long-term population history 36],37], the historic effective population size was estimated as:

where Nt is the effective population size t generations ago, with t?=?1/(2c) 36]. The Ne from five generations ago (250 individuals) was considered as the most recent
with c?=?0.1 Morgan.