Determining the genome-wide kinship coefficient seems unhelpful in distinguishing consanguineous couples with a high versus low risk for adverse reproductive outcome


Participants

Information on the inclusion and exclusion of couples and the number of participants is
presented in Table 1. The aim was to include 100 consanguineous couples (cases) with one or more children
affected by an autosomal recessive disorder and 100 consanguineous couples (controls)
with a family relation comparable to the cases and with only healthy children (at
least three). Information on the identity of all first- to third-degree family
members of both partners was obtained insofar as it was known to the
couple. Cases were excluded if another individual in the family was known to be affected
by the same disorder. Cases were included not only when molecular data were available,
but also when the nature of the autosomal recessive (AR) disorder was beyond doubt because
of clinical or biochemical confirmation (which applied to 7/73 cases). The AR disorders
included ranged from rare to extremely rare. Further details on the methods of ascertainment
have been described earlier [3]. We attempted to include equal numbers of case and control
couples from every ethnic background. Moreover, we aimed for similar distributions of
pedigree relatedness among the case couples and the control couples. In ten instances,
a case couple and a control couple came from the same family. The 168 couples originated from
10 different populations (Tunisia, Saudi Arabia, Turkey, Jordan, Morocco, Pakistan,
Iraq, Iran, Afghanistan, the Netherlands). Approval for the study was obtained from
the Medical Ethics Committee of the VU University Medical Center (the Netherlands),
le comité de Protection des Personnes de l’hôpital Charles Nicolle (Tunisia), the
Research Ethics Committee (REC) at KFSHRC (Saudi Arabia), the Clinical Trials Ethical
Committee at Istanbul Medical Faculty of the Istanbul University (Turkey) and the
IRB committee of Jordan University Hospital (Jordan).

Table 1. In- and exclusion of couples

Sample preparation, genotyping and quality control

For all couples other than the Saudi Arabian couples, DNA was extracted from each individual’s
saliva sample according to standard procedures; for 54 case couples, DNA extracted
from blood was available. A total of 66 non-Saudi case couples and 70 non-Saudi control
couples were genotyped according to the manufacturer’s protocol using Affymetrix
6.0 SNP arrays. The Saudi Arabian couples were genotyped using Affymetrix
250 K arrays (6 individuals) and Affymetrix Axiom arrays (48 individuals), after DNA
extraction from whole blood (see Additional file 1: Table S1).

PLINK was used to perform post-genotyping quality control [4]. Individuals with a genotyping
rate below 95 % were excluded. All genotype data were
merged for overall analysis; Tunisian, Saudi, and Turkish couples were also merged
into separate files for further analysis for each population. Duplicated SNPs were
removed and only SNPs with a genotype call rate of at least 95 % were included. Further quality
control included removal of SNPs with a minor allele frequency of less than 5 %. After
quality control, 73 case couples and 78 control couples genotyped for 143,512 markers
were available for the analysis (see Table 1). To obtain a pruned subset of markers in low
linkage disequilibrium, the PLINK --indep-pairwise option was used with parameters 50 5 1.5
(57,358 SNPs remaining). Multidimensional scaling (MDS) was performed to analyse population
substructure using the PLINK MDS plot option. Results were imported into the statistical
package R version 3.0.1 (http://www.r-project.org/) and SPSS version 20 for Windows [5].
The MDS plot was inspected for population clustering and case/control matching.
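The filtering logic of the quality control steps can be illustrated with a minimal Python sketch. It is not the PLINK implementation used in the study; the function names, the 0/1/2 genotype coding and the use of `None` for missing calls are hypothetical choices made for illustration, and the 95 % values are assumed to be minimum call rates.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of one allele; None = missing call (assumed coding)


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency among non-missing calls (0.0 if all calls are missing)."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def quality_control(
    matrix: List[List[Genotype]],
    min_call_rate: float = 0.95,
    min_maf: float = 0.05,
) -> Tuple[List[int], List[int]]:
    """matrix[i][j] is the genotype of individual i at SNP j.

    Returns the indices of the individuals and SNPs that pass the filters.
    """
    # Step 1: drop individuals whose genotyping rate is below the threshold.
    kept_ind = [i for i, row in enumerate(matrix) if call_rate(row) >= min_call_rate]
    # Step 2: among the remaining individuals, keep SNPs passing call rate and MAF.
    kept_snps = []
    n_snps = len(matrix[0]) if matrix else 0
    for j in range(n_snps):
        column = [matrix[i][j] for i in kept_ind]
        if (
            column
            and call_rate(column) >= min_call_rate
            and minor_allele_frequency(column) >= min_maf
        ):
            kept_snps.append(j)
    return kept_ind, kept_snps
```

Linkage disequilibrium pruning and MDS are not covered by this sketch; both were done with PLINK in the study.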

Analysis of pairwise relatedness

A kinship coefficient based on the pedigree reported for each couple was calculated
according to the method described by Wright [6].
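As an illustration of a pedigree-based calculation, the following Python sketch applies the standard path-counting formula, in which every path connecting the two partners through a common ancestor contributes (1/2)^(n1 + n2 + 1) × (1 + F_A). Whether the study used exactly this formulation is an assumption, and the function name and input layout are hypothetical.

```python
from typing import Iterable, Tuple

# One entry per path through a common ancestor:
# (generations from partner 1 to the ancestor,
#  generations from partner 2 to the ancestor,
#  inbreeding coefficient F_A of that ancestor)
Path = Tuple[int, int, float]


def pedigree_kinship(paths: Iterable[Path]) -> float:
    """Kinship coefficient of two partners by path counting."""
    return sum(0.5 ** (n1 + n2 + 1) * (1.0 + f_a) for n1, n2, f_a in paths)


# First cousins share two grandparents, each two generations away from both partners.
first_cousins = [(2, 2, 0.0), (2, 2, 0.0)]
# Second cousins share two great-grandparents, three generations away from both partners.
second_cousins = [(3, 3, 0.0), (3, 3, 0.0)]

print(pedigree_kinship(first_cousins))   # 0.0625  (1/16)
print(pedigree_kinship(second_cousins))  # 0.015625 (1/64)
```

Couples related through several loops (for example double first cousins) are handled by listing every path; ancestors who are themselves inbred enter through a non-zero F_A.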

Although pairwise coefficients of relatedness in genomic data can be calculated from
known population allele frequencies, these frequencies are frequently
not known and are often estimated from the sample itself by estimators of relatedness [7].
Lack of homogeneity of the sample or sampling errors can lead to false estimates
[7, 8]. Since individuals in some populations are genetically more similar to one another
than in others, we used three different estimators (PLINK, King, IBDelphi) to estimate
the genomic pairwise relatedness in our sample, to account for population stratification
as well as for inbreeding. Moreover, we estimated the relationship
coefficients from the whole set of samples (overall analysis), as well as from separate
sets containing only couples from one population (population subgroup analysis). This
latter analysis was performed only for the Tunisian, Saudi and Turkish couples, given
the small sample sizes from the other populations.

PLINK uses a method-of-moments approach in which the probability of sharing 0, 1 or 2
alleles identical by descent (IBD) is calculated. The total proportion of alleles shared IBD
is calculated based on the estimated allele frequencies of all SNPs, and the method assumes
homogeneity [4]. King uses the same approach and offers two different methods: King homo, which
assumes homogeneity of the sample, and King robust, which provides relationship
inference that is robust to heterogeneity of the sample caused by population stratification [8, 9].
Finally, IBDelphi is an algorithm that analyses raw high-density SNP genotype data
from a consanguineous couple by looking for homozygous regions of over 0.5 Mb in both
genomes that lack SNPs that exclude IBD [10].
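To make the link between the three IBD-sharing probabilities and a single relatedness value concrete, the sketch below combines them as P(IBD=2) + 0.5 × P(IBD=1), which is how PLINK reports its overall proportion IBD (PI_HAT, from Z0, Z1 and Z2). The function name is hypothetical, and the estimation of the probabilities from allele frequencies is not shown.

```python
def proportion_ibd(z0: float, z1: float, z2: float) -> float:
    """Overall proportion of alleles shared IBD from the three sharing probabilities."""
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("sharing probabilities must sum to 1")
    return z2 + 0.5 * z1


# Theoretical sharing probabilities for outbred first cousins: (0.75, 0.25, 0.0).
pi_hat = proportion_ibd(0.75, 0.25, 0.0)
print(pi_hat)      # 0.125
print(pi_hat / 2)  # 0.0625, the corresponding kinship coefficient for a non-inbred pair
```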

In PLINK, pairwise relatedness between the partners of each couple was calculated with
the --genome --rel-check options. In King, the pruned subset of SNPs was used to calculate pairwise
IBD through the kinship parameter (for the overall analysis) and the homo parameter (for
the population subgroup analysis). Finally, individual genotype files were entered
pairwise in IBDelphi, producing IBD measures. All estimates of pairwise relatedness
(pedigree, PLINK, King and IBDelphi) were entered in the statistical package R and
SPSS version 20. Pearson’s correlation coefficients were calculated
between the different estimates. Rgen represents the relatedness as derived from the
genotype, while Rped was calculated based on the pedigree information reported.

The ratio R = (Rgen - Rped)/Rped was used as a measure of the degree of similarity between
Rgen and Rped, with Rgen being the observed measure of pairwise relatedness (resulting
from our analyses by the four different approaches) and Rped the kinship coefficient
between the parents of a child based on the pedigree. If, for a couple, Rped is higher
(lower) than Rgen, R is negative (positive). By dividing the difference by Rped, we
consider the relative differences. The possible influence of population was ignored
first, and the alternative hypothesis was tested that the median of the distribution
of ratio R of cases (couples with affected children) is higher than the median of
the distribution of controls (couples with only healthy children) with the one-sided
non-parametric Mann-Whitney test at level 0.05. Since most couples (96 of 151)
came from Tunisia, these couples were subsequently analysed separately to filter out a possible
population effect, and the same test was performed on this subset. The
analyses were also done separately for the first cousin couples (based on pedigree),
as they are the predominant group of consanguineous couples seeking genetic counselling.
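The ratio and the one-sided comparison can be sketched in Python as follows. This is an illustration only: the study ran its analyses in R and SPSS, the function names are hypothetical, and the p-value uses a tie-corrected normal approximation with continuity correction, which is an assumption and is rough for small samples.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def ratio_r(r_gen: float, r_ped: float) -> float:
    """Relative difference between genomic and pedigree relatedness."""
    if r_ped <= 0:
        raise ValueError("r_ped must be positive")
    return (r_gen - r_ped) / r_ped


def mann_whitney_greater(
    cases: Sequence[float], controls: Sequence[float]
) -> Tuple[float, float]:
    """Return (U, p) for the one-sided alternative that cases tend to be larger."""
    n1, n2 = len(cases), len(controls)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    # U counts case/control pairs in which the case value is larger; ties count one half.
    u = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in cases
        for y in controls
    )
    n = n1 + n2
    # Tie correction term: sum of t^3 - t over groups of tied values.
    ties = sum(t ** 3 - t for t in Counter(list(cases) + list(controls)).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - n1 * n2 / 2.0 - 0.5) / math.sqrt(variance)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u, p
```

With the ratios of the case couples and the control couples in two lists, `mann_whitney_greater(case_ratios, control_ratios)` returns the U statistic and an approximate p-value to compare against the 0.05 level.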

Next, a linear mixed-effects model was assumed. The outcome variable in the model
is the ratio R; the covariates consist of an intercept, the fixed effect 0–1
variable “whether the couple has an affected child (covariate equals 1) or not (covariate
equals 0)” (i.e. case or control), and a random effect “population”. The population
effect on the association between the outcome variable R and the case–control status
was investigated and the one-sided alternative hypothesis was tested regarding whether
the regression parameter for case–control status was positive, corrected for a possible
population effect if “population” is a confounder.
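Written out, a model of this kind could take the following form. The notation is introduced here for illustration, and the random intercept per population with normally distributed random effects and residuals is an assumption about the specification, not a detail stated in the text.

```latex
% R_{ij}: ratio R of couple i in population j
% x_{ij}: 1 if the couple has an affected child (case), 0 otherwise (control)
% u_j:    random effect of population j
\begin{aligned}
R_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \\
H_0 &: \beta_1 = 0 \quad \text{versus} \quad H_1 : \beta_1 > 0 .
\end{aligned}
```

Under this formulation, a positive estimate of the regression parameter for case–control status would mean that cases have, on average, a higher ratio R than controls from the same population.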