Intraspecific genetic lineages of a marine mussel show behavioural divergence and spatial segregation over a tropical/subtropical biogeographic transition


Ethics statement

Mussel collection were performed under permit number RES2014/12 issued by the Department
of Agriculture Forestry and Fisheries to the Department of Zoology and Entomology
at Rhodes University.

Genetic data

DNA extraction, amplification and genotyping

Mussels (adults, 4cm in shell length) were collected between February 2010 and May
2013 at six locations in southern Africa (Figure 1A and Table 1) in the middle portion of the species’ vertical distribution. Mantle tissue (20-30
mg) was dissected from each individual, preserved in 96% ethanol and stored at -20°C.
Total genomic DNA extraction was performed using a standard Proteinase K protocol
adapted from 21]. Molecular genotyping of a total of 269 samples was carried out using a set of 10
microsatellite loci in three multiplex reactions, with subsequent separation of the
PCR products using an ABI PRISM 3130xl DNA analyser (Applied Biosystems) with GeneScan
Liz 500 as size standard (Applied Biosystems) and following the procedure of Coelho
et al. 22]. Data were scored using PEAK-SCANNER v. 1.0 software (Applied Biosystems).

Table 1. Genetic diversity of each population

Location-based analyses

Scoring errors, large allele dropout and null alleles were checked employing the program
MICROCHECKER 23] and frequency of null alleles estimated based on the algorithm presented in Brookfield
24]. Additionally, tests for Hardy–Weinberg equilibrium were conducted using GENETIX
4.05 software 25] for all locus-location combinations by assessing the inbreeding estimator Wright’s
FIS and estimating significant values after 10,000 permutations. Tests for linkage disequilibrium
between all pairs of loci were performed according to the method of Black and Kraftsur
26] implemented in GENETIX.

For each location, allele frequencies, observed heterozygosity (HO) and expected heterozygosity (HE), plus FIS (significance based on 1,000 permutations and adjusted using q-value correction for
multiple comparisons; 27]) were computed with GENETIX. Standardized allelic richness (Â) was estimated by resampling
1000 times to standardize to the smallest sample size (n?=?30) to account for differences
between sampled locations, using StandArich R package 28]. Pairwise location estimates of genetic differentiation plus 95% bootstrapped confidence
intervals (10,000 replicates) were estimated using Weir and Cockerham’s 29] FST estimator (?) and Jost’s 30] DST. Calculations were executed in the Diversity R package 31].

Estimating admixture

To evaluate the extent of admixture between the two mitochondrial lineages using microsatellite
data, we used STRUCTURE 2.2 32]. Clusters (K) varied from one to a maximum of seven, corresponding to the number
of locations in the study data plus one. Twenty replicate runs [100,000 Markov Chain
Monte Carlo (MCMC) iterations and 10,000 burn-in] were performed under the admixture and the correlated allele frequency model. The
most probable value of K was inferred based on the ad hoc criteria L(K) proposed by
Pritchard et al. 32] and delta K by Evanno et al. 33]. Similar replicate runs were grouped based on the symmetric similarity coefficient
of 0.9 using the FullSeach algorithm in CLUMPP 1.2 34] and visualized with bar_plotter.rb 1.1 implemented in Ruby.

To complement the results of STRUCTURE, we used discriminant analysis of principal
components (DAPC, 35]), which can be performed when uncovering population admixture in the absence of assumptions.
This method first transforms the data using principal components analysis, which ensures
that the variables are not correlated and that the number of variables is smaller
than the number of individuals. We ran two DAPC analyses in ADEGENET R package 36], following Jonker et al. 37]: a location assignment and a genetic cluster assignment. We assigned each individual
a priori to its location of origin and obtained for each one the probability of assignment
to the correct sampling site. This procedure maximizes the among-location variation
and minimizes the within-location variation 35]. Using the find.clusters function (with 107 iterations), we ran successive K-means clustering of the individuals from K?=?1 to
K?=?6, and identified the best supported number of clusters through comparison with
the Bayesian Information Criterion (BIC) for the different values of K. We determined
the number of clusters and assigned each individual to a genetic cluster without providing
any a priori population assignment. In that way, the grouping factor is the genetic cluster and
not the sampling location, offering an unbiased interpretation of population structure.
To avoid unstable assignments of individuals to clusters, we retained 89 PCs (sample
size divided by three), but used all discriminant functions, in a preliminary DAPC
run. The results were then reiterated by the optim.a.score function with 100 simulations
to determine the optimal number of PCs, and a final DAPC was subsequently carried
out with the optimal number of PCs.

Estimating the directionality of gene flow

To compare different biogeographic hypotheses on migration rates of P. perna, we used the coalescent-based program Migrate-N MIGRATE version 3.1.3 38],39]. This approach assumes the Wright-Fisher model, where locations have a constant effective
size through time, the rate of mutation is constant, and locations exchange migrants
with constant rates per generation, but those rates can vary among locations. We conducted
the analyses on a random subsample of 30 individuals, with the dataset structured
into three groups according to the genetic clusters recovered (i.e. all populations
were assigned to one of the two clear clusters with the exception of PA, which fell
indistinctly into each of the two, and so was considered separately): West (sites
HM, PL), Centre (PA) and East (PE, DU, PO). We tested four variations of the three-group
(West, Centre, East) migration model: all directional migration (Model 1: West ? Centre
? East, full model, nine parameters), strict west to east migration (Model 2: West???Centre???East,
six parameters), strict east to west migration (Model 3: West???Centre???East, six
parameters) and from the centre to the margins (Model 4: West???Centre???East, five
parameters). Testing the directionality of gene flow is justified because the dominant
ocean current between the ocean basins, the Agulhas Current, runs westerly from the
Indian towards the Atlantic Ocean (Model 3) and is thought to play a role in limiting
marine dispersal in the opposite direction 40],41] though coastal counter-currents also occur 42] (Model 2). Models 1 and 4 consider indistinct migration among groups and migration
directionality towards the transition area between the two clusters respectively.
Initial values were calculated using FST. Mutation rates were set to be constant among loci. The Migrate-N run parameters
were calibrated on the full model for convergence of the Markov chain Monte Carlo
sampling method. The prior distributions were uniform for mutation-scaled population
size parameters theta (?), that are four times the product of the effective population
size and the mutation rate, and mutation-scaled migration rates M, that is, migration
rate scaled by the mutation rate, over the range of ??=?0–100 and M?=?0–100. These
settings resulted in converged posterior distributions with a clear maximum for each
estimate. The three replicate Bayesian runs consisted of one long chain with a total
of 6 million states and genealogy changes after discarding the first 10,000 genealogies
as “burn-in” for each locus. For all the analyses we used an adaptive heating scheme
with 4 simultaneous chains using different acceptance ratios (temperature settings
were 1.0; 1.5; 3.0; 1,000,000.0). Overall loci information was combined into a single
estimate by Bézier approximation of the thermodynamic scores as described by Beerli
and Palczewski 43] and we averaged the Bézier score over three different runs and used this as input
to evaluate multiple models using Bayes factors 44].

Ecological data

Attachment strength and gaping behaviour

All mussels used for the assessment of behavioural traits were collected from locations
known to support pure mtDNA lineages (18]). Specimens (shell length 3-4cm) were collected in February 2013 [at Brenton-on-Sea
(BS) for the western lineage and at Amanzinoti (AM) for the eastern lineage; Figure 1A) and acclimated in seawater at 17°C for two weeks prior each experiment. A second
trial was replicated in February 2015 with mussels from two additional locations [at
Robberg (RO) for the western lineage and at Port St. Johns (PJ) for the eastern lineage;
Figure 1A]. Common garden experiments involve the acclimation and comparison of individuals
from distinct geographical locations under identical environmental conditions. Such
experimental set-ups are widely used to disentangle the effects of genetic and environmental
conditions on observed phenotype differences (e.g. 45]).

Mussels were divided in three separate tanks (n =5 of each lineage in each tank),
where they were placed at least 10cm apart so that they maintained a solitary position
(i.e. horizontal to the substratum) and attachment strengths were not influenced by
nearby conspecifics. After three days, the attachment strength of each mussel was
measured by drilling a hole into one of the valves and using a spring balance to record
the force required to dislodge the mussel perpendicularly to the substratum 46].

Mussels (n?=?15 for each lineage and each measuring interval) were exposed to air
for 3h in a controlled environment chamber at 35°C and 60-80% humidity. For each one
hour interval, the number of valve movements (gaping behaviour) were noted by visual
observation, without recording the width of the gape 47].

Intertidal distributional limits

Temperature dataloggers (iButtons®, Maxim Integrated Products, Dallas Semiconductor,
USA) were used to relate the effective shore height of P. perna at two open coast sites on the east coast [Amanzinoti (AM) and Balito (BA); Figure 1A] and two within the western range of the species (BS and RO). At each site, two transects were established, approximately 50 m apart and running
perpendicular to the shoreline. Along each transect, two logger were deployed at the
upper and two at the lower limit of P. perna distribution. Loggers were programmed to record data at 10 min intervals between
February 1st to 14th of 2013 (western range) and March 17 to 30 of 2014 (eastern range), covering the
two-week tidal cycle in both regions (i.e. a neap and a spring tide). Temperature
logger profiles clearly reveal when loggers are first inundated by the returning tide
(sudden, sharp temperature drops; ? 3°C over 10 min; 48]). This was used to calculate the effective shore height using the online forecasts
service of SHOM database (http://www.shom.fr; settings: hauteur d’eau à une heure donnée). Because tide cycles are not in phase
across large geographical scales, Durban and Mossel Bay tidal elevations were used
for the east and south coast respectively. Where both loggers were recovered at a
transect, the values were averaged and used to estimate populations’ intertidal limits
at each transect.

Data analyses

Each trial was analysed separately. Attachment strength data were analysed using a
2-way ANOVA with Lineage (western or eastern) and Block (tank 1, 2 or 3) as fixed
and random factors respectively. Gaping behaviour data were analysed using a 2-way
ANOVA with Lineage (western, eastern) and Time (hour 1, 2 or 3) as fixed factors.

For each intertidal Limit (low, high), data were analysed under a nested design with
Range (west, east) as a fixed factor, Site (1, 2) nested within Range and Transect
nested within Range and Site.

Heteroscedasticity was tested using Levene’s test and post-hoc separation of significant means by SNK tests. Analyses were performed in SPSS (IBM
Corp., USA).