Increased efficiency in identifying mixed pollen samples by meta-barcoding with a dual-indexing approach


High throughput sequencing (HTS) has been shown to be successful and valuable for
taxonomic assessment of mixed pollen samples [7], [13], [15]. The drawbacks of existing protocols were the low number of samples processed simultaneously
or inefficient multistep library preparations. Recent developments in sequencing technologies
allow far larger multiplexing, given the enormous throughput already available with
desktop NGS devices. Highly multiplexed sample processing has already been established
for bacterial assessments using dual-indexing approaches with the MiSeq sequencer
[16]. The goal of this study was to transfer this knowledge to the field of plant meta-barcoding,
specifically to pollen samples.

By adapting the primer design to the ITS2 region, modifying the oligo scaffold design,
and adjusting the sequencing primers to be compatible with the MiSeq device, we successfully
established a fast pollen DNA meta-barcoding routine with high multiplexing capabilities.
For our test samples, the newly designed primers were used to sequence 384 mixed pollen
samples collected by solitary bees in a single sequencing run. The original bacterial
dual-indexing protocol [16] suggests that multiplex rates higher than 384 samples are possible, depending
on the throughput required to assess the diversity. Our sequencing results indicate that
for pollen samples a depth of at least 2,000–3,000 high-quality reads per sample should
be reached to identify all taxa within the sample (plateau reached, Figure 2); this was comparable for the two bee species under study. This figure of course
depends strongly on the number of plant species in the samples, which in turn may depend
on sample origin, foraging behaviour and the biodiversity of the ecosystem of interest,
but it may nonetheless serve as a guideline for higher multiplex rates. Additional index
combinations for more samples are provided in the Additional files alongside the protocol
for the bacterial dual-index approach [16].
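
The relationship between index combinations, sequencing depth and multiplex rate can be illustrated with a minimal Python sketch. The 16 × 24 index layout, the index labels and the run yield of 3,000,000 usable reads are assumptions chosen for illustration only; they are not taken from this study or from the protocol in [16]. Only the 384 samples and the 2,000–3,000 reads per sample come from the text above. The sketch also assumes that reads are distributed evenly across samples, which is rarely the case in practice.

```python
from itertools import product


def index_pairs(forward_indices, reverse_indices):
    """Return every (forward, reverse) index combination, one per sample."""
    return list(product(forward_indices, reverse_indices))


def max_samples(usable_reads, reads_per_sample):
    """Number of samples a run can carry at the given per-sample depth,
    assuming reads are spread evenly across samples."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return usable_reads // reads_per_sample


# Hypothetical layout: 16 forward x 24 reverse index labels (illustrative only).
forward = [f"F{i:02d}" for i in range(1, 17)]
reverse = [f"R{i:02d}" for i in range(1, 25)]
pairs = index_pairs(forward, reverse)
print(len(pairs))  # 384

# Hypothetical yield of 3,000,000 usable reads at 3,000 reads per sample.
print(max_samples(3_000_000, 3_000))  # 1000
```

Under these assumptions, the per-sample depth rather than the number of available index combinations would determine how far the multiplex rate can be raised.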

Besides our dual-indexing strategy, another HTS-based approach has recently been proposed.
There, PCR amplification and index labelling were conducted in separate steps [13], which is time- and labour-intensive and adds a further step in which errors may
be introduced. In our protocol, PCR amplification and sample indexing occur simultaneously,
which is highly practical and requires no special reagents, such as additional expensive
library preparation kits or adapter ligation chemicals. The complete
workflow costs less than USD 20.00 in materials per sample when processing
384 samples simultaneously. This is much lower than conventional pollen analysis under
the light microscope, which can reach several hundred USD per sample.

Most plant taxa detected could be successfully classified using the previously demonstrated
RDP classifier [7], [21], but also the recently developed UTAX algorithm [25]. Because UTAX version 8.0 does not report confidence values for taxonomic assignments
(announced for version 8.1, http://drive5.com/usearch/manual/faq_taxconfs.html, accessed 22 May 2015), we compared the classifications to the RDP output as well
as to the documented flower resources. UTAX and RDP showed high agreement between taxonomic
classifications, thus both may be used interchangeably.
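
One simple way to quantify agreement between two classifiers is the fraction of sequences that receive the same genus from both. The following Python sketch illustrates this idea only; it is not necessarily the comparison performed in this study, and the function name, read identifiers and assignments are hypothetical.

```python
def agreement_rate(assignments_a, assignments_b):
    """Fraction of shared sequence IDs assigned to the same taxon by both classifiers."""
    shared = assignments_a.keys() & assignments_b.keys()
    if not shared:
        raise ValueError("no sequences were classified by both methods")
    same = sum(1 for seq_id in shared if assignments_a[seq_id] == assignments_b[seq_id])
    return same / len(shared)


# Hypothetical genus-level assignments for four reads (illustrative only).
rdp = {"read1": "Helianthus", "read2": "Phacelia", "read3": "Carduus", "read4": "Lonicera"}
utax = {"read1": "Helianthus", "read2": "Phacelia", "read3": "Carduus", "read4": "Heracleum"}
print(agreement_rate(rdp, utax))  # 0.75
```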

Approximately half of the genera found flowering near the nest sites were detected
in the pollen samples. This is probably attributable to bee foraging preferences, as not
all available resources might be used, especially by the oligolectic O. truncorum. In addition, about three quarters of the reads were assigned to plant genera documented
near the nesting sites (50 m: all plant species, 50–600 m: mass-flowering plants
only). As bees are expected to forage further away as well, the remaining reads are attributable
to pollen collected from undocumented plants or to misclassifications.

As expected, pollen composition patterns were very different for
the oligolectic and the polylectic bee species (Figure 3). O. truncorum samples were dominated by Asteraceae, whereas O. bicornis samples showed a wide pollen spectrum. Our data correspond to the flower preferences
and foraging strategies known for these species [18], [19]. This supports the high quality of information obtained by pollen meta-barcoding,
as already intensively evaluated in another study [7]. It is noteworthy that even very rare taxa could be detected, which is of special
interest in the oligolectic O. truncorum and which might be overlooked in light microscopy assessment of pollen samples.

We would like to point out that abundance data obtained from molecular approaches
should in general be interpreted with care and only as relative abundance (read counts divided
by the total number of reads in the sample to account for varying library sizes). Contradictory
results exist concerning the suitability of pollen meta-barcoding for quantification
purposes: Keller et al. [7] and Kraaijeveld et al. [14] found a significant positive correlation for genera between light microscopy and
meta-barcoding, whilst Richardson et al. [13] were not able to find such a connection. The different steps in the workflow,
e.g. dilutions and PCR, can introduce biases, leading to skewed data and over-
or underrepresentation of certain taxa. PCR bias is considered to be a random process
and can be accounted for by performing replicate PCR reactions for each sample [23], which are subsequently pooled. In this study we followed this approach, as did
Keller et al. [7], to reduce PCR bias as far as possible. This may explain some of the discrepancy between
studies, although a recent study indicated that PCR replicates might not be necessary
in pollen meta-barcoding [14]. The reduced number of individual processing steps in direct indexing (as performed
here and in both studies identifying a positive correlation [7], [14]) further reduces the risk of introducing unwanted effects in comparison with
the study using adapter ligation, which showed no correlation [13].
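
The normalisation to relative abundance mentioned at the start of this paragraph can be sketched in a few lines of Python. The function name and the read counts are hypothetical and serve only to illustrate dividing each taxon's read count by the total number of reads in the sample.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts of one sample into fractions of the sample total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: count / total for taxon, count in read_counts.items()}


# Hypothetical read counts for a single sample (illustrative only).
sample = {"Asteraceae": 900, "Rosaceae": 75, "Brassicaceae": 25}
fractions = relative_abundance(sample)
print(fractions["Asteraceae"])  # 0.9
```

Because every sample is scaled to a total of 1, samples with different library sizes become comparable, although the biases discussed above remain in the fractions.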

In this study, samples of the same bee species show high consistency in abundance
patterns of major taxa, which are readily explained biologically. A good compromise
for most studies investigating foraging patterns might be not to use direct count
data, but to conservatively categorise plant taxa into ‘abundant’ and ‘rare’ based
on a threshold, as proposed by Keller et al. [7]. Where more detail is needed, a subset of samples may also be analysed in parallel
by light microscopy for evaluation purposes [7], [13], [14].
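
Such a threshold-based categorisation could look like the following Python sketch. The threshold of 1 % of reads is an assumption made purely for illustration; the value proposed by Keller et al. [7] is not stated here and should be taken from that study. The function name and the example fractions are likewise hypothetical.

```python
def categorise_taxa(relative_abundances, threshold=0.01):
    """Label each detected taxon 'abundant' or 'rare' by its relative abundance.

    The default threshold (1 % of reads) is an illustrative assumption.
    Taxa with zero abundance are treated as not detected and are omitted.
    """
    return {
        taxon: ("abundant" if fraction >= threshold else "rare")
        for taxon, fraction in relative_abundances.items()
        if fraction > 0
    }


# Hypothetical relative abundances for one sample (illustrative only).
sample = {"Asteraceae": 0.62, "Rosaceae": 0.374, "Vitaceae": 0.006, "Salicaceae": 0.0}
print(categorise_taxa(sample))
# {'Asteraceae': 'abundant', 'Rosaceae': 'abundant', 'Vitaceae': 'rare'}
```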

One major advantage of pollen meta-barcoding is that no expert knowledge on pollen
morphology is required for taxonomic assignment. Additionally, species level assignment
is possible even for closely related plant taxa. However, successful taxonomic assignment
critically depends on the quality of the reference database. Our target marker was
the ITS2 region, but other genetic markers might also be considered for plant species
identification using meta-barcoding, e.g. trnL [14], [15] or rbcL plus trnH–psbA [8], [9]. The described dual indexing approach [16] can also be applied to other genetic markers, provided some considerations are taken
into account as described for ITS2 in this study. On the laboratory side of the workflow,
the target, and thereby the primer choice, should first of all be appropriate for universal amplification
and plant species identification based on DNA sequence data. The amplified fragment
should be of the appropriate size for the chosen MiSeq sequencing chemistry, e.g.
no longer than ~480–490 bp for 2 × 250 v2 sequencing kits, allowing for some overlap
between forward and reverse reads. Given these conditions are met, primer design can
be performed following the guidelines from Kozich et al. [16], including the required modifications to the various oligonucleotides. However, as
mentioned before, successful plant species identification also relies to a large degree
on the underlying reference database and bioinformatic classification algorithm.
For most alternative markers, comprehensive reference databases are currently lacking
and thus taxonomic classifications are mainly performed by a BLAST search [33] against sequences downloaded from GenBank [8], [9], [13]–[15], locally managed alternative databases [9] and/or newly acquired DNA sequences [8], [9]. BLAST searches are based on local alignments that may use only parts of each sequence
(e.g. conserved regions) for classification and lack a hierarchical classification procedure,
and their results can be difficult to interpret [7], [17], especially when they show hits for multiple, different taxa. Setting up locally
managed databases is time- and labour-intensive as well as costly and makes it difficult
to compare independent studies with one another. In the case of the ITS2 region, we
benefitted from the already established ITS2 database [30], which contains annotated and trimmed ITS2 sequences from species worldwide and can
be publicly accessed, improving overall comparability across studies.
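
The fragment-size constraint mentioned above follows from simple arithmetic: with paired-end reads of length r, a fragment of length a leaves an overlap of 2r − a bases for merging forward and reverse reads. The Python sketch below illustrates this check when evaluating a candidate marker. The minimum overlap of 10 bp is an assumption for illustration (read-merging tools differ in their requirements), and the calculation assumes that the full read length is usable and that the stated fragment length is the sequenced length.

```python
def read_overlap(amplicon_length, read_length=250):
    """Bases shared by forward and reverse reads of a paired-end run."""
    return 2 * read_length - amplicon_length


def fits_chemistry(amplicon_length, read_length=250, min_overlap=10):
    """True if the read pair still overlaps by at least min_overlap bases.

    min_overlap=10 is an illustrative assumption, not a value from the protocol.
    """
    return read_overlap(amplicon_length, read_length) >= min_overlap


for length in (480, 490, 500):
    print(length, read_overlap(length), fits_chemistry(length))
# 480 20 True
# 490 10 True
# 500 0 False
```

Under these assumptions, fragments of 480–490 bp leave 10–20 bp of overlap with 2 × 250 chemistry, consistent with the upper size limit given above.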

Although Chen et al. [17] reported high identification accuracies with ITS2 as a genetic marker, some plant
taxa could not be identified in recent studies on pollen meta-barcoding [7], [13]. These included the families Salicaceae, Lamiaceae [13] and Vitaceae [7] and the genera Lonicera [13], Heracleum, Carduus, Phacelia, Convolvulus and Helianthus [7], although they had been identified with microscopic pollen analysis. In this study,
we could detect all of these taxa. Failure to detect these families and genera with
DNA sequence data was most likely due to incompleteness of the reference databases
in these studies. Richardson et al. [13] used in total only 2,628 reference sequences, which described about half of the locally
occurring plant species. In the case of Keller et al. [7], we were able to directly compare the database then (73,853 sequences) and now (182,505
sequences), which revealed that for each of those plant taxa more reference sequences
were included after the database update presented here (Additional file 3: Table S2). This explains the positive detection of those plant taxa in this study
in contrast to earlier studies and again highlights the importance of a current and
comprehensive reference database for meta-barcoding purposes.

Our test samples comprised only pollen samples collected by bees, but in general ITS2
meta-barcoding can be applied to plant identification in other research fields where
mixed samples are encountered, such as diet analysis of herbivores [34], [35] and in palaeo-ecology [36]–[38]. Furthermore, high-throughput DNA analysis of mixed plant samples can also prove
valuable in food safety issues [39], honey quality analysis [8], [9] as well as allergen load assessment [14]. For such applications, alteration of the provided protocol for library preparation
and sequencing is not needed, although the DNA extraction process may require alternative
kits or adapted protocols specific for the material of interest.