Expression patterns, molecular markers and genetic diversity of insect-susceptible and resistant Barbarea genotypes by comparative transcriptome analysis


Generation and annotation of the transcriptomes of P-type Barbarea vulgaris

The transcriptomes of G-type B. vulgaris under diamondback moth feeding and non-feeding conditions were pyrosequenced in our
previous study 29]. To perform a comparative analysis of the transcriptomes between susceptible and
resistant genotypes of Barbarea vulgaris, the transcriptomes of P-type seedlings under DBM feeding and non-feeding conditions
were pyrosequenced in the present study. A total of 13,684,884 and 11,415,972 clean
paired-end reads containing 2.764 and 2.238 gigabase pairs of clean nucleotides were
generated from insect infested and control P-type B. vulgaris, respectively. These data were assembled into a set of 33,721 non-redundant unigenes,
with a mean length of 896 nt and an N50 length of 1,440 nt, which were comparable
with the G-type transcriptome assembly (Table 1). The length distribution of the unigenes are listed in Additional file 1: Table S1.

Table 1. Summary of the sequencing and assembly of the G- and P-type B. vulgaris transcriptomes

Subsequently, we screened the unigene sequences against the NCBI non-redundant (Nr),
SwissProt, Gene Ontology (GO), Clusters of Orthologous Groups of proteins (COGs),
and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway protein databases using
BLASTX (e-value??0.00001). Protein function was predicted from the annotations of the most
similar proteins in those databases. As shown in Table 1, 31,715 (94.1 %) unigenes were annotated by at least one of these databases. Detailed
information on the Nt, Nr, SwissProt, GO, COGs and KEGG annotations is shown in Tables
S2–S7, respectively. The gene functional classification by GO analysis showed that
the largest GO terms were “cell”, “binding activity” and “physiological processes”
from the “cellular component”, “molecular function” and “biological process” ontologies,
respectively. The most abundant COGs terms were “general function prediction only”,
“replication, recombination and repair” and “transcription”. The distributions of
the functional categories were similar to that of G-type plants (Fig. 1).

Fig. 1. Function classification of unigenes. a, The gene ontology (GO) classification of P-type B. vulgaris transcripts; b, The comparison of clusters of orthologous groups of proteins (COGs) classification
between P- and G-type transcriptomes

Comparison of pest-induced transcriptome patterns between susceptible and resistant
B. vulgaris

Of the unigenes, 1,029 were differentially expressed in P-type B. vulgaris, including 530 up- and 499 downregulated, by DBM infestation (Additional file 1: Table S8; Additional file 2: Figure S1). This is far fewer than the number of genes affected by DBM in G-type
plants, which accounted for 2,102–4,685 up- and 1,254–5,188 downregulated genes at
a series of experimental time points 29]. The GO classification of the differentially expressed P-type unigenes indicated
that the “cell junction” and “extracellular region”, “electron carier activity” and
“enzyme regulator activaty”, “response to stimulus” and “localization” were the over-represented
terms from the “cellular component”, “molecular function” and “biological process”
ontologies, respectively. The most abundant COG class was “general function prediction
only”, followed by “amino acid transport and metabolism” and “carbohydrate transport
and metabolism” (Fig. 2). KEGG pathway analysis indicated that the over-represented pathways of the DBM-infected
P-type transcriptome were “Nitrogen metabolism”, “Phenylpropanoid biosynthesis”, “Photosynthesis–antenna
proteins” and “Flavonoid biosynthesis” (Table 2 and Additional file 1: Table S9). As shown in Additional file 2: Figure S2, the photosynthesis pathway genes were generally repressed by DBM infestation,
indicating that the pest not only consumed the existing assimilates, but also disrupted
the photosynthesis process. The phenylpropanoid biosynthesis and flavonoid biosynthesis
pathways were upregulated extensively (Additional file 2: Figures S3 and S4), similar to the results found in G-type B. vulgaris and Arabidopsis 29], 30], indicating that these kinds of secondary metabolites 11] were common response compounds to DBM infestation in plants. Generally, in susceptible
plants, the main DBM-affected genes are those engaged in nutrient, amino acid, carbohydrate
transport and metabolism, and photosynthetic processes. However, genes related to
certain secondary metabolism pathways such as glucosinolate biosynthesis, as well
as phytohormones and transcription factors, which were dramatically induced by DBM
in G-type B. vulgaris, showed less significant induction in P-type plants.

Fig. 2. The functional classification of differentially expressed unigenes in P-type B. vulgaris infested with DBM. a, gene ontology (GO); b, clusters of orthologous groups of proteins (COGs)

Table 2. The Kyoto encyclopedia of gene and genomes (KEGG) pathways affected by DBM in P-type
B. vulgaris

The saponin pathway in susceptible and resistant B. vulgaris

The genes of the triterpenoid saponin pathway (except P450s) were identified in both G- and P-type B. vulgaris transcriptomes, based on gene annotation. To limit the omissions, the saponin pathway
genes were BLAST searched against the transcriptome databases from both G- and P-type
plants. Seventy-one G-type and 44 P-type unigenes representing 22 enzymes catalyzing
20 metabolic reactions of the triterpenoid saponin pathway were identified (Additional
file 1: Table S10). The presence of more unigenes in the G-type plants could partially reflect
the higher heterozygosity among the sequenced individuals; polymorphisms within the
alleles could produce multiple potential unigenes during the assembly process. Particularly
evident were MDD and LUP2, which were represented by seven CL278.Contigs and five CL2531.Contigs, respectively
(Additional file 1: Table S10). The authenticity of these sequences requires further experimental analysis.
The expression abundances of these saponin synthesis genes were compared between G-
and P-type plants (Fig. 3a and Additional file 1: Table S10); the genes upstream of SE showed similar expression patterns between the two genotypes. Unexpectedly, expression
of genes for the bottom first and third enzymes, the glucosyl transferase (UGT73C11)
and an oxidosqualene cyclases (OCS), was significantly upregulated, and these mRNAs
accumulated by more than 10-fold in the P-type compared with the G-type plant. The
over expression of these genes were also comfirmed by Q-PCR analysis, though the change
levels were not as high as RNA-Seq displayed (Fig. 3b). One of them, UGT73C11, is known to be functional in the P-type plant 23]. The dramatic induction of these two enzymes could result from the P-type plants
suffering more serious damage under insect infestation and because the regulator of
this pathway is still functioning. However, some of the enzymes upstream of UGT73C11,
most likely the uncharacterized enzyme-P450s, are perhaps dysfunctional or have gained
new functions, resulting in no anti-DBM saponin being produced in P-type plants.

Fig. 3. Comparative transcription profiles of the putative genes in the triterpene saponin
synthetic pathway in P- and G-type B. vulgaris. a, Heatmaps representing the expression levels of different genes/families determined
by RNA-seq. The color bar is shown on the top right. Data represent log (RPKM) of
each treatment. The metabolites of the pathway are shown in bright green on top and
the enzymes for each metabolic step are shown below in black using the following abbreviations:
MVA, mevalonic acid; MEP, 2-C-methyl-D-erythritol 4-phosphate; IPP, isopentenyl diphosphate; G3P, glyceraldehyde
3-phosphate; DMAPP, dimethylallyl diphosphate; GPP, geranyl diphosphate; FPP, farnesyl
diphosphate; 2,3-OS, 2,3-oxidosqualene; AACT, acetyl-CoA acyltransferase; HMGS, 3-hydroxy-3-methylglutaryl
CoA synthase; HMGR, 3-hydroxy-3-methylglutaryl CoA reductase; MK, mevalonic acid kinase;
PMK, phosphomevalonate kinase; MDD, mevalonic acid diphosphate decarboxylase; DXS,
deoxy-D-xylulose 5-phosphate synthase; DXR, deoxy-D-xylulose 5-phosphate reductase;
MCT, 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase; CMK, 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase; MDS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, (E)-4-Hydroxy-3-methyl-but-2-enyl pyrophosphate (HMB-PP) synthase; HDR, HMB-PP reductase;
IDI, IPI isomerase; GPS, geranylgeranyl pyrophosphate synthase; FPS, farnesyl pyrophosphate
synthase; SS, squalene synthase; SE, squalene epoxidase; OSC, oxidosqualene cyclases;
P450, cytochrome P-450; UGT, UDP-dependent glycosyl transferases. Two stars indicate
the two significantly induced genes in P-type plants. b, Q-PCR analysis of three selected genes of the pathway. The ordinates show the relative
expression levels; the abscissas show the time points. Error bars indicate the SD
of three biological replicates

Gene divergence and SNP and SSR markers between susceptible and resistant B. vulgaris

The homologous genes were detected by BLAST searching the unigenes from P- and G-type
B. vulgaris against each other with a cutoff E-value??e
?10
. The most similar unigenes within the P- or G-type were treated as paralogous genes,
while the most similar ones between the P- and G-type were treated as orthologous
genes. 12,980 gene families containing 26,793 P-type and 36,944 G-type unigenes were
shared between P- and G-type B. vulgaris (Fig. 4). The 6,928 P-type and 2,587 G-type unique unigenes were mainly composed of 200–300-nt
short sequences. The less clustered but more unique unigenes in P-type than in G-type
reflected the fact that the P-type transcriptome contained a large fraction (7,857)
of 200–300-nt short sequences. We then analyzed the SNP and SSR markers among the
orthologous genes distinguishing the two plant genotypes. 38,397 SNPs were found within
the 9,452 orthologous genes between the P- and G-type plants (the SNPs within each
genotype are not included); thus, about 35.3 % unigenes contain SNPs, with an average
of four SNPs per SNP-containing unigene (the detailed SNP list is shown in Additional
file 1: Table S11). Among these SNPs, nucleotide transitions accounted nearly triple the
number of transversions, and there were many more A/T transversions than G/C transversions
(Table 3). The overall GC content is 47 % in this transcriptome. We detected 5,105 SSRs in
the P-type transcriptome; the majority comprised mono-nucleotide-repeats (2,477) and
triple-nucleotide-repeats (1,590), which represented 48.5 % and 31.1 % of the total
SSRs (listed in Additional file 1: Table S12). The SSR-harboring unigene sequences were BLAST searched against the
orthologous genes from the G-type plant. 1,657 SSRs with polymorphisms between the
two genotypes were identified by a manual check. Among these, 98.4 % comprised mono-
(793), double- (273) and triple-(564) nucleotide-repeats, which accounted for 47.9 %,
16.5 % and 34.0 % of the total, respectively (Table 4). From the polymorphic SSRs, 913 primer pairs were designed to detect SSRs with a
length divergence of more than two nucleotides between the two genotypes (the primers
and the product sequences are listed in Additional file 1: Table S13). As shown in Table 5, the SSRs with 3, 6, 9 and 12-bp variants comprised significantly larger fractions
than the other types, suggesting that the SSR type and repeat variant that did not
cause frameshift mutations were preferentially selected during evolution.

Fig. 4. Venn diagram showing unique and homologous genes in P- and G-type B. vulgaris

Table 3. Types of single nucleotide polymorphisms in Barbarea vulgaris

Table 4. Statistics of simple sequence repeats (SSRs) with polymorphisms in the P- and G-type
Barbarea vulgaris

Table 5. Summary of simple sequence repeat (SSR) primers for detecting the polymorphisms between
the P-and G-type Barbarea vulgaris

Genetic diversity of Barbarea germplasm

To test the utility of the SSRs produced in this study, 30 randomly chosen SSRs were
used to investigate the genetic diversity of germplasms from the Barbarea genus. Thirty accessions assigned by the supplying seedbank to four species (B. intermedia, B. stricta, B. verna, and B. vulgaris) and derived from seven countries (Austria, Belgium, Germany, Ireland, Norway, Poland
and Spain) were analyzed in addition to the two G- and P-type accessions used for
transcriptomics (Additional file 1: Table S14). On PCR, 99.7 % of the primer pairs produced clear peaks on electrophoresis
and generated 957 data points. The three (0.31 %) low-quality reactions were treated
as missing values in the analyses. The 30 SSR markers generated 92 alleles in the
population. Among these, 88 alleles displayed a frequency of more than 5 % in the
total sample. Rare alleles could have been missed because of the small sample size.
The summary statistics of the SSRs are shown in Table 6. The unweighted pair-group method with arithmetic means (UPGMA) tree was generated
using the SSR data. Based on the tree topology, the germplasms could be clearly divided
into 4 groups and the P-type alone (Fig. 5). The G-type and P-type accessions from this and a previous 29] transcriptomics study were widely separated as a logical consequence of their polymorphism
for all markers used. The B. verna and B. stricta accessions were also separated onto different branches of the tree. Interestingly,
the P-type from transcriptomics did not group with any other tested accession, while
the G-type from transcriptomics grouped with all the remaining seed bank accessions
of B. vulgaris, including twenty-two Western and Central European B. vulgaris accessions. All seed bank B. vulgaris accessions were also found to be resistant to the DBM. Some substructure was evident
in the B. vulgaris group, including a distinct group of three accessions (4, 8, 27). However, the seed-bank
assigned B. intermedia accessions were completely embedded in the G-type-like B. vulgaris germplasms. Morphological inspection of the two seed bank assigned B. intermedia accessions (as both rosette plants and flowering plants) revealed only modest morphological
difference from B. vulgaris (shape of the top few leaves on the scapes) and both B. intermedia accessions showed full DBM resistance (as for the G-type).

Table 6. Characteristics of the 30 analyzed simple sequence repeat (SSR) markers and the diversity
detected in 32 Barbarea accessions

Fig. 5. Bootstrapped unweighted pair-group method with arithmetic means (UPGMA) tree of Barbarea species. The P-type and G-type used for transcriptomics were compared to 30 seed
bank accessions from four Barbarea species. The numbers on the branches indicate bootstraps