Second generation physical and linkage maps of yellowtail (Seriola quinqueradiata) and comparison of synteny with four model fish

De novo transcriptome assembly

A cDNA library was generated from pooled RNA samples extracted from 11 tissues from
a single individual. Sequencing on the Roche/454 GS FLX Titanium platform generated
1,353,405 reads. De novo assembly was performed with the CLC Genomics Workbench. After trimming the adapters and filtering out low-quality and short
reads, 1,345,753 high-quality reads were assembled into 56,449 contigs, with 276,945
reads remaining as singletons. The average length of the contigs was 782 bp, and the
N50 size was 959 bp.
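As an illustration of how the N50 statistic relates to the mean contig length, the following minimal sketch computes both from a list of contig lengths. The function name `n50` and the toy lengths are hypothetical; the CLC Genomics Workbench reports these values itself, and this is not its implementation.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, not the yellowtail contigs.
toy_lengths = [200, 300, 400, 500, 600]
toy_mean = sum(toy_lengths) / len(toy_lengths)
print("mean:", toy_mean, "N50:", n50(toy_lengths))
```

Because N50 weights contigs by the number of bases they contain, it is usually larger than the mean length, as is the case for the 959 bp N50 and 782 bp mean reported here.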

Gene ontology analysis

Of the 56,449 contigs, 24,035 (43%) had a significant hit, matching 15,280 unique protein
records in the nr protein database. Gene ontology (GO) analysis was conducted on these
24,035 contigs. Of these, 17,076 sequences were assigned at least one GO term in one of
three functional groups: biological process, molecular function and cellular component.
Summaries of the level 2 GO assignments are shown in Figure 1. Among the 17,076 sequences, the molecular function ontology accounted for the largest share
of GO assignments (83%), followed by biological process (79%) and cellular component
(74%). In the molecular function category, binding and catalytic activity represented
about 80% of the total. For biological process, sequences involved in cellular processes
(19%), metabolic processes (18%), and biological regulation (13%) were highly represented.
Finally, the cell and organelle terms represented about 70% of the cellular component category.
Transcriptome assembly using next-generation DNA sequencing, followed by GO analysis, has been
reported in other fish species [33]-[35]. Our GO analysis of the yellowtail transcriptome gave results similar to those of
other fishes.

Figure 1. Gene Ontology assignment for assembled contigs. (A) Molecular function, (B) Biological process and (C) Cellular component assignment.

SNP identification

Sequencing produced 570,846 raw reads from the full-length library and 456,482
raw reads from the 3′-anchored library. Quality-based variant calling using
the CLC Genomics Workbench detected 9,356 putative biallelic SNPs in 6,025 contigs,
with a minor allele frequency (MAF) ≥25%. SNPs with a high minor allele frequency are better
suited to constructing a linkage map efficiently by genotyping a single family, because
fewer SNPs remain polymorphic within one family. These contigs were registered in DDBJ/EMBL/GenBank
under accession numbers FX884179–FX890203.
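The MAF filter can be sketched as follows, assuming the 25% cut-off is a lower bound and that allele counts per site are available as read counts. The function names and example counts are hypothetical, and the quality-based variant caller of the CLC Genomics Workbench applies additional criteria that are not modelled here.

```python
MAF_THRESHOLD = 0.25  # the 25% cut-off stated in the text, read as a lower bound


def minor_allele_frequency(allele_counts):
    """Frequency of the less common allele at a biallelic site."""
    if len(allele_counts) != 2:
        raise ValueError("expected exactly two alleles")
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no reads cover this site")
    return min(allele_counts.values()) / total


def passes_maf(allele_counts, threshold=MAF_THRESHOLD):
    """True if the site's minor allele frequency reaches the threshold."""
    return minor_allele_frequency(allele_counts) >= threshold


# Hypothetical read counts per allele, not data from this study.
sites = {
    "snp_a": {"A": 12, "G": 8},   # MAF = 8/20
    "snp_b": {"C": 19, "T": 1},   # MAF = 1/20
}
kept = [name for name, counts in sites.items() if passes_maf(counts)]
print(kept)
```

A site with a rare minor allele, such as `snp_b`, is unlikely to be heterozygous in the two parents of a single family, which is why such sites are filtered out before marker development.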

Mapping of SNP markers to the linkage map

Direct sequencing identified 143 informative SNPs that were heterozygous in either
parent, and these SNPs were used for linkage analysis in
the F1 mapping progeny. In the SNPtype assay, 458 SNPs were mapped to the linkage map (Tables 1 and 2, Figure 2, Additional file 1). In total, 2,081 markers, including 601 SNPs, that were polymorphic in the F1 mapping progeny
were mapped to the linkage map, and 606 markers were common to both sexes. This
is the first time that SNPs have been mapped in a yellowtail linkage map.
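The notion of an informative SNP used above can be sketched as follows: a marker segregates in the F1 progeny only if at least one parent is heterozygous, and it can be placed on the map of whichever parent is heterozygous. The genotype encoding and function names are hypothetical, and the assignment to female and male maps is an assumption based on common practice for sex-specific maps, not a description of the authors' pipeline.

```python
def is_heterozygous(genotype):
    """True if a diploid genotype such as 'AG' carries two different alleles."""
    if len(genotype) != 2:
        raise ValueError("expected a two-character diploid genotype")
    return genotype[0] != genotype[1]


def segregating_maps(dam, sire):
    """Return the sex-specific maps on which a SNP can be placed.

    A SNP is informative for the female map if the dam is heterozygous
    and for the male map if the sire is heterozygous.
    """
    maps = []
    if is_heterozygous(dam):
        maps.append("female")
    if is_heterozygous(sire):
        maps.append("male")
    return maps


# Hypothetical parental genotypes.
print(segregating_maps("AG", "AA"))  # dam heterozygous only
print(segregating_maps("AA", "GG"))  # all F1 are AG, so nothing segregates
print(segregating_maps("AG", "AG"))  # usable in both maps
```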

Table 1. Summary of the yellowtail genetic linkage map

Table 2. Summary of the markers of the linkage map

Figure 2. Examples of two linkage groups (female and male). Squ1 has 53 and 48 markers, including
20 and 14 SNPs, in the female and male, respectively. Squ2 has 82 and 77 markers,
including 11 and 11 SNPs, in the female and male, respectively. Distances between
markers are shown in centiMorgans (cM).

In our genotyping analysis, many polymorphisms were observed in wild yellowtail; however,
using one family reduced the proportion of polymorphic SNPs to about 10%. Moreover, we used a nanofluidic
dynamic array to perform high-throughput genotyping of targeted SNPs. We consider
the nanofluidic dynamic array, like Sanger sequencing, to be useful for genotyping SNPs
in one family.

Construction of the RH map

PCR on a dynamic array produces high-throughput gene expression data that are essentially
identical in quality to conventional microliter qRT-PCR and are superior to publicly
available array data from the same tissue type [36]. In our previous study, 580 markers were mapped in the first RH map [32]. Here, 1,563 markers, including the previous 580, were used to construct the RH
map. Two-point analysis using CarthaGene software [37],[38], performed at a LOD score threshold of 4.0 and a distance threshold
of 50, resulted in 75 groups. Furthermore, with reference to the locations of several markers on the constructed
genetic linkage map, 1,532 markers (1,433 EST markers containing SNPs and 99 SSR markers)
were assigned to 24 linkage groups (Tables 3 and 4, Figure 3, Additional file 2). Thirty-one markers in eight groups could not be assigned to the genetic linkage
groups because they failed the LOD score or distance threshold. The RH map was constructed
with a final set of 1,532 markers. The RH groups ranged in size from 640.7 to
1,343.3 centiRays (cR), with an average of approximately 1,096 cR. The combined size
of all RH groups was 26,293.5 cR. The estimated size of the yellowtail genome is 800
Mbp [24], which implies that 1 cR corresponds to approximately 30 kbp (800 Mbp/26,293.5 cR). Using the BioMark™ HD system, we have obtained
a large quantity of data in a short time since the construction
of the first RH map.
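The map-size arithmetic in this paragraph can be checked with a few lines of Python. All figures are taken from the text; only the variable names are introduced here for illustration.

```python
GENOME_SIZE_MBP = 800      # estimated yellowtail genome size
TOTAL_MAP_CR = 26_293.5    # combined size of all RH groups
N_GROUPS = 24              # number of RH groups

# Physical distance represented by one centiRay, in kbp.
kbp_per_cr = GENOME_SIZE_MBP * 1_000 / TOTAL_MAP_CR

# Mean size of one RH group, in cR.
mean_group_cr = TOTAL_MAP_CR / N_GROUPS

# Marker bookkeeping: EST plus SSR markers, and markers used minus unassigned.
assert 1_433 + 99 == 1_532
assert 1_563 - 31 == 1_532

print(f"{kbp_per_cr:.1f} kbp per cR")       # 30.4 kbp per cR
print(f"{mean_group_cr:.1f} cR per group")  # 1095.6 cR per group
```

The conversion assumes that the RH map covers the whole genome and that radiation-induced breaks are spread evenly, so the value of about 30 kbp per cR is an average rather than a local estimate.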

Table 3. Summary of the yellowtail radiation hybrid (RH) map

Table 4. Summary of the markers of the RH map

Figure 3. Examples of four radiation hybrid (RH) groups. SQ1, SQ2, SQ3 and SQ4 contain 79, 70,
60 and 69 markers, respectively. Distances between markers
are shown in centiRays (cR).

The RH map was compared with the linkage map to confirm the accuracy of the local
order of markers (Additional file 3). In linkage group 11 of mapped to 0 cM, because chromosome recombination is unlikely
to occur during meiosis. These genes can be mapped accurately on the physical
map using RH mapping. The positions of some markers differed between the RH and linkage
maps, because RH map distances reflect physical length, whereas linkage map distances reflect
recombination frequency. In addition, by comparing the RH and linkage maps, we inferred which regions of the RH map do not recombine
at meiosis. The accuracy of the local order
of markers will be confirmed when the whole genome sequence becomes available. Currently,
we are trying to map genome contigs of yellowtail onto the physical map.
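One simple way to compare local marker order between two maps is to count pairs of shared markers whose relative order differs. The sketch below is an illustration only: the text does not state which comparison method was used, and the marker names and positions are hypothetical.

```python
from itertools import combinations


def discordant_pairs(rh_pos, linkage_pos):
    """List pairs of shared markers ordered differently in the two maps.

    Pairs that share a position in either map (for example, markers that
    all map to 0 cM because no recombination separates them) are treated
    as ties and are not counted as discordant.
    """
    shared = sorted(set(rh_pos) & set(linkage_pos))
    discordant = []
    for a, b in combinations(shared, 2):
        rh_diff = rh_pos[a] - rh_pos[b]
        lg_diff = linkage_pos[a] - linkage_pos[b]
        if rh_diff * lg_diff < 0:  # opposite signs: order is reversed
            discordant.append((a, b))
    return discordant


# Hypothetical positions: cR on the RH map, cM on the linkage map.
rh = {"m1": 0.0, "m2": 35.2, "m3": 80.1, "m4": 120.7}
lg = {"m1": 0.0, "m2": 0.0, "m3": 6.5, "m4": 3.1}
print(discordant_pairs(rh, lg))
```

In this toy example `m1` and `m2` both sit at 0 cM on the linkage map but are separated on the RH map, which mirrors the case described above where RH mapping resolves markers that recombination cannot.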

Synteny relationship with model fish

The 1,433 yellowtail EST marker sequences of the RH map were compared with the cDNA
sequences of four fish species: medaka (Oryzias latipes), zebrafish (Danio rerio), three-spined stickleback (Gasterosteus aculeatus) and green-spotted pufferfish (Tetraodon nigroviridis) using the TBLASTX algorithm. Among the 1,433 yellowtail marker sequences, 1,036
genes (72.3%) had homologs in medaka, 1,064 genes (74.2%) had homologs in zebrafish,
1,073 genes (74.9%) had homologs in three-spined stickleback, and 1,032 genes (72.0%)
had homologs in green-spotted pufferfish (Additional file 4). These values were not significantly different among the four fishes. Oxford grids
between yellowtail and the four fish species are shown in Figure 4. The modal number of chromosomes in yellowtail is 48 [32]. This number is the same as that of medaka; all chromosomes could be paired one-to-one
between yellowtail and medaka. The number of chromosomes in the two fishes is the
same and their chromosomal structures are similar. These results suggested that we
would observe conserved synteny between yellowtail and medaka, and that the syntenic
relationship between yellowtail and zebrafish would be rather low. However, they might
not be evolutionarily closer than the relationships between yellowtail and medaka.
Medaka is a member of the order Beloniformes, which includes freshwater and marine fish, such as Pacific saury and flying fish.
Zebrafish is a member of the order Cypriniformes, which consists exclusively of freshwater fish. Our synteny results reflected the
known taxonomic relationships of these fishes. In Teleostei, it is thought that a
whole genome duplication and eight subsequent major rearrangements occurred about
314–404 million years ago [39]. Moreover, medaka and zebrafish are thought to have diverged after the eight major
rearrangement events, after which medaka and green-spotted pufferfish diverged. The
chromosome number in three-spined sticklebacks is the same as that of green-spotted
pufferfish. Furthermore, the Oxford grid between yellowtail and green-spotted pufferfish
was similar to that between yellowtail and three-spined sticklebacks (Figure 4). However, in the Oxford grid between green-spotted pufferfish and three-spined sticklebacks,
the chromosome groups were not paired one-to-one (Additional file 5). Yellowtail SQ11 and SQ14 correspond to linkage group 1 of three-spined sticklebacks,
SQ5 and SQ18 correspond to linkage group 4, and SQ19 and SQ22 correspond to linkage group
7. Similarly, yellowtail SQ4 and SQ18 correspond to chromosome 1 of green-spotted
pufferfish, SQ6 and SQ7 correspond to chromosome 2, and SQ11 and SQ12 correspond to chromosome
3. These chromosomes in three-spined sticklebacks and green-spotted
pufferfish could thus be paired one-to-two with those of yellowtail. The haploid chromosome number
of three-spined sticklebacks and green-spotted pufferfish is N=21 and that of yellowtail
is N=24. Thus, three-spined sticklebacks and green-spotted pufferfish have three fewer
chromosomes than yellowtail; in each case, the chromosomes are thought to have merged after the divergence
from medaka about 191.8 million years ago. Once the whole genome sequence has been analyzed, the RH map will allow conserved
segments and/or conserved segment orders to be distinguished, and
will aid studies of the dynamics of chromosome evolution between yellowtail and model
fish species.
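The homolog percentages reported in this section follow directly from the counts, as the short check below shows. The counts and the total of 1,433 marker sequences are from the text; the dictionary layout is only for illustration.

```python
TOTAL_MARKERS = 1_433  # yellowtail EST marker sequences on the RH map

homolog_counts = {
    "medaka": 1_036,
    "zebrafish": 1_064,
    "three-spined stickleback": 1_073,
    "green-spotted pufferfish": 1_032,
}

# Percentage of yellowtail markers with a TBLASTX homolog in each species.
percentages = {
    species: round(100 * count / TOTAL_MARKERS, 1)
    for species, count in homolog_counts.items()
}

for species, pct in percentages.items():
    print(f"{species}: {pct}%")
```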

Figure 4. Oxford grids showing conservation of synteny between yellowtail and four model fish.
A: medaka, B: zebrafish, C: three-spined stickleback, D: green-spotted pufferfish. Each box is highlighted as follows: 0–4, white;
5–10, yellow; 11–20, green; 21–30, sky blue; 31–40, blue;
more than 40, dark blue.
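An Oxford grid of this kind can be sketched as a table of counts indexed by yellowtail RH group and model-fish chromosome, with each cell coloured by the bins given in the Figure 4 legend. The sketch assumes that the binned value is the number of homologous marker pairs per cell, which the legend does not state explicitly; the function names and example pairs are hypothetical.

```python
from collections import Counter

# Upper bound of each colour bin, following the Figure 4 legend.
COLOR_BINS = [
    (4, "white"),
    (10, "yellow"),
    (20, "green"),
    (30, "sky blue"),
    (40, "blue"),
]


def color_for(count):
    """Map a cell count to its legend colour."""
    for upper, color in COLOR_BINS:
        if count <= upper:
            return color
    return "dark blue"  # more than 40


def oxford_grid(homolog_pairs):
    """Count homologs per (yellowtail group, model-fish chromosome) cell."""
    return Counter(homolog_pairs)


# Hypothetical homolog assignments, not data from this study.
pairs = (
    [("SQ1", "chr3")] * 12
    + [("SQ1", "chr7")] * 2
    + [("SQ2", "chr5")] * 45
)
grid = oxford_grid(pairs)
for cell, count in sorted(grid.items()):
    print(cell, count, color_for(count))
```

Conserved synteny appears as one strongly coloured cell per row and column, whereas a one-to-two relationship appears as two strongly coloured cells sharing a single model-fish chromosome.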

A putative sex determination locus of yellowtail was located in the Squ12 linkage
group [40]; however, the gene responsible has not been identified. The sex determination gene
varies among fish species: Dmy in medaka [41], Amhy in Patagonian pejerrey [42], Amhr2 in fugu [43] and sdY in rainbow trout [44]. Therefore, identifying sex determination genes is difficult, especially
when the conservation of these loci is not high. Detailed chromosome information is
provided by analysis of the RH map and whole genome sequences. We are interested in
whether yellowtail appeared earlier in evolution than medaka, and when yellowtail
diverged from medaka after the major rearrangements of the chromosomes. Currently,
we are studying SNPs at the genome level and performing DNA chip analyses using SNPs
in ESTs.