Aquatic adaptation and the evolution of smell and taste in whales


Genome sequencing and assembly

Muscle tissue of an Antarctic minke whale was purchased from a fish market in Japan,
and the genomic DNA was extracted following the protocol of our previous work [4]. A paired-end sequencing library with an average insert size of 330 bp was constructed
and sequenced on an Illumina HiSeq2000 sequencer, and the reads were then assembled into scaffolds
using the PLATANUS assembler [18] ver. 1.2.1. Details about genome sequencing and de novo assembly are described in Additional file 1 §1. The Antarctic minke whale genome assembly thus obtained was named KUjira_1.0.

The cow (Bos taurus, Artiodactyla) genome assembly (UMD_3.1 assembly) [19] was downloaded from the GenBank FTP site (ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Bos_taurus/Bos_taurus_UMD_3.1/). The bottlenose dolphin (Tursiops truncatus, Odontoceti) genome assembly (Ttru_1.4 assembly) [20] was also downloaded from the GenBank FTP site (ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Tursiops_truncatus/Ttru_1.4/).

Olfaction-related genes in the cow genome

The loci of the OMACS, NQO1 and OCAM genes in the cow UMD_3.1 genome assembly follow NCBI reference sequence (RefSeq)
annotations. The gene ID of each gene is as follows: OMACS, 100299006; NQO1, 519632; OCAM, 535613. We confirmed the RefSeq annotations by comparing translated amino acid sequences
with those of other mammals. The amino acid sequences of 15 mouse TAARs (TAAR1 (GenBank
accession no. NP_444435.1), TAAR2 (NP_001007267.1), TAAR3 (NP_001008429.1), TAAR4
(NP_001008499.1), TAAR5 (NP_001009574.1), TAAR6 (NP_001010828.1), TAAR7a (NP_001010829.1),
TAAR7b (NP_001010827.1), TAAR7d (NP_001010838.1), TAAR7e (NP_001010835.1), TAAR7f
(NP_001010839.1), TAAR8a (NP_001010830.1), TAAR8b (NP_001010837.1), TAAR8c (NP_001010840.1),
TAAR9 (NP_001010831.1)) and six human TAARs (TAAR2-1 (NP_001028252.1), TAAR2-2 (NP_055441.2),
TAAR5 (NP_003958.2), TAAR6 (NP_778237.1), TAAR8 (NP_444508.1), TAAR9 (NP_778227.3))
were used as queries, and TAAR sequences were searched against the cow genome assembly
using the TBLASTN program ver. 2.2.25 [21] with an e-value cutoff of 1e-20 and without filtering query sequences. All overlapping
sequences of hits with the same orientations were merged. The sequences thus obtained
were searched against the mouse protein database (downloaded from the following URL
on 14 October 2011: http://www.ncbi.nlm.nih.gov/protein/?term=%22Mus+musculus%22%5Bporgn%3A__txid10090%5D) using the FASTY program ver. 35.04 [22], and a sequence was discarded if its best hit was not a TAAR gene. We then aligned all the remaining sequences using the L-INS-i program in the
MAFFT package ver. 6.240 [23,24] and looked for the initiation and termination codons. If we could not find initiation
and/or termination codons in a sequence, we extended the sequence in the 5’ and/or
3’ direction to find them. If a sequence was interrupted by premature stop codon(s)
and/or frame shift(s), or if it lacked one or more trans-membrane (TM) regions completely,
the sequence was judged to be a functionless pseudogene. As a result, 17 intact TAAR genes and 14 pseudogenes were found. The classification of intact cow TAAR genes into TAAR1-9 follows the phylogenetic tree shown in Additional file 1 §3. Deduced amino acid sequences of 142 class I and 828 class II intact cow OR genes
were retrieved from Niimura and Nei [25].
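
The merging of overlapping hits with the same orientation can be illustrated with a short Python sketch. This is not the script used in this study; it only shows one plausible way to collapse hits into candidate loci. It assumes that each hit is given as inclusive start and end coordinates (start ≤ end) on a named sequence with a strand label, and the names `Hit` and `merge_overlapping_hits` are hypothetical.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One search hit on a genomic sequence (hypothetical record type)."""
    seqid: str   # chromosome or scaffold name
    start: int   # inclusive start coordinate, start <= end
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def merge_overlapping_hits(hits: List[Hit]) -> List[Hit]:
    """Merge hits that overlap on the same sequence and in the same orientation."""
    groups = defaultdict(list)
    for hit in hits:
        groups[(hit.seqid, hit.strand)].append(hit)

    merged = []
    for (seqid, strand), group in sorted(groups.items()):
        group.sort(key=lambda h: h.start)
        cur_start, cur_end = group[0].start, group[0].end
        for hit in group[1:]:
            if hit.start <= cur_end:
                # Overlaps the current interval: extend it.
                cur_end = max(cur_end, hit.end)
            else:
                # No overlap: close the current interval and start a new one.
                merged.append(Hit(seqid, cur_start, cur_end, strand))
                cur_start, cur_end = hit.start, hit.end
        merged.append(Hit(seqid, cur_start, cur_end, strand))
    return merged


example_hits = [
    Hit("chr9", 100, 200, "+"),
    Hit("chr9", 150, 300, "+"),  # overlaps the first hit, same strand
    Hit("chr9", 250, 400, "-"),  # overlaps in position but opposite strand
    Hit("chr9", 500, 600, "+"),
]
merged_hits = merge_overlapping_hits(example_hits)
```

In this example the two overlapping plus-strand hits are merged into one interval, while the minus-strand hit is kept separate even though its coordinates overlap.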

Olfaction-related genes in the whale and dolphin genomes

For the multi-exon OMACS, NQO1 and OCAM genes, we used the DNA sequence of each exon of the corresponding cow genes as a
query and searched against the minke whale and dolphin genome assemblies using BLASTN
with e-value cutoff of 1e-20 and without filtering query sequences. The sequences
thus obtained were searched against the cow genome assembly using BLASTN and the sequence
was discarded if its best hit was not its query. Several exons of the OMACS gene could not be found by this method; we therefore compared the genomic
regions encoding the other exons with those of the cow genome in order to confirm that these
missing exons are actually deleted from the whale and dolphin genomes. For dot-plot comparisons,
GenomeMatcher ver. 1.75 [26] was used with default settings, and the bl2seq [27] option was chosen to output figures (Figure 1). To identify whale and dolphin TAAR, OR and V1R genes, we followed the same methods that we used to identify cow TAAR genes, using the following query amino acid sequences: 17 intact cow TAARs for searching
TAAR sequences, 970 intact cow ORs (142 class I and 828 class II) for OR sequences and 32 intact cow V1Rs identified by Grus et al. [28] for V1R sequences. Because of fragmented scaffolds, we could not find initiation and/or termination
codons of several sequences which were not judged to be pseudogenes. We labeled such
sequences as truncated genes. Under these criteria, we found 324 OR genes (60 intact, 19 truncated and 245 pseudo) and 34 V1R genes (two intact and 32
pseudo) in the KUjira_1.0 assembly, and 166 OR genes (twelve intact, two truncated
and 152 pseudo) and 18 V1R genes (one intact and 17 pseudo) in the Ttru_1.4 assembly.
However, only five TAAR genes and pseudogenes were found in the KUjira_1.0 assembly and two in the Ttru_1.4
assembly. Therefore, we compared the genomic region encoding the TAAR gene cluster with that of the cow genome using the GenomeMatcher program in order to confirm
that the missing TAAR genes are actually deleted from the whale and dolphin genomes. For the multi-exon
V2R genes, we also followed the methods described above, but we searched only for
3rd exons, using the 3rd exons of 79 intact rat V2Rs identified by Young and Trask [29] as queries. As a result, we could not find any sequences in the Ttru_1.4 assembly
but we found one sequence in the KUjira_1.0 assembly. However, premature stop codons
interrupt its open reading frame. Therefore, we conclude that this exon is a part
of a functionless pseudogene and that neither whale nor dolphin possesses intact V2R genes.
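
The intact/truncated/pseudogene labels used in this section can be summarized as a simple decision rule. The Python sketch below is an illustration only, not the procedure as implemented in this study. It assumes that the individual features (premature stop codons, frame shifts, completely missing TM regions, presence of initiation and termination codons) have already been determined for each candidate sequence; the function name and its parameters are hypothetical.

```python
def classify_candidate(
    has_premature_stop: bool,
    has_frameshift: bool,
    missing_tm_regions: int,
    has_start_codon: bool,
    has_stop_codon: bool,
) -> str:
    """Label a candidate receptor sequence following the criteria in the text.

    All inputs are assumed to come from an upstream inspection of the
    alignment; how they are detected is outside the scope of this sketch.
    """
    # Pseudogene: interrupted reading frame, or one or more TM regions
    # completely missing.
    if has_premature_stop or has_frameshift or missing_tm_regions > 0:
        return "pseudogene"
    # Truncated: not judged a pseudogene, but the initiation and/or
    # termination codon could not be found (e.g. fragmented scaffold).
    if not (has_start_codon and has_stop_codon):
        return "truncated"
    return "intact"


labels = [
    classify_candidate(False, False, 0, True, True),
    classify_candidate(False, False, 0, True, False),
    classify_candidate(True, False, 0, True, True),
    classify_candidate(False, False, 2, False, False),
]
```

The pseudogene test is applied first, so a sequence that both lacks a termination codon and contains a frame shift is labeled a pseudogene rather than a truncated gene, consistent with truncated genes being defined as sequences "which were not judged to be pseudogenes".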

Figure 1. Dot-plot comparisons between Antarctic minke whale (horizontal, left) or dolphin (horizontal,
right) and cow (vertical) sequences.
Color scale bar indicates sequence similarity (%) of each dot. a. Comparisons of the genomic region where the OMACS gene is encoded. Cow sequence: chr. 25 (18,535,000-18,568,000 bp). Whale: scaffold100261
(1–18,000 bp, complement). Dolphin: scaffold4608 (95,000-115,000 bp, complement).
The OMACS gene consists of 13 exons, and the position and the coding direction of each exon
are shown as a triangle with an exon-specific color. Whale and dolphin have lost the
genomic regions where the 5th, 9th, 10th and 11th exons are encoded. In addition, dolphin has lost the 1st and 2nd exons. Whale’s 1st exon is not included in this scaffold (see Additional file 1 for details). Mesh size, 1 kbp × 1 kbp. b. Comparison of the genomic region where the NQO1 gene is encoded. Cow sequence: chr. 18 (36,908,355-36,927,688 bp). Whale: scaffold73885.
Dolphin: scaffold317 (290,000-300,000 bp). The NQO1 gene consists of six exons, and the position and the coding direction of each exon
are shown as a triangle with an exon number. Genomic inversion was confirmed in the
whale and dolphin genomes around the region where the 4th and 5th exons are encoded. Whale’s 1st exon is not included in this scaffold (see Additional file 1 for details). Mesh size, 1 kbp × 1 kbp. c. Comparison of the genomic region where the TAAR gene cluster is located. Cow sequence: chr. 9 (71,400,000-71,850,000 bp, complement).
Whale: scaffold12993. Dolphin: scaffold181 (230,000-270,000 bp, complement). Positions
and coding directions of TAAR1-9 genes are shown. Pseudogenes are indicated by red oblique lines (cow TAAR pseudogenes are not shown). Mesh size, 10 kbp × 10 kbp.

Classification of cetacean OR genes into class I/class II

As Niimura and Nei pointed out [30], mammalian OR genes can clearly be classified into class I and class II based on
sequence similarity. Classification of OR genes into class I and II follows Glusman
et al. [31] and Niimura and Nei [25], and the whale and dolphin intact OR genes identified in this study were classified
into class I or II based on a phylogenetic tree constructed from the deduced amino acid
sequences of human (retrieved from HORDE database (http://genome.weizmann.ac.il/horde/) #43), cow, whale and dolphin intact OR genes (the phylogenetic tree is shown in Additional file 1 §3). In addition to 60 (minke whale) and twelve (dolphin) intact OR genes (Figure 2), we found 19 (minke whale) and two (dolphin) truncated OR genes. We added these
truncated genes one by one to the OR phylogenetic tree and confirmed that all these
truncated OR genes are classified into class II.

Figure 2. Phylogenetic relationships, divergence times and the numbers of intact chemosensory
receptor genes of cow (Artiodactyla), Antarctic minke whale (Mysticeti) and bottlenose
dolphin (Odontoceti).
Notes: a. taken from Niimura and Nei [25]; b. taken from Shi and Zhang [52]; c. taken from Jiang et al. [7].

Genes for the sense of taste

TBLASTN searches with an e-value cutoff of 1e-5 and without filtering query sequences
were employed to identify TAS1R, TAS2R and GNAT3 genes. The amino acid sequence of cow GNAT3 was retrieved from GenBank (accession
no. NP_001103452). Using all amniote GNAT3 sequences annotated in the Ensembl database
(http://www.ensembl.org/index.html) (release 73) as queries, GNAT3 genes were searched against the KUjira_1.0 and Ttru_1.4 assemblies. TAS1R genes were also searched against the UMD_3.1, KUjira_1.0 and Ttru_1.4 assemblies using
all vertebrate TAS1R sequences annotated in the Ensembl database (release 70) as queries. In the case of TAS2Rs,
we used all intact Euarchontoglires TAS2Rs identified by Hayakawa et al. [32] as queries and searched against the UMD_3.1, KUjira_1.0 and Ttru_1.4 assemblies. All
overlapping sequences of hits with the same orientations were merged. The sequences
thus obtained were searched against the human (GRCh37 assembly) [33,34] and the mouse (GRCm38 assembly) [35] genome assemblies using TBLASTX, and a sequence was discarded if its best hit was
not a GNAT3/TAS1R/TAS2R gene. Because TAS1Rs and GNAT3 are multi-exon genes, the results of TBLASTX were also utilized for subsequent exon
annotations. Exon regions and splicing sites of the GNAT3 and TAS1R genes identified in this study were determined by comparing GNAT3 and TAS1R sequences of cetartiodactyls with those of humans and mice using the E-INS-i program in
the MAFFT package. A taste receptor gene was considered a pseudogene or truncated
gene if it met the same criteria that we applied to odorant receptor genes.
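
The best-hit filter applied to candidate sequences can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study. It assumes that the reverse search results are available as (query, subject, bit score) records and that "best hit" means the subject with the highest bit score; the text does not state which score was used. The function names and the example identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One record per reported alignment: (query id, subject id, bit score).
# The ranking criterion (bit score) is an assumption of this sketch.
HitRow = Tuple[str, str, float]


def best_hits(rows: Iterable[HitRow]) -> Dict[str, Tuple[str, float]]:
    """Return the highest-scoring subject for every query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def keep_family_members(rows: Iterable[HitRow], family: Set[str]) -> List[str]:
    """Keep candidates whose best hit belongs to the target gene family."""
    return sorted(
        query
        for query, (subject, _score) in best_hits(rows).items()
        if subject in family
    )


# Illustrative records only; identifiers and scores are made up.
rows = [
    ("cand1", "TAS2R1", 250.0),
    ("cand1", "OR5A1", 80.0),
    ("cand2", "ADRB2", 120.0),
    ("cand2", "TAS1R2", 90.0),
]
kept = keep_family_members(rows, family={"TAS2R1", "TAS1R2"})
```

Here `cand2` is discarded because its best hit lies outside the target family, even though a lower-scoring hit to a family member exists.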

Gene annotation information thus obtained is available as Additional files 2, 3 and 4.

The 6th exon of the GNAT3 gene in several cetaceans and artiodactyls (Additional file 1: Table S7) was sequenced using the pair of primers shown in Table 1.

Table 1. Primers used for amplifying and sequencing the 6th exon of the GNAT3 gene

Fossil investigation

Several fossil whale skulls were used in this study. A skull of the pakicetid cetacean
Ichthyolestes pinfoldi (Howard – Geological Survey of Pakistan, H-GSP 98134) was described by Nummela et
al. [36]. Further preparation of sediment from the olfactory region of this specimen using
an airscribe and dental tools revealed its cribriform plate. A specimen
of the remingtonocetid cetacean Remingtonocetus (Indian Institute of Technology, Roorkee, IITR-SB 2770) was described by Bajpai et
al. [37]; this specimen was CT-scanned and reconstructed in 3D using AMIRA software (FEI
Visualization Science Group) ver. 5.4.1 as described by Bajpai et al. [37]. The 3D reconstructions presented in Figure 3 were produced with AMIRA.

Figure 3. Cribriform plate and olfactory bulb in extinct cetaceans. a. Skull of the pakicetid cetacean Ichthyolestes pinfoldi (H-GSP 98134, described by Nummela et al. [36]) in ventral view; rectangle indicates detail shown in (b). b. Detail of (a), showing the dorsal side of the cribriform plate with some of its perforations encircled,
and the lateral wall (LW) of the olfactory chamber. c. Ventral view of the endocast of the cranial cavity of the remingtonocetid cetacean
Remingtonocetus (IITR-SB 2770, described by Bajpai et al. [37]) based on 3D reconstruction of CT-scans, showing impressions of olfactory tract (OT)
and olfactory bulb (OB); area in box is enlarged in (d). d-g. Impression of olfactory bulb in ventral, dorsal, lateral, and cranial view, respectively;
dorsal and rostral views show midline dorsal crest (DC), and lateral view shows the
contrast between convex ventral side where cranial nerve I pierces the cribriform
plate (CP), and flat dorsal side.