The goose genome sequence leads to insights into the evolution of waterfowl and susceptibility to fatty liver

Genome assembly and annotation

We sequenced an individual Anser cygnoides genome using an Illumina HiSeq-2000 instrument, obtaining approximately 139.55 Gb
with small-insert-size libraries (200 bp, 500 bp, or 800 bp; average read length:
100 bp) and large-insert-size libraries (2 kb, 5 kb, 10 kb, or 20 kb; average read
length: 49 bp; Additional file 1: Table S1). Sequence data were assembled into a 1.12-Gb draft genome using SOAPdenovo
software (Table 1). Our assembly covered 98% of the transcriptome-assembled unigenes (Additional file
1: Table S2), indicating that the genome sequence was of high quality. The average
GC content of the goose genome is approximately 38%, similar to that of other birds
such as chicken, duck, turkey, and zebra finch (Additional file 2: Figure S1). By combining homology-based, ab initio prediction and transcriptome-assisted methods, we predicted 16,150 genes (Additional
file 1: Table S3), 75.7% of which are supported by homology-based evidence (Additional file
1: Table S4), and 77.7% are covered by transcriptome reads (Table 1). We found that 77.7% of the identified genes were well supported by public protein
databases (Additional file 1: Table S5). The repeat content of the goose genome is similar to that of chicken,
duck, turkey, and zebra finch (Additional file 1: Table S6). We also predicted 153 microRNAs (miRNAs), 69 rRNAs, 226 tRNAs, and 206
small nuclear RNAs (snRNAs) in the goose genome (Additional file 1: Table S7).

Table 1. Assembly and annotation statistics for the goose genome

Comparative genomic analysis

We compared genome synteny and orthologous relationships among bird genomes. The goose
genome has a high synteny with the duck genome 8], which covered approximately 81.09% and 82.35% of each genome, respectively (Additional
file 1: Table S8 and Additional file 2: Figure S2), whereas approximately 592 goose scaffolds with lengths 5 kb mapped
to and occupied 67.67% of the chicken genome 9] (Additional file 1: Table S8 and Additional file 2: Figure S3). In addition, we found that chromosomal rearrangements occur between
the goose and chicken genomes (Additional file 1: Tables S9 and S10 and Additional file 2: Figure S4). For example, scaffold 45 is a goose genome sequence fragment, but it
was in synteny with chromosomes 4 and 5 of the chicken genome. When comparing orthologs,
70% of the goose genes corresponded with 1:1 orthologs in the chicken gene-set (Additional
file 2: Figure S5). Of the 1:1 orthologs for goose vs. duck (8,322 orthologs), however,
26.62% share up to 90% identity (Additional file 2: Figure S5). For chicken vs. turkey, 48.33% of the 1:1 orthologs (9,378 orthologs)
share up to 90% identity (Additional file 2: Figure S5). For peregrine vs. saker, 57.87% of the 1:1 orthologs (10,569 orthologs)
share up to 90% identity (Additional file 2: Figure S5).

A phylogenetic tree of eight avian species (goose, duck, chicken, turkey, zebra finch,
pigeon, peregrine, and saker) was constructed using 4-fold degenerate sites from 5,081
single-copy orthologs. Analysis of the resulting tree revealed that geese and ducks
belong to a subclade that was most likely derived from a common ancestor approximately
20.8 million years ago (Mya), whereas the chicken and turkey diverged 20.0 Mya, and
the peregrine and saker diverged 1.3 Mya (Figure 1 and Additional file 2: Figure S6). Of the nine species, goose-specific gene families (other species lack
these families) have enriched gene ontology (GO) functions, such as zinc ion binding,
integrase activity, and DNA integration. Moreover, the olfactory receptor activity,
DNA metabolic processing, G-protein coupled receptor activity, and transmembrane receptor
activity GO categories exhibit the most significant gene-family expansion when compared
with others birds (Additional file 1: Table S11), indicating that these function were enhanced during goose evolution.

Figure 1. Divergence times for the nine species investigated in this study. A phylogenetic tree
based on 4-fold degenerate sites in single-copy orthologous genes is shown. The divergence
time estimates were calibrated using fossil data for lizard-bird and chicken-zebra
finch. The estimated divergence times and associated 95% CIs are shown.

Rapidly and slowly evolved GO terms

To identify the GO categories that have undergone rapid or slow evolution in waterfowl,
we compared two waterfowl (goose and duck) with terrestrial birds (chicken and turkey).
We searched for functionally related genes with exceptionally high or low selection
constraints in the goose and duck. For categories with at least 10 genes, the ? value
(??=?Ka/Ks, where Ka?=?number of non-synonymous substitutions per non-synonymous site,
and Ks?=?number of synonymous substitutions per synonymous site) was calculated for
these categories and normalized using the median ? of each species pair. We identified
191 GO categories with elevated Ka/Ks ratios at the specified threshold between the
waterfowl and terrestrial birds (Additional file 1: Table S12). Nineteen of these GO categories, including GTPase activity, galactosyltransferase
activity, chloride transport, and GABA-A receptor activity may have undergone significantly
rapid evolution (Additional file 1: Table S12).

Positive selection

Ortholog identification was performed for goose, duck, zebra finch, chicken, turkey,
and pigeon genome sequences, using the method applied for accelerated GO category
analysis. Alignments of 7,861 orthologous genes were used to estimate the ratio of
the rates of non-synonymous and synonymous substitutions per gene (?), using the Codeml
program under a branch-site model and F3x4 codon frequencies. We then performed a
likelihood ratio test and identified 21 positively selected genes (PSGs) in waterfowl
branches by means of FDR adjustment with Q-values 0.05 (Additional file 1: Table S13). Several of the PSGs, including eIF-3S1, GATA1, and eIF-3A, are involved
in transcription or translation regulation. Kinase (PIK3R, FGFR2) and signaling molecule
(KAI1) genes were also under positive selection, indicating that they may be involved
in adaptation to an aquatic environment (Additional file 1: Table S13).

The resistance of waterfowl to disease

The major histocompatibility complex (MHC) gene is widely expressed in jawed vertebrates,
and its function correlates with host disease resistance and immune responses 10]-12]. Transposable elements in the chicken MHC region are more prevalent compared to the
goose MHC region (54.62% in chicken vs. 15.11% in goose; Additional file 1: Table S14). Moreover, the distribution of the goose and chicken MHC region is different
(Additional file 1: Table S15 and Additional file 2: Figure S7). In addition, we found that the goose genome exhibits substantial copy-number
variations of innate immune response-related genes, as well as gene structures, when
compared with chicken, turkey, zebra finch, human, and rat genomes (Additional file
1: Table S16). RNA viruses that escape toll-like receptors and infiltrate the cytoplasm
are recognized by Retinoic acid-inducible gene I (RIG-I), a pattern-recognition receptor
that plays an important antiviral role 13]-16]. Results from recent studies have shown that RIG-I is present in most mammals and
some birds 17]-19]. We found that RIG-I genes aligned well between goose and zebra finch (Additional
file 1: Tables S17 and S18), but only fragments of the goose RIG-I aligned with the chicken
and turkey RIG-I genes (Additional file 1: Table S19). We constructed a phylogenetic tree based on these data (Additional file
2: Figures S8 and S9) and found that the RIG-I gene is absent in chickens and turkeys.
Compared to turkeys and chickens, some mammal and waterfowl species have increased
resistance to the influenza virus 20],21]. This phenomenon may be because most mammals have two Myxovirus resistance (Mx) genes,
while avian birds have only one. The Mx gene is a member of the guanine-3 phosphokinase
gene family, and its expression is induced by interferons 21]. Many Mx proteins have been shown to provide influenza virus resistance at the cellular
level 22]. Moreover, the different Mx proteins confer resistance to different diseases, and
single base mutations can affect the ability of the protein to confer resistance 21],22]. In addition, the phylogenetic tree shows that mutations at key sites in the chicken
and turkey Mx genes may inactivate the Mx protein, affecting antiviral activity and
leading to diminished viral resistance (Additional file 2: Figures S10 and S11).

The susceptibility of geese to fatty liver

The liver is a vital organ that plays an important role in lipid metabolism, digestion,
absorption, synthesis, decomposition, and transport. Under natural conditions, birds,
especially some wild waterfowl, are more likely to show non-pathological hepatic steatosis
as a result of energy storage before migration 23]. To identify the genetic mechanism underlying the occurrence of fatty liver, many
previous studies have focused on goose fatty liver formation 5]-7],24],25]. However, to date, the adaptive molecular mechanisms that induce higher synthesis
of hepatic lipids, especially unsaturated fatty acids, in response to carbohydrate-rich
diets remain to be understood in waterfowl species. To establish the molecular mechanism
responsible for fat deposition in goose liver, we analyzed goose liver tissues in
terms of cell morphology and plasma parameters, as well as performed tissue transcriptome
and microRNA sequencing and analysis. After 20 d of overfeeding, the body weights
of overfed geese were significantly higher than that of control geese. Liver weights
were considerably higher in overfed geese (P 0.01) and accounted for 8.44% of the overall body weight, compared with 3.26% in
the control geese (Additional file 1: Table S20). During the force-feeding period, overfeeding significantly increased
the glucose, total cholesterol (TC), triglyceride (TG), and free fatty acid serum
concentrations (Additional file 1: Table S21). Figure 2 shows that overfeeding of geese with a high-energy diet resulted in liver enlargement,
with several lipid droplets deposited in the liver cells. Transcriptome analysis showed
that the gene expression levels of key enzymes involved in hepatocyte fatty acid synthesis
(hk1, gpi, pfkm, pdh, cs, acly, mdh1, me1, acc, fasn, elovl6, scd, fads1, fads2, and dgat2) were significantly elevated (red italic lettering shown in Figure 3 and Table 2), while the activities of extracellular liver lipoprotein lipase (lpl) and the first key enzyme (pksG) involved in hepatic cholesterol synthesis were significantly reduced (green italic
lettering in Figure 3 and Table 2). The expression of fatty acid transport protein genes (fatp), which are responsible for the transport of exogenous lipids into cells 26], was significantly increased (Figure 3 and Table 2). In contrast, expression of apolipoprotein B (apoB), which is responsible for binding with endogenous lipids and promoting their diffusion
from liver cell membranes as very low-density lipoproteins (VLDLs) 27],28], was significantly attenuated (Figure 3 and Table 2). Previous studies have shown that lpl plays a major role in lipolysis of fatty acids from extracellular chylomicrons or
VLDL, which can then be used or deposited in fat or muscle tissues 7],23]. The reduction in lpl activity increases the tendency for a large amount of extracellular lipids to diffuse
into liver cells. These results suggest that the mechanism of goose fatty liver formation
is mainly attributable to an imbalance between the storage and secretion (as plasma
lipoproteins) of newly synthesized endogenous lipids and exogenous lipids in the cytoplasm.
The liver lipid secretion capacity cannot offset the storage of newly synthesized
cytoplasmic lipids, resulting in fat deposition in the liver.

Figure 2. Comparison of livers and liver tissue sections between overfed and control geese.
(A) Goose liver tissue section after 3 weeks of overfeeding (200×); (a) Goose liver after
3 weeks of overfeeding. (B) Normal goose liver tissue section (200×); (b) Normal goose liver.

Figure 3. Glucose and lipid metabolic pathways in goose liver. The diagram shown is based on
established KEGG pathways and hepatic lipid-metabolism findings from previous studies.
The solid lines represent single-step reactions, whereas the dotted lines indicate
multi-step reactions. Red and green italic letters represent increased and decreased
liver gene expression levels, respectively, when comparing a goose overfed a carbohydrate-rich
diet vs. the control group fed a normal diet. Gene symbols: acc, acetyl-Coenzyme A
carboxylase; acly, ATP citrate lyase; apoB, apolipoprotein B; cs, citrate synthase;
dgat2, diacylglycerol O-acyltransferase 2; elovl6, elongation of very long chain fatty
acids protein 6; fads1, fatty acid desaturase 1; fads2, fatty acid desaturase 2; fasn,
fatty acid synthase; gpi, glucose-6-phosphate isomerase; hk1, hexokinase 1; lep, leptin;
lpl, lipoprotein lipase; pdh, pyruvate dehydrogenase; pfkm, phosphofructokinase; scd,
stearoyl-CoA desaturase; fatp, fatty acid transporter protein; pksG, hydroxymethylglutaryl-CoA
synthase.

Table 2. Information on the expression of glucolipid metabolism-related genes in goose liver

In addition, we found that the copy numbers of some genes related to liver lipid synthesis
and transportation were significantly greater than those in other species. For example,
the goose has more than three times as many scd gene copies than that found in the Gallus gallus, A. cygnoides, and Homo sapiens genomes (Additional file 1: Table S22). The Scd gene is a key enzyme in the hepatic synthesis of monounsaturated fatty acids. Its
gene expression is independently regulated by insulin and leptin, which exerts different
regulatory effects: insulin promotes scd gene expression, whereas leptin plays an inhibitory role 29]-32]. Through the Jak2, ERK1/2, and p90RSK signaling pathways, leptin can regulate the
sp1 transcription factor downstream of the scd gene promoter to inhibit scd gene expression 33]. Moreover, some studies have reported that the loss or inhibition of SCD could be
of benefit for the treatment of obesity, hepatic steatosis, and other metabolic disorders
24],34]. However, our study showed that A. cygnoides may not possess the lep gene (Figure 4 and Additional file 1: Table S23). The existence of lep in birds (especially domestic fowl) remains an elusive and controversial question
35],36], although lep sequences have been identified in some birds (Peregrine falcon, F. peregrinus, mallard, and zebra finch) 37]. In this study, we downloaded all known sequences of the lep gene as reference sequences for comparison with the goose genome. However, no similar
fragments or reads were found that aligned to these genes. Considering that the GC
content in birds is much higher than in mammals, it is possible that the lep gene is present in the goose genome, but that it resides in a region that was not
sequenced. However, we were unable to clone this gene from the goose genome by PCR,
after multiple attempts. Likewise, despite numerous large sequencing projects accruing
more than 600 K EST sequences and the repeated assembly of the chicken genome sequence,
the lep gene has not been identified in the chicken genome or that of two other domestic
birds (ducks and turkeys) 37]. More effort should be dedicated to determining the presence or absence of the lep gene in future studies.

Figure 4. Proximal regions of the lep gene in H. sapiens. The figure shown depicts a region of H. sapiens chromosome 7 (126.0 to 129.4 Mb), for which the blue arrows indicate the orientation
of genes along the chromosome. Arrows pointing to the right or left represent genes
expressed from the positive or negative DNA strand, respectively. The relative position
of the H. sapiens LEP gene is shown with red labeling. The black areas along the chromosome represent regions
of co-linearity between H. sapiens and A. cygnoides that are generally considered to be conserved. The figure shows no co-linear regions
near the LEP gene in H. sapiens. The dotted line represents A. cygnoides genes, which were distributed in different scaffolds, with the links determined based
on the existence of homologous genes in adjacent regions of the H. sapiens chromosome. Numbers in the dashed boxes represent the number of homologous links
of H. sapiens genes found in A. cygnoides (BLASTP: e value set to 1e-5). The gaps in the dashed line represent corresponding genes in H. sapiens that lacked homologous genes in A. cygnoides. The diagram shows that during the evolutionary divergence of H. sapiens and A. cygnoides, genomic fragments near the lep gene may have been deleted from the goose genome, with the lep gene excluded from further replication as a result.

Results from previous studies have shown that the toxic and damaging effects of saturated
fatty acids (SFAs) in the liver are significantly stronger than those of monounsaturated
fatty acids (MUFAs) 38],39]. The implication of these findings is that the physiological transformation of SFA
into MUFA by scd enzymes could alleviate the toxic effects of excessive liver exposure to SFA. Furthermore,
the results of some studies have indicated that ob/ob mice (lep-deficient model mice) readily develop hepatic steatosis, but do not show spontaneous
progression to steatohepatitis or liver fibrosis 40],41] because leptin is an essential mediator of hepatic fibrogenesis 41],42]. We therefore hypothesize that deletions of the goose lep gene may result from positive selection, thus allowing the liver to adapt energy
storage mechanisms for long-distance migration, as observed in other wild birds 43]. In addition, our results indicated that microRNAs are closely related to goose liver
lipid metabolism in that multiple genes related to lipid synthesis or transport (lpl, fads1, pfkm, mdh1, pksG, fatp, acly, scd, cs, and elovl1) are regulated by single or multiple microRNAs (Additional file 1: Table S24), although this requires further verification.