High-quality permanent draft genome sequence of the Parapiptadenia rigida-nodulating Cupriavidus sp. strain UYPR2.512

Genome project history

This organism was selected for sequencing on the basis of its environmental and agricultural
relevance to issues in global carbon cycling, alternative energy production, and biogeochemical
importance. It is part of the Genomic Encyclopedia of Bacteria and Archaea, Root Nodulating Bacteria chapter (GEBA-RNB) project at the U.S. Department of
Energy, Joint Genome Institute [25]. The genome project is deposited in the Genomes OnLine Database [14], and the high-quality permanent draft genome sequence is deposited in IMG [26]. Sequencing, finishing and annotation were performed by the JGI using state-of-the-art
sequencing technology [27]. A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information for Cupriavidus sp. strain UYPR2.512

Growth conditions and DNA isolation

Cupriavidus sp. strain UYPR2.512 was grown to mid-logarithmic phase in TY rich medium [10] on a gyratory shaker at 28°C. DNA was isolated from 60 mL of cells using a CTAB (cetyltrimethylammonium
bromide) bacterial genomic DNA isolation method [29].

Genome sequencing and assembly

The draft genome of Cupriavidus sp. UYPR2.512 was generated at the DOE Joint Genome Institute [27]. An Illumina standard (Std) shotgun library was constructed and sequenced on the Illumina
HiSeq 2000 platform, which generated 29,312,424 reads totaling 4,396.9 Mbp [30]. All general aspects of library construction and sequencing performed at the JGI
can be found on the JGI website [31]. All raw Illumina sequence data were passed through DUK, a filtering program developed
at the JGI that removes known Illumina sequencing and library preparation artifacts
(Mingkun L, Copeland A, Han J, unpublished).
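DUK itself is unpublished, so its exact method is not described here. As an illustration only, the Python sketch below shows one common way to screen reads against known artifact sequences: index every k-mer of each artifact (on both strands) and flag any read that shares a k-mer with that index. The k-mer size (11), the hit threshold and the use of the 13-nt sequence AGATCGGAAGAGC (the start of the widely used Illumina adapter) are hypothetical illustration values, not JGI settings.

```python
# Illustrative k-mer artifact screen in the spirit of DUK (whose implementation
# is unpublished). k, min_hits and the artifact list are hypothetical values.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_artifact_index(artifacts, k):
    """Collect the k-mers of each known artifact on both strands."""
    index = set()
    for art in artifacts:
        index.update(kmers(art, k))
        index.update(kmers(revcomp(art), k))
    return index


def is_artifact(read, index, k, min_hits=1):
    """Flag a read if at least min_hits of its k-mers occur in the artifact index."""
    hits = sum(1 for km in kmers(read, k) if km in index)
    return hits >= min_hits


def filter_reads(reads, artifacts, k=11, min_hits=1):
    """Keep only reads that share no (or too few) k-mers with the artifacts."""
    index = build_artifact_index(artifacts, k)
    return [r for r in reads if not is_artifact(r, index, k, min_hits)]


adapter = "AGATCGGAAGAGC"  # hypothetical stand-in for the artifact library
reads = ["ACGTACGTACGTAGATCGGAAGAGCACG", "TTGACCGATTGACCGATTGA"]
print(filter_reads(reads, [adapter]))  # ['TTGACCGATTGACCGATTGA']
```

A set of k-mers gives constant-time lookups per read position; production filters typically also trim matching ends rather than discarding whole reads, which this sketch does not attempt.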
screened and trimmed according to the k–mers present in the dataset. High–depth k–mers,
presumably derived from MDA amplification bias, cause problems in the assembly, especially
if the k–mer depth varies in orders of magnitude for different regions of the genome.
Reads with high k–mer coverage (30x average k–mer depth) were normalized to an average
depth of 30x. Reads with an average kmer depth of less than 2x were removed. Following
steps were then performed for assembly: (1) normalized Illumina reads were assembled
using Velvet version 1.1.04 32] (2) 1–3 Kbp simulated paired end reads were created from Velvet contigs using wgsim
33] (3) normalized Illumina reads were assembled with simulated read pairs using Allpaths–LG
(version r41043)34]. Parameters for assembly steps were: 1) Velvet (velveth: 63 –shortPaired and velvetg:
-very clean yes –exportFiltered yes –min contig lgth 500 –scaffolding no –cov cutoff
10) 2) wgsim (-e 0 –1 100 –2 100 –r 0 –R 0 –X 0) 3) Allpaths–LG (PrepareAllpathsInputs:
PHRED 64?=?1 PLOIDY?=?1 FRAG COVERAGE?=?125 JUMP COVERAGE?=?25 LONG JUMP COV?=?50,
RunAllpathsLG: THREADS?=?8 RUN?=?std_shredpairs TARGETS?=?standard VAPI_WARN_ONLY?=?True
OVERWRITE?=?True). The final draft assembly contained 369 contigs in 365 scaffolds.
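The k-mer normalization step gives two thresholds (reads averaging more than 30x k-mer depth are brought down to about 30x; reads below 2x are discarded) but not the algorithm or the k-mer size used. The Python sketch below assumes one common approach, probabilistic down-sampling: a read whose mean k-mer depth d exceeds the target is kept with probability target/d, so over-represented regions end up near the target depth. The k-mer size of 21, the strand-specific counting and the random-thinning rule are assumptions for illustration, not the JGI implementation.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mean_kmer_depth(read, counts, k):
    """Average dataset-wide count of the k-mers contained in one read."""
    depths = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    return sum(depths) / len(depths) if depths else 0.0


def normalize_reads(reads, k=21, target=30.0, min_depth=2.0, seed=0):
    """Thin high-depth reads toward `target` and drop reads below `min_depth`.

    k=21 is a hypothetical choice; the thresholds 30x and 2x come from the text.
    """
    rng = random.Random(seed)
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        depth = mean_kmer_depth(read, counts, k)
        if depth < min_depth:
            continue  # very low depth: removed, as described in the text
        if depth > target and rng.random() >= target / depth:
            continue  # over-represented: keep only with probability target/depth
        kept.append(read)
    return kept
```

For example, 100 identical copies of one read and a single unrelated read would yield roughly 30 copies of the first read and none of the singleton, whose k-mers all have depth 1. Real normalizers usually count canonical k-mers (a k-mer and its reverse complement together) and stream the reads to save memory.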
The total size of the genome is 7.9 Mbp and the final assembly is based on 839.6 Mbp
of Illumina data, which provides an average of 106.8x coverage.
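The coverage figure is simply the amount of sequence used in the assembly divided by the genome size. The short Python check below re-derives a few numbers from the figures quoted in this section; it introduces no new data. It shows that the raw reads average 150 bp, and that the reported 106.8x corresponds to an unrounded genome size of about 7.86 Mbp (dividing by the rounded 7.9 Mbp gives about 106.3x).

```python
# Arithmetic checks on the sequencing figures quoted in the text.
# All inputs are copied from the text; nothing here is new data.


def mean_read_length_bp(total_bases: float, n_reads: int) -> float:
    """Average read length = total sequenced bases / number of reads."""
    return total_bases / n_reads


def fold_coverage(bases_mbp: float, genome_mbp: float) -> float:
    """Average depth = bases used in the assembly / genome size."""
    return bases_mbp / genome_mbp


raw_reads = 29_312_424
raw_mbp = 4_396.9       # total raw Illumina output
assembly_mbp = 839.6    # data underlying the final assembly
reported_cov = 106.8    # reported average coverage

print(f"{mean_read_length_bp(raw_mbp * 1e6, raw_reads):.1f}")  # 150.0 (bp per read)
print(f"{fold_coverage(assembly_mbp, 7.9):.1f}")               # 106.3 (x, using rounded 7.9 Mbp)
print(f"{assembly_mbp / reported_cov:.2f}")                    # 7.86 (Mbp implied by 106.8x)
print(f"{assembly_mbp / raw_mbp:.0%}")                         # 19% (of raw output used)
```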

Genome annotation

Genes were identified using Prodigal [35] as part of the DOE-JGI genome annotation pipeline [36,37], followed by a round of manual curation using GenePRIMP [38] for finished genomes and for draft genomes in fewer than 10 scaffolds. The predicted
CDSs were translated and used to search the National Center for Biotechnology Information
(NCBI) non-redundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases.
The tRNAscan-SE tool [39] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against
models of the ribosomal RNA genes built from SILVA [40]. Other non-coding RNAs, such as the RNA components of the protein secretion complex
and RNase P, were identified by searching the genome for the corresponding Rfam
profiles using INFERNAL [41]. Additional gene prediction analysis and manual functional annotation were performed
within the Integrated Microbial Genomes Expert Review (IMG-ER) system [42], developed by the Joint Genome Institute, Walnut Creek, CA, USA.
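To make "genes were identified and the predicted CDSs translated" concrete, the Python sketch below runs a deliberately naive six-frame open reading frame scan followed by translation. This is not Prodigal: Prodigal scores candidate genes with a trained model (including ribosome-binding-site and GC-frame information) and dynamic programming, none of which is reproduced here. The sketch assumes the bacterial genetic code (NCBI translation table 11), uses only the three most common bacterial start codons (ATG, GTG, TTG, a subset of those table 11 allows), and takes a hypothetical minimum ORF length of 100 codons.

```python
from itertools import product

# Codon table in TCAG order. Table 11 (bacteria) shares these amino-acid
# assignments with the standard code and differs only in its start codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STARTS = {"ATG", "GTG", "TTG"}  # common bacterial starts (a subset of table 11)
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Naive scan: first start codon in each frame to the next in-frame stop.

    Returns (strand, frame, cds) tuples; for the '-' strand the frame is
    relative to the reverse complement. min_codons=100 is a hypothetical cutoff.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
    return orfs


def translate(cds):
    """Translate a CDS; an alternative start codon is read as Met at position 1."""
    protein = [CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)]
    if protein and cds[:3] in STARTS:
        protein[0] = "M"
    return "".join(protein).rstrip("*")


for strand, frame, cds in find_orfs("CCATGAAAGGGTAACC", min_codons=2):
    print(strand, frame, translate(cds))  # + 2 MKG
```

Reading an alternative start codon such as GTG as methionine mirrors how initiator codons are translated in bacteria; the protein sequences produced this way are what would then be searched against databases such as Pfam or COG.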