High-quality permanent draft genome sequence of Bradyrhizobium sp. Ai1a-2; a microsymbiont of Andira inermis discovered in Costa Rica


Genome project history

This organism was selected for sequencing on the basis of its environmental and agricultural
relevance to issues in global carbon cycling, alternative energy production, and biogeochemical
importance, and is part of the Genomic Encyclopedia of Bacteria and Archaea, Root
Nodulating Bacteria (GEBA-RNB) project at the U.S. Department of Energy, Joint Genome
Institute (JGI). The genome project is deposited in the Genomes OnLine Database [20] and a high-quality permanent draft genome sequence in IMG [21]. Sequencing, finishing and annotation were performed by the JGI using state-of-the-art
sequencing technology [22]. A summary of the project information is shown in Table 2.

Table 2. Project information

Growth conditions and genomic DNA preparation

Bradyrhizobium sp. Ai1a-2 was cultured to mid-logarithmic phase in 60 ml of TY rich medium on a gyratory
shaker at 28°C [23]. DNA was isolated from the cells using a CTAB (cetyl trimethyl ammonium bromide)
bacterial genomic DNA isolation method [24].

Genome sequencing and assembly

The draft genome of Bradyrhizobium sp. Ai1a-2 was generated at the DOE Joint Genome Institute (JGI) using Illumina
technology [25]. An Illumina standard shotgun library was constructed and sequenced using the Illumina
HiSeq 2000 platform, which generated 21,669,974 reads totaling 3,250.5 Mbp. All general
aspects of library construction and sequencing were performed at the JGI and details
can be found on the JGI website [26]. All raw Illumina sequence data was passed through DUK, a filtering program developed
at JGI, which removes known Illumina sequencing and library preparation artifacts
(Mingkun L, Copeland A, Han J, Unpublished). The following steps were then performed for
assembly: (1) filtered Illumina reads were assembled using Velvet (version 1.1.04)
[27], (2) 1–3 Kbp simulated paired end reads were created from Velvet contigs using wgsim
[28], (3) Illumina reads were assembled with simulated read pairs using Allpaths-LG (version
r42328) [29]. Parameters for assembly steps were: (1) Velvet (velveth: 63 -shortPaired and velvetg:
-very_clean yes -exportFiltered yes -min_contig_lgth 500 -scaffolding no -cov_cutoff
10), (2) wgsim (-e 0 -1 100 -2 100 -r 0 -R 0 -X 0), (3) Allpaths-LG (PrepareAllpathsInputs:
PHRED_64 = 1 PLOIDY = 1 FRAG_COVERAGE = 125 JUMP_COVERAGE = 25 LONG_JUMP_COV = 50,
RunAllpathsLG: THREADS = 8 RUN = std_shredpairs TARGETS = standard VAPI_WARN_ONLY
= True OVERWRITE = True). The final draft assembly contained 247 contigs in 246 scaffolds.
The total size of the genome is 9.0 Mbp and the final assembly is based on 1,081.2
Mbp of Illumina data, which provides an average 119.7X coverage of the genome.
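
As a quick consistency check on the figures quoted above, the following minimal Python sketch recomputes the mean read length and the average coverage from the reported numbers. The function and constant names are illustrative only and are not part of the JGI pipeline; the genome size of 9.0 Mbp is the rounded value given in the text, so the recomputed coverage (about 120X) differs slightly from the reported 119.7X.

```python
# Figures quoted in the text; names are hypothetical, chosen for this sketch.
READ_COUNT = 21_669_974        # reads generated by the HiSeq 2000 run
RAW_YIELD_MBP = 3250.5         # total raw sequence, in Mbp
ASSEMBLED_DATA_MBP = 1081.2    # Illumina data the final assembly is based on
GENOME_SIZE_MBP = 9.0          # genome size as rounded in the text
REPORTED_COVERAGE = 119.7      # average coverage reported in the text


def mean_read_length_bp(total_mbp: float, read_count: int) -> float:
    """Mean read length in bp, from total yield (Mbp) and read count."""
    return total_mbp * 1e6 / read_count


def coverage(data_mbp: float, genome_mbp: float) -> float:
    """Average fold coverage: sequenced bases divided by genome size."""
    return data_mbp / genome_mbp


def implied_genome_size_mbp(data_mbp: float, fold_coverage: float) -> float:
    """Genome size (Mbp) implied by a data volume and a reported coverage."""
    return data_mbp / fold_coverage


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length_bp(RAW_YIELD_MBP, READ_COUNT):.1f} bp")
    print(f"coverage from rounded genome size: {coverage(ASSEMBLED_DATA_MBP, GENOME_SIZE_MBP):.1f}X")
    print(f"genome size implied by reported coverage: "
          f"{implied_genome_size_mbp(ASSEMBLED_DATA_MBP, REPORTED_COVERAGE):.2f} Mbp")
```

The mean read length works out to about 150 bp, and the reported 119.7X coverage implies an unrounded genome size of roughly 9.03 Mbp, consistent with the 9.0 Mbp stated above.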

Genome annotation

Genes were identified using Prodigal [30] as part of the DOE-JGI genome annotation pipeline [31, 32]. The predicted CDSs were translated and used to search the National Center for Biotechnology
Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and
InterPro databases. The tRNAscan-SE tool [33] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against
models of the ribosomal RNA genes built from SILVA [34]. Other non-coding RNAs, such as the RNA components of the protein secretion complex
and RNase P, were identified by searching the genome for the corresponding Rfam
profiles using INFERNAL [35]. Additional gene prediction analysis and manual functional annotation were performed
within the Integrated Microbial Genomes-Expert Review (IMG-ER) system [36] developed by the Joint Genome Institute, Walnut Creek, CA, USA.