Development of a gene synthesis platform for the efficient large-scale production of small genes encoding animal toxins

Synthetic biology, an interdisciplinary branch of biology, is quickly becoming one of the most attractive areas of research thanks to recent developments in gene synthesis technology. In combination with intelligent gene design, gene synthesis is emerging as a valuable tool to support recombinant protein expression. De novo gene design allows codon usage to be optimized for the recombinant host system, thus promoting the efficient operation of the cellular translational machinery. In addition, when no nucleic acid template is available, gene synthesis allows DNA molecules to be created de novo. The exponential growth of genomic and metagenomic databases, together with the difficulty of exploiting this highly useful sequence information when no physical DNA is available, is driving the rapid development of novel gene synthesis technologies.
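The codon-optimization idea can be illustrated with a minimal back-translation sketch. It assumes a hypothetical table, `PREFERRED_CODON`, that assigns each amino acid a single codon roughly following frequently used codons in *E. coli*; this table and the one-codon-per-residue rule are illustrative assumptions, not the design rules used in this work. Practical codon optimization often also weighs GC content, mRNA secondary structure, repeats and unwanted restriction sites.

```python
# Hypothetical "preferred codon" table, loosely based on frequent E. coli codons.
# Illustrative only: a real design would use a measured codon usage table for the host.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon appended at the end of the coding sequence
}


def back_translate(peptide: str) -> str:
    """Return a DNA coding sequence for `peptide`, one preferred codon per residue plus a stop codon."""
    try:
        codons = [PREFERRED_CODON[aa] for aa in peptide.upper()]
    except KeyError as err:
        # Non-standard residue letters are rejected rather than guessed.
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return "".join(codons) + PREFERRED_CODON["*"]


if __name__ == "__main__":
    print(back_translate("MK"))
```

For example, `back_translate("MK")` returns `"ATGAAATAA"`. Choosing a single codon per residue keeps the sketch simple, but it tends to create repetitive sequences, which is one reason real tools sample from the host's codon distribution instead.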

In recent years, a variety of gene synthesis methodologies have been developed based on the assembly of oligonucleotides into complete genes. Early approaches to nucleic acid synthesis relied on the enzymatic ligation of pre-formed duplexes of phosphorylated overlapping oligonucleotides [1]. Subsequently, self-priming PCR [2], PCR assembly [3], polymerase chain assembly (PCA) [4] and template-directed ligation [5] were described as efficient strategies for de novo gene synthesis. More recently, two-step methods have been reported for the production of long DNA sequences. Examples of these technologies are the PCR-based thermodynamically balanced inside-out technology (TBIO) [6], the two-step total gene synthesis method [7], which combines dual asymmetrical PCR (DA-PCR) and overlap-extension PCR (OE-PCR), PCR-based two-step DNA synthesis (PTDS) [8] and PCR-based accurate synthesis (PAS) [9].

More recently, improved PCR-based gene synthesis methods, such as the improved PCR synthesis (IPS) and simplified gene synthesis (SGS) protocols [8, 9], have been described that significantly simplify earlier strategies. SGS uses 40-nucleotide (nt) oligonucleotides with 18–20 nt overlaps, which are assembled in a single PCR-assembly reaction that directly yields the full-length DNA molecule. The simplicity of this protocol, combined with its relatively low cost (the oligonucleotides require neither phosphorylation nor purification), provides a solid basis for developing even more effective PCR-based methods. However, major drawbacks persist, and effective improvements must be implemented in current synthetic protocols before they can be translated to a large scale. One of the major bottlenecks of current gene synthesis protocols lies in the quality of the oligonucleotides used for nucleic acid assembly. All current gene synthesis methods accumulate errors in the final synthetic molecules. Sequence errors usually derive from the incorporation of imperfect synthetic oligonucleotides or from the limited fidelity of the enzymatic assembly step. Current oligonucleotide synthesis methods produce sequences that are often prematurely terminated or contain internal mutations (error rates range from 1 to 10 mutations per kilobase (kb)) [10]. In addition, chemical synthesis of DNA molecules usually involves not only moderate to high error rates but also high costs. Moreover, the accuracy of a synthetic gene also depends on the fidelity of the DNA polymerase used to assemble the oligonucleotides into the final DNA sequence. Sequence errors are therefore inevitable, and it is often necessary to remove incorrect synthetic DNA molecules using enzymatic methods [11, 12].
Improvements in oligonucleotide quality, error correction and DNA polymerase efficacy are thus urgently required.
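The SGS oligonucleotide layout and the quoted error rates can be made concrete with a short sketch. The function names below are hypothetical, and the design rule (40-nt oligos, each overlapping the next by 20 nt, alternating between sense and antisense strands) is an assumed simplification of an SGS-style layout rather than the exact protocol of [8, 9]. The error estimate assumes errors occur independently at a uniform per-base rate, so the expected fraction of error-free molecules of length L at a per-base error rate p is (1 − p)^L.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_sgs_oligos(gene: str, oligo_len: int = 40, overlap: int = 20) -> list[str]:
    """Split `gene` into overlapping oligos that alternate between strands (hypothetical helper).

    Oligo i covers gene[i * step : i * step + oligo_len], with step = oligo_len - overlap,
    so consecutive oligos share `overlap` nt. Odd-numbered oligos are written as
    antisense (reverse complement) so they bridge the adjacent sense oligos.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be between 0 and oligo_len")
    step = oligo_len - overlap
    oligos: list[str] = []
    start = 0
    while True:
        end = min(start + oligo_len, len(gene))
        fragment = gene[start:end]
        is_sense = len(oligos) % 2 == 0
        oligos.append(fragment if is_sense else reverse_complement(fragment))
        if end >= len(gene):
            break
        start += step
    return oligos


def error_free_fraction(length_bp: int, errors_per_kb: float) -> float:
    """Expected fraction of error-free molecules, assuming independent, uniform per-base errors."""
    p = errors_per_kb / 1000.0
    return (1.0 - p) ** length_bp


if __name__ == "__main__":
    gene = "ATGC" * 25  # placeholder 100-nt sequence, not a real gene
    for oligo in design_sgs_oligos(gene):
        print(oligo)
    for rate in (1, 10):
        print(rate, round(error_free_fraction(300, rate), 3))
```

Under this independence model, an illustrative 300-bp molecule would be error-free in about 74% of cases at 1 error/kb, but in only about 5% at 10 errors/kb, which illustrates why error removal or sequence verification of clones is usually needed.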

Conventionally, PCR-based gene synthesis is employed to produce a single gene at a time. The development of automated platforms that can effectively generate large libraries of nucleic acids is therefore urgently needed. The steps of a single-gene PCR-assembly strategy must remain simple, accurate and robust when extended to the simultaneous assembly of multiple genes. To develop large-scale methods, many factors that affect the efficiency of gene assembly, such as DNA polymerase performance and oligonucleotide concentration and quality, require optimization. This work describes several approaches to optimizing current gene synthesis protocols. The resulting data were integrated to develop a novel platform, which was applied to efficiently synthesize and clone a large number of nucleic acids encoding venom peptides. This automated platform can be adapted for the rapid generation of complex gene libraries encoding different families of biotechnologically relevant and valuable proteins and peptides.