QuickNGS elevates Next-Generation Sequencing data analysis to a new level of automation

Test run on previously published RNA-Seq data

To illustrate the practical use of our software, we re-analyzed 10 RNA-Seq samples
from a study on transcription factors in Drosophila in which four mutant conditions were
compared to a control, each with two biological replicates [18]. After feeding the sample metadata into the QuickNGS database, we linked the
20 FastQ files into the QuickNGS stack directory. These preparatory steps took about
2 min in total. While waiting for the subsequent pipeline run to finish, we
were able to monitor the current status of the respective modules using the status
page on the QuickNGS database interface (Fig. 1b). The RNA-Seq workflow comprises an initial quality check using FastQC plus some
software which is unique to QuickNGS. The basic data processing consists of a splice-aware
alignment using TopHat2 [12] followed by reference-guided transcriptome reassembly with Cufflinks2 [19]. Differential gene expression and differential exon usage are analyzed with DESeq2
[1] and DEXSeq [2], respectively. After the processing had finished overnight, we logged in to the QuickNGS
user interface and found a report which summarizes all results of the QuickNGS workflow
(Fig. 2). From the initial quality check, we received some basic read statistics (Table 2) as well as standard QC plots, a heat map (Fig. 3a) and a plot from a principal component analysis (Fig. 3b) for the 10 samples. The results of the core analysis for the comparison of atf3
mutants (atf3a_1 and atf3a_2) against controls (yw_1 and yw_2) are provided as Additional
files. With thresholds of 5 for the fold change and 0.01 for the p-value, we obtain a set of 93 differentially
expressed genes (Additional file 1) and a set of 168 differentially used exons (Additional file 2). Additional file 3 reports the p-values and fold changes for differential gene expression (atf3a_1 and
atf3a_2 compared to yw_1 and yw_2) together with those for the comparisons of the
remaining three mutant conditions to control (atf376_1 and atf376_2, foxo_1 and foxo_2,
rel_1 and rel_2, each compared against yw_1 and yw_2). On the web interface, the same
three spreadsheet files are also provided for these comparisons. All tables contain a
comprehensive selection of genomic and functional annotation. The RNA-Seq wiggle files
can be visualised on the UCSC Genome Browser via a hyperlink which
uses a local password-protected track hub. The FastQ files for these
test data are available from the NCBI Sequence Read Archive (SRA) under accession number
SRP011390.
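
The threshold filter mentioned above (fold change 5, p-value 0.01) can be sketched as follows. This is a minimal illustration under stated assumptions, not the QuickNGS implementation: the text does not say whether the fold-change cutoff is applied symmetrically to up- and down-regulation or whether the p-value is adjusted for multiple testing, so the sketch assumes a symmetric cutoff on a log2 fold change and takes the p-value as given. The field names `gene`, `log2fc` and `pvalue` and the example rows are hypothetical.

```python
import math

# Thresholds quoted in the text. How QuickNGS applies them internally
# (symmetric cutoff, raw vs. adjusted p-value) is an assumption here.
FOLD_CHANGE_CUTOFF = 5.0
P_VALUE_CUTOFF = 0.01


def is_differential(log2_fold_change, p_value,
                    fc_cutoff=FOLD_CHANGE_CUTOFF, p_cutoff=P_VALUE_CUTOFF):
    """Return True if a feature passes both the fold-change and p-value cutoff.

    The fold-change cutoff is applied to the absolute log2 fold change,
    so 5-fold up- and 5-fold down-regulation are treated alike.
    """
    return (abs(log2_fold_change) >= math.log2(fc_cutoff)
            and p_value < p_cutoff)


def filter_genes(rows):
    """Return the names of all rows passing both thresholds, in input order."""
    return [row["gene"] for row in rows
            if is_differential(row["log2fc"], row["pvalue"])]


# Made-up example rows (hypothetical field names, not results from the study).
example = [
    {"gene": "geneA", "log2fc": 3.1, "pvalue": 0.0004},   # passes both
    {"gene": "geneB", "log2fc": -2.8, "pvalue": 0.002},   # passes (down-regulated)
    {"gene": "geneC", "log2fc": 1.2, "pvalue": 0.0001},   # fold change too small
    {"gene": "geneD", "log2fc": 4.0, "pvalue": 0.2},      # p-value too large
]

print(filter_genes(example))
```

Working on the log2 scale keeps the cutoff symmetric: a fold change of 5 corresponds to a log2 fold change of about 2.32 in either direction.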

Table 3. Comparison of the technical features of QuickNGS to those of other NGS analysis workflow
systems

Fig. 3. a Heatmap of the 10 RNA-Seq test data sets: the replicates of each genotype do not
perfectly cluster together in distinct subclusters. This is likely to be caused by
a combined effect of ribosomal contamination and batch effects. b The principal component analysis confirms that two samples (depicted in red), which
were processed in a separate batch and with Ribo-Zero treatment, cluster distantly from
the remaining samples

Description of other QuickNGS workflows

Although the current QuickNGS release also comprises workflows for miRNA sequencing,
ChIP-Seq and whole-genome resequencing, we have described only the RNA-Seq workflow
in detail above. However, the same level of efficiency and automation is
also achieved in all other QuickNGS workflows. The miRNA-Seq workflow comprises quantification
and differential profiling of 3p and 5p mature miRNAs using miRDeep [6] as well as statistics on miRNA families. Differential miRNA expression is profiled
with the DESeq2 package [1]. The ChIP-Seq workflow takes advantage of BWA [14] for genomic alignment of the reads and uses MACS2 [5] for peak calling. Furthermore, QuickNGS identifies all genes which are located within 2000 bp up-
or downstream of the MACS2 peaks. The peak sequences are analyzed for enrichment
of transcription factor binding motifs using MEME-ChIP [15]. The results comprise lists of significant peaks and reports for motif enrichment.
For the whole-genome resequencing workflow, finally, the software uses BWA for genomic
alignment and calls single nucleotide polymorphisms with SAMtools [13] and structural variations with Delly [16]. The results are annotated and functionally classified with SnpEff [3]. Basic QC statistics and password-protected track hubs for the UCSC Genome Browser
with direct hyperlinks for visualisation are part of all workflows. The QuickNGS database
comes with ready-made metadata for additional test data which are available from the
SRA at NCBI under accession numbers SRP043191 (miRNA-Seq), SRP007261 (ChIP-Seq) and SRP020555
(whole-genome resequencing). Additional modules dedicated to cancer genomics and more
recent NGS applications such as CLIP-Seq (cross-linking immunoprecipitation followed
by sequencing) are currently under development.
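
The peak-to-gene association step of the ChIP-Seq workflow (genes 2000 bp up- or downstream of the MACS2 peaks) can be illustrated with a small sketch. It is not the QuickNGS code: the text does not state whether distance is measured to the gene body or to the transcription start site, so the sketch assumes the gene body, closed coordinates, and a gap of at most 2000 bp. The `Interval` type, the function name `genes_near_peaks` and all coordinates are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type: closed interval [start, end] on one chromosome.
Interval = namedtuple("Interval", ["chrom", "start", "end", "name"])

WINDOW_BP = 2000  # distance quoted in the text


def genes_near_peaks(peaks, genes, window=WINDOW_BP):
    """Map each peak name to the genes lying within `window` bp of the peak.

    A gene counts as a hit if its body overlaps the peak extended by
    `window` bp on both sides (assumption: gene body, not TSS).
    """
    hits = {}
    for peak in peaks:
        lo, hi = peak.start - window, peak.end + window
        hits[peak.name] = [
            gene.name for gene in genes
            if gene.chrom == peak.chrom and gene.start <= hi and gene.end >= lo
        ]
    return hits


# Made-up coordinates for illustration only.
peaks = [
    Interval("chr2L", 10000, 10500, "peak1"),
    Interval("chr3R", 50000, 50300, "peak2"),
]
genes = [
    Interval("chr2L", 7000, 8200, "geneA"),    # ends 1800 bp upstream of peak1
    Interval("chr2L", 13000, 14000, "geneB"),  # starts 2500 bp downstream of peak1
    Interval("chr2L", 10200, 11000, "geneC"),  # overlaps peak1
    Interval("chr3R", 50100, 52000, "geneD"),  # overlaps peak2
    Interval("chrX", 10000, 10500, "geneE"),   # same coordinates, other chromosome
]

print(genes_near_peaks(peaks, genes))
```

The nested scan is quadratic in the number of peaks and genes, which is acceptable for a sketch; a production pipeline would more likely sort the intervals or use an interval index.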

Features of QuickNGS compared to other NGS workflow systems

To assess how QuickNGS compares to other NGS workflow systems,
we discuss the features that are unique to our solution as well as its limitations.
The degree of automation in QuickNGS is much higher than, for instance, that of a
comparable workflow in popular data analysis frameworks like Galaxy [7], GenePattern [17] or Chipster [10]. This makes the system more efficient for typical standard analyses, but also
less flexible with respect to modifications. In particular, our system enables a drastic reduction
of the hands-on (as opposed to computation) time that staff have to spend on basic NGS
data analyses. Data processing for tens or hundreds of samples can be initiated in
less than 10 min. While the subsequent analyses run entirely in the background,
they can be monitored on the status website and, once finished, the results are ready
for immediate access by any scientist without specific IT skills. In contrast to all
other systems, the QuickNGS database is capable of organizing sample meta information
along with the analysis results, enabling a high degree of reproducibility and documentation
of what analyses have been done. This is essential whenever large numbers of samples
are processed. Our software is also the only one to summarize all analysis results
into user accounts with ready-to-deliver web reports. An overview of the features
of several NGS workflow systems compared to QuickNGS is given in Table 3.