Parameter advising for multiple sequence alignment

Parameter advising

We apply parameter advising to boost the true accuracy of the Opal aligner [4,5], where the advisor uses parameter sets found by a greedy approximation algorithm. Figure 1 shows the accuracy of the advisor for a parameter set of cardinality k = 10, where the benchmarks are assigned to bins based on their accuracy under a default
parameter choice; the figure also shows the accuracies obtained with a single default
parameter choice and with an oracle. The number of benchmarks per bin is indicated above
the columns. An oracle is an advisor that knows the true accuracy of an alignment; its accuracy is shown
by the dotted line, which gives the performance of a perfect advisor. Notice that
in many cases the accuracy of the advisor is close to that of the oracle. This is most
clear in the bin with the lowest average accuracy, where advising increases the average
accuracy by almost 20% compared to using a single default parameter choice.
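
The relationship between the default choice, the advisor, and the oracle can be sketched as follows. This is a minimal illustration and not the Opal or Facet implementation: the function `advise`, the parameter names, and all accuracy values are hypothetical. The advisor selects the candidate alignment with the highest estimated accuracy, while the oracle selects by true accuracy.

```python
def advise(alignments, score):
    """Return the parameter choice whose alignment maximizes `score`."""
    return max(alignments, key=lambda p: score(alignments[p]))

# Hypothetical candidates for one benchmark: one alignment per parameter choice.
# All numbers are invented for illustration.
candidates = {
    "default":  {"estimated": 0.52, "true": 0.48},
    "choice_a": {"estimated": 0.61, "true": 0.66},
    "choice_b": {"estimated": 0.64, "true": 0.59},
}

# The advisor sees only estimated accuracy; the oracle sees true accuracy.
advisor_pick = advise(candidates, lambda a: a["estimated"])
oracle_pick = advise(candidates, lambda a: a["true"])

# Every strategy is judged by the true accuracy of the alignment it returns.
default_accuracy = candidates["default"]["true"]
advisor_accuracy = candidates[advisor_pick]["true"]
oracle_accuracy = candidates[oracle_pick]["true"]
```

In this toy case the advisor improves on the default but falls short of the oracle, because the estimator ranks the two non-default alignments in the wrong order.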

Figure 1. Advising accuracy of Facet within benchmark bins.

Figure 2 shows the average advising accuracy for parameter sets of various cardinalities when the
estimator is Facet [3], TCS [6], MOS [7], or PredSP [8], where benchmark bins contribute equally to the average. The vertical axis is advising
accuracy on the testing data, averaged over all benchmarks and all folds using 12-fold
cross-validation. The horizontal axis is the cardinality k of the greedy advisor set. Greedy advisor sets found by the approximation algorithm
are augmented from the exact set of cardinality 1 (namely, the best single parameter
choice). Notice that Facet (the topmost curve in the plot) continues to increase in
advising accuracy up to cardinality k = 6. Notice also that while all of the advisors reach a plateau, for Facet this occurs
at a greater cardinality and accuracy than for the other estimators.
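
Greedy augmentation from the best single parameter choice can be sketched as below. This is an illustration under stated assumptions, not the authors' algorithm as published: it assumes one parameter choice is added per step, it uses a plain average over benchmarks rather than the bin-weighted average described above, and the function names and toy accuracies are hypothetical.

```python
def advising_accuracy(param_set, benchmarks):
    """Average true accuracy when the advisor picks, per benchmark, the
    parameter choice in `param_set` with the highest estimated accuracy."""
    total = 0.0
    for bench in benchmarks:
        pick = max(param_set, key=lambda p: bench[p][0])  # rank by estimate
        total += bench[pick][1]                           # credit true accuracy
    return total / len(benchmarks)

def greedy_advisor_set(universe, benchmarks, k):
    """Start from the best single choice, then repeatedly add the choice
    that gives the highest advising accuracy for the enlarged set."""
    chosen = [max(universe, key=lambda p: advising_accuracy([p], benchmarks))]
    while len(chosen) < min(k, len(universe)):
        remaining = [p for p in universe if p not in chosen]
        chosen.append(max(
            remaining,
            key=lambda p: advising_accuracy(chosen + [p], benchmarks)))
    return chosen

# Toy data: benchmark -> {parameter choice: (estimated, true) accuracy}.
universe = ["p1", "p2", "p3"]
benchmarks = [
    {"p1": (0.70, 0.72), "p2": (0.40, 0.35), "p3": (0.65, 0.60)},
    {"p1": (0.30, 0.25), "p2": (0.80, 0.85), "p3": (0.50, 0.55)},
    {"p1": (0.55, 0.50), "p2": (0.45, 0.40), "p3": (0.75, 0.78)},
]

greedy_sets = {k: greedy_advisor_set(universe, benchmarks, k) for k in (1, 2, 3)}
accuracies = {k: advising_accuracy(s, benchmarks) for k, s in greedy_sets.items()}
```

Because the advisor chooses by estimated rather than true accuracy, adding a parameter choice does not always help in general, which is consistent with the plateaus seen in Figure 2.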

Figure 2. Average advising accuracy of estimators on sets of varying cardinality.

Accuracy estimation

Our tool Facet (Feature-based Accuracy Estimator) [9] is an easy-to-use, open-source utility for estimating the accuracy of a protein multiple
sequence alignment. Facet computes the estimated accuracy of a computed alignment
as a linear combination of real-valued feature functions. We considered 12 features,
from which we found an optimal subset of 5 that provides the best performance for alignment
advising. Many of the most useful features utilize information about protein secondary
structure. We find coefficients by fitting the difference in estimator values to the
difference in true accuracy for pairs of examples where the correct alignment is known.
This “difference fitting” approach is computationally efficient and yields an estimator
that works well for advising.
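
A minimal sketch of a linear estimator and a difference-fitting error is given below. It is not Facet's code: the two-feature vectors, the coefficients, and the use of squared error over all pairs are assumptions made for illustration, and the loss and pair selection used in Facet may differ.

```python
from itertools import combinations

def estimate(coeffs, features):
    """Estimated accuracy as a linear combination of feature values."""
    return sum(c * f for c, f in zip(coeffs, features))

def difference_fitting_error(coeffs, examples):
    """Sum, over all pairs of examples, of the squared gap between the
    difference in estimator values and the difference in true accuracy."""
    error = 0.0
    for (feat_a, true_a), (feat_b, true_b) in combinations(examples, 2):
        est_diff = estimate(coeffs, feat_a) - estimate(coeffs, feat_b)
        true_diff = true_a - true_b
        error += (est_diff - true_diff) ** 2
    return error

# Hypothetical examples: (feature vector, true accuracy). The true accuracies
# were constructed as 0.6 * f1 + 0.4 * f2 so that a perfect fit exists.
examples = [
    ([0.9, 0.8], 0.86),
    ([0.5, 0.4], 0.46),
    ([0.2, 0.7], 0.40),
]

good_error = difference_fitting_error([0.6, 0.4], examples)
poor_error = difference_fitting_error([1.0, 0.0], examples)
```

Since only differences are fitted, adding a constant to the estimator leaves the error unchanged; what matters for advising is that the estimator orders alignments correctly.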

Facet is open-source software that can be used to estimate accuracy either as (1)
a standalone tool, or (2) a software library integrated into a pre-existing
Java application. The implementation provides optimized default coefficients and features.
The coefficients may also be specified manually, and new features can be added.
Figure 3 shows a simple example of using Facet within a Java application to choose between
two alignments of the same set of sequences. The secondary structure predictions are
computed on the unaligned sequences and can be reused between the two alignments.

Figure 3. Example of invoking Facet in Java.

The Facet website provides parameter sets that can be used with the Opal aligner (namely,
substitution matrices and affine gap penalties), as well as scripts for secondary structure
prediction.