Methods
The database, which consists of 18,990 sequence set files plus their reference alignments, and the scripts used for benchmarking are available [32]. Plots showing BRALISCORE, SCI, and SPS versus APSI for all alignment sets (k ∈ {2, 3, 5, 7, 10, 15}) and for all programs listed in Table 5 can also be found there.

Reference alignments
For the construction of reference alignments we used "seed" alignments from the Rfam database version 7.0 [24,23]. In most cases these alignments are hand-curated and thus of higher quality than Rfam's "full" alignments, which are generated automatically by the INFERNAL RNA profile package [40]. Alignments with fewer than 50 sequences were discarded to increase the number of subalignments that could be extracted (see below). The SCI (see below), which we use to score structural alignment quality, is based on a combination of thermodynamic and covariation measures. Thermodynamic structure prediction becomes increasingly inaccurate with increasing sequence length (e.g., due to kinetic effects) but is widely regarded as sufficiently accurate for sequences of up to 300 nt [41,42]. We therefore excluded alignments with an average sequence length above 300 nt to ensure proper thermodynamic scoring.
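The two selection criteria for seed alignments can be sketched as a simple filter. This Python sketch assumes that each seed alignment has already been parsed into a list of gapped sequence rows and that '-' and '.' are the only gap characters (as in Stockholm files); the function names are hypothetical.

```python
GAP_CHARS = set("-.")  # assumed gap symbols in the alignment rows


def ungapped_length(row):
    """Length of one alignment row after removing gap characters."""
    return sum(1 for c in row if c not in GAP_CHARS)


def keep_seed_alignment(rows, min_seqs=50, max_mean_len=300.0):
    """Hypothetical filter: keep a seed alignment only if it has at least
    `min_seqs` sequences and a mean ungapped length of at most
    `max_mean_len` nucleotides."""
    if len(rows) < min_seqs:
        return False  # too few sequences to draw many subalignments from
    mean_len = sum(ungapped_length(r) for r in rows) / len(rows)
    return mean_len <= max_mean_len  # long RNAs are scored less reliably


# Usage: 60 identical 4-nt rows pass, 49 rows do not.
print(keep_seed_alignment(["AC-GU"] * 60), keep_seed_alignment(["AC-GU"] * 49))
```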
To each remaining seed alignment we applied a "naive" combinatorial approach that extracts sub-alignments with k ∈ {2, 3, 5, 7, 10, 15} sequences for a given range of average pairwise sequence identity (APSI; a measure of sequence homology computed with ALISTAT from the squid package [43]). To this end, we computed the identities of all sequence pairs in an alignment and selected those pairs with the desired APSI ± 10 %. From the sequences of the selected pairs we then randomly picked k unique sequences. Additionally, we dropped all alignments with an SCI below 0.6, both to ensure the structural quality of the alignments and to make sure that the SCI can later be applied to score the test alignments. In this way we generated a total of 18,990 reference alignments with an average SCI of 0.93; by comparison, data-set 1 used in [22] consists of only 388 alignments with an average SCI of 0.89. For further details see Tables 1 and 6.
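The sampling step can be illustrated with the following Python sketch; it is a simplified reconstruction, not the script used to build the benchmark. Assumptions: pairwise identity is taken as the number of identical aligned residues divided by the shorter ungapped length of the two sequences (a common definition, assumed here to approximate ALISTAT's), the ± 10 % window is read as an absolute tolerance of 0.10 on the identity fraction, and all helper names are hypothetical. The SCI filter (SCI ≥ 0.6) needs folding energies and is not included.

```python
import itertools
import random

GAP_CHARS = set("-.")  # assumed gap symbols in the alignment rows


def pairwise_identity(a, b):
    """Identical aligned residues divided by the shorter ungapped length
    (assumed definition; gap positions never count as identities)."""
    idents = sum(1 for x, y in zip(a, b)
                 if x not in GAP_CHARS and x.upper() == y.upper())
    shorter = min(sum(c not in GAP_CHARS for c in a),
                  sum(c not in GAP_CHARS for c in b))
    return idents / shorter if shorter else 0.0


def apsi(rows):
    """Average pairwise sequence identity over all pairs of rows."""
    if len(rows) < 2:
        raise ValueError("APSI needs at least two sequences")
    pairs = list(itertools.combinations(rows, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def sample_subalignment(rows, k, target, tol=0.10, rng=random):
    """Hypothetical helper mirroring the 'naive' approach: keep all pairs
    whose identity lies within target +/- tol, then draw k distinct
    sequences from those pairs. Returns sorted row indices, or None."""
    candidates = set()
    for i, j in itertools.combinations(range(len(rows)), 2):
        if abs(pairwise_identity(rows[i], rows[j]) - target) <= tol:
            candidates.update((i, j))
    if len(candidates) < k:
        return None  # not enough sequences in this identity window
    return sorted(rng.sample(sorted(candidates), k))
```

In practice the APSI of each drawn sub-alignment would then be recomputed (for example with a function like `apsi`), and columns consisting only of gaps removed, before the SCI filter is applied.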
Table 6  Number of reference alignments for each RNA family
RNA family   k2   k3   k5   k7   k10   k15  ∑
5S_rRNA  1162  568  288  150  90  50  2308
5_8S_rRNA  76  45  17  5  3  0  146
Cobalamin  188  61  15  4  0  0  268
Entero_5_CRE  48  32  19  10  8  5  122
Entero_CRE  65  38  20  13  8  4  148
Entero_OriR  49  31  17  11  8  4  120
gcvT  167  67  22  12  3  1  272
Hammerhead_1  53  32  9  1  0  0  95
Hammerhead_3  126  99  52  32  17  12  338
HCV_SLIV  98  63  36  26  16  10  249
HCV_SLVII  51  33  19  13  10  7  133
HepC_CRE  45  29  18  11  7  3  113
Histone3  84  59  27  11  7  6  194
HIV_FE  733  408  227  147  98  56  1669
HIV_GSL3  786  464  246  151  95  61  1803
HIV_PBS  188  124  76  55  38  25  506
Intron_gpII  181  82  35  22  11  4  335
IRES_HCV  764  403  205  146  83  47  1648
IRES_Picorna  181  117  75  53  35  25  486
K_chan_RES  124  40  2  0  0  0  166
Lysine  80  48  30  17  7  3  185
Retroviral_psi  89  57  34  24  17  11  232
SECIS  114  67  33  16  11  6  247
sno_14q_I_II  44  14  1  0  0  0  59
SRP_bact  114  76  39  19  12  7  267
SRP_euk_arch  122  94  42  21  12  6  297
S_box  91  51  25  12  7  2  188
T-box  18  8  0  0  0  0  26
TAR  286  165  92  62  42  28  675
THI  321  144  69  32  17  5  588
tRNA  2039  1012  461  267  143  100  4022
U1  82  65  26  16  6  0  195
U2  112  83  38  22  14  7  276
U6  30  21  14  7  1  0  73
UnaL2  138  71  43  20  7  0  279
yybP-ykoY  127  64  33  18  12  8  262
∑  8976  4835  2405  1426  845  503  18990

Scores
Just as in the previous BRAliBase II benchmark [22], we used the structure conservation index (SCI) [44] to score the structural conservation in alignments. The SCI is defined as the ratio of the consensus minimum free energy plus a covariance-like term (both calculated by RNAALIFOLD; see [45]) to the mean of the minimum free energies of the individual sequences in the alignment. An SCI ≈ 0 indicates that RNAALIFOLD does not find a consensus structure, whereas a set of perfectly conserved structures has SCI = 1; an SCI > 1 indicates a perfectly conserved secondary structure that is, in addition, supported by compensatory and/or consistent mutations. The SCI can, for example, be computed with RNAZ [44]. To speed up the SCI calculation we implemented a program, SCIF, which is based on RNAZ but computes only the SCI. SCIF was linked against RNAlib version 1.5 [46,47].
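Once the energies are available, the SCI definition reduces to one division. The Python sketch below assumes the consensus energy and the covariance-like term have already been read from RNAALIFOLD output and the single-sequence minimum free energies from individual folds; it assumes RNAALIFOLD's sign convention, in which covariation supporting the structure makes the combined energy more negative. The function name is hypothetical and this is not the SCIF code.

```python
from statistics import mean


def structure_conservation_index(consensus_energy, covariance_term, single_mfes):
    """SCI = (consensus MFE + covariance-like term) / mean single-sequence MFE.

    All energies in kcal/mol (hypothetical helper, not SCIF itself).
    """
    mean_single = mean(single_mfes)
    if mean_single == 0:
        raise ValueError("mean single-sequence MFE is zero; SCI is undefined")
    return (consensus_energy + covariance_term) / mean_single


# Made-up energies: consensus -28.0 plus covariance -2.5 over a mean of -30.5.
print(structure_conservation_index(-28.0, -2.5, [-30.0, -31.0]))  # 1.0
```

Under the selection rule described for the reference set, an alignment would be kept only if this value is at least 0.6.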
In [22] we used the BALISCORE, which computes the fraction of identities between a trusted reference alignment and a test alignment, where identity is defined as the average sequence identity over all aligned pairs of sequences. Because the original BALISCORE program has certain limitations and peculiarities (e.g., it skips all alignment columns with more than 20 % gaps), we instead used a modified version of COMPALIGN [43], called COMPALIGNP, which also calculates the fractional sequence identity between a trusted alignment and a test alignment. Score curves computed with BALISCORE and COMPALIGNP are shifted only marginally relative to each other. The COMPALIGNP score is called SPS' throughout the manuscript.
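To make the idea of comparing a test alignment against a trusted one concrete, the Python sketch below computes a standard sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. It is offered only as an illustration of the general approach; COMPALIGNP's exact definition may differ in detail. It assumes both alignments list the same sequences in the same order and that '-' and '.' are gap characters; the function names are hypothetical.

```python
from itertools import combinations

GAP_CHARS = set("-.")  # assumed gap symbols


def aligned_residue_pairs(rows):
    """Set of (seq_i, res_i, seq_j, res_j) for every pair of residues that
    share a column; res_* are positions in the ungapped sequences."""
    pairs = set()
    counters = [0] * len(rows)  # next ungapped residue index per sequence
    for col in range(len(rows[0])):
        present = []  # (sequence index, residue index) of non-gap cells
        for s, row in enumerate(rows):
            if row[col] not in GAP_CHARS:
                present.append((s, counters[s]))
                counters[s] += 1
        for (si, pi), (sj, pj) in combinations(present, 2):
            pairs.add((si, pi, sj, pj))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_residue_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_residue_pairs(test)) / len(ref_pairs)


# Shifting one residue into a neighbouring column loses 1 of 4 reference pairs.
print(sum_of_pairs_score(["AC-GU", "ACAGU"], ["ACG-U", "ACAGU"]))  # 0.75
```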
As the SCI and SPS' complement each other and are correlated, we use their product throughout this work and term this new score BRALISCORE.
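Written out, the combined score for a test alignment is the product below, where the SCI is computed from the test alignment itself and SPS' from its comparison with the reference alignment. Because SPS' is a fraction between 0 and 1 while the SCI can exceed 1, the BRALISCORE can also exceed 1, which is consistent with the values of up to 1.2 visible in Figure 3.

```latex
\mathrm{BRALISCORE} = \mathrm{SCI} \times \mathrm{SPS'}
\qquad \text{e.g.}\quad 0.9 \times 0.8 = 0.72
```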

Statistical methods
The software package R [48] offers numerous methods for statistical analysis and graphical data interpretation. We used R version 2.2.0 for the statistical analyses and for visualizing program performance. For a given APSI value, the scores of the alignments are spread over a wide range (e.g., in Figure 3 the BRALISCOREs range from 0.0 to 1.2 at APSI = 0.45). Furthermore, the alignments are not evenly spaced along the APSI axis. We therefore used R's non-parametric lowess function (locally weighted scatterplot smoothing) to fit a curve through the data points. Lowess performs a locally weighted linear regression: each data point is smoothed using its neighbours along the horizontal (APSI) axis, with closer neighbours receiving higher weights. The size of this neighbourhood is set by the smoothing factor. The curve in Figure 3 was computed with a smoothing factor of 0.3, which means that the 30 % of all data points nearest to the point being smoothed are used.
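The smoothing step can be sketched in plain Python as below. This is a simplified, single-pass version of locally weighted linear regression with tricube weights; R's `lowess(x, y, f = 0.3)` additionally performs robustness iterations (`iter = 3` by default) and interpolation controlled by `delta`, so its curves will not match this sketch exactly. The function names are hypothetical.

```python
def _tricube(u):
    """Tricube kernel: (1 - u^3)^3 for 0 <= u < 1, else 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_single_pass(xs, ys, frac=0.3):
    """Locally weighted linear regression (one pass, no robustness steps).

    Each x0 is fitted by a weighted least-squares line through its
    int(frac * n) nearest neighbours, weighted with the tricube kernel.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (x, y) points")
    r = max(2, min(n, int(frac * n + 1e-7)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its r-th nearest neighbour (x0 included).
        h = sorted(abs(x - x0) for x in xs)[r - 1]
        if h == 0:
            # All neighbours share x0: the local fit is the mean of their y values.
            same = [y for x, y in zip(xs, ys) if x == x0]
            fitted.append(sum(same) / len(same))
            continue
        ws = [_tricube(abs(x - x0) / h) for x in xs]
        sw = sum(ws)
        xbar = sum(w * x for w, x in zip(ws, xs)) / sw
        ybar = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
        # Fall back to the weighted mean if the neighbours have no x spread.
        slope = sxy / sxx if sxx > 1e-12 * sw * h * h else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

Applied to the APSI values (x) and BRALISCOREs (y) of one program, the fitted values trace a smoothed performance curve of the kind shown in Figure 3.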
Figure 3  Lowess smoothing. The plot shows the scattered data points, each corresponding to one alignment, exemplified by the performance of PROALIGN with k = 7 sequences per alignment. The curve is the result of a lowess smoothing with a smoothing factor of 0.3.
For the statistical analyses we computed the BRALISCORE for each alignment. To rate the alignment programs or program options, we ranked these scores after averaging over all datasets. Because the score distributions can be assumed to be neither normal nor symmetric, we used two non-parametric tests: the Friedman rank sum test and the Wilcoxon signed rank test. R's Friedman test was adapted to calculate the ranking. The Wilcoxon test was then used to determine which pairs of programs or options differ significantly. As already shown in [22], programs generally perform equally well above a sequence similarity of about 80 %; at such similarity levels the alignment problem becomes almost trivial. To avoid introducing a bias due to the large number of high-homology alignments with a reference APSI > 80 %, we only used alignments with a reference APSI ≤ 80 % for the statistical analyses.
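The two rank-based procedures can be illustrated with the pure-Python sketch below, assuming a score matrix whose rows are datasets (blocks) and whose columns are programs or options. It is not R's `friedman.test` or `wilcox.test`: the Friedman statistic here omits the tie correction, and the Wilcoxon p-value uses a plain normal approximation, whereas R's `wilcox.test` can compute exact p-values for small samples and applies a continuity correction by default. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def friedman(scores):
    """scores[b][p]: score of program p on block b. Returns the mean rank
    per program (higher = better scores) and the Friedman chi-square
    statistic without tie correction."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for block in scores:
        for p, r in enumerate(average_ranks(block)):
            rank_sums[p] += r
    q = 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    return [r / n for r in rank_sums], q


def wilcoxon_signed_rank(x, y):
    """Paired test: returns W+ and a two-sided p-value from the normal
    approximation (no continuity or tie correction)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    m = len(diffs)
    if m == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = m * (m + 1) / 4.0
    sigma = math.sqrt(m * (m + 1) * (2 * m + 1) / 24.0)
    return w_plus, math.erfc(abs((w_plus - mu) / sigma) / math.sqrt(2.0))
```

The mean ranks from `friedman` give an overall ordering of programs; `wilcoxon_signed_rank` is then applied to each pair of programs to check whether their paired scores differ significantly.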

Programs and options
The following program versions and options were used:
ClustalW: version 1.83 [27]
default: -type=dna -align
gap-opt: -type=dna -align -pwgapopen=GO -gapopen=GO -pwgapext=GE -gapext=GE
subst-mat.: -type=dna -align -dnamatrix=MATRIX -pwdnamatrix=MATRIX
MAFFT: version 5.667 [25]
default: fftns
default: ginsi
old: fftns --op 0.51 --ep 0.041
old: ginsi --op 0.51 --ep 0.041
MUSCLE: version 3.6 [16,26]
-seqtype rna
PCMA: version 2.0 [49]
POA: version 2 [50]
-do_global -do_progressive MATRIX
prank: version 270705b–1508b [29]
-gaprate=GR -gapext=GE
ProAlign: version 0.5a3 [51]
java -Xmx256m -jar ProAlign_0.5a3.jar -bwidth=400
ProbConsRNA: version 1.10 [33]
Prrn: version 3.0 (package scc) [52]
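Several option strings above contain placeholder tokens (GO, GE, GR, MATRIX) that stand for parameter values; the values themselves are not listed in this section. A minimal Python sketch for turning such a template into an argument list is shown below. The helper name and the example values are hypothetical and are not the settings used in the benchmark.

```python
import shlex

# Option template copied from the ClustalW "gap-opt" entry above.
CLUSTALW_GAP_TEMPLATE = "-type=dna -align -pwgapopen=GO -gapopen=GO -pwgapext=GE -gapext=GE"


def expand_template(template, **values):
    """Hypothetical helper: substitute placeholder tokens (e.g. GO, GE,
    MATRIX) either as a whole argument or as the value after '='."""
    args = []
    for token in shlex.split(template):
        if "=" in token:
            key, _, val = token.partition("=")
            token = f"{key}={values.get(val, val)}"  # unknown values stay as-is
        elif token in values:
            token = str(values[token])
        args.append(token)
    return args


# Arbitrary example values, for illustration only.
print(expand_template(CLUSTALW_GAP_TEMPLATE, GO=15.0, GE=6.66))
```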