Methods

The database, which consists of 18,990 sequence set files plus their reference alignments, and the scripts used for benchmarking are available [32]. Plots showing BRALISCORE, SCI, and SPS versus APSI for all alignment sets (k ∈ {2, 3, 5, 7, 10, 15}) and for all programs given in Table 5 can also be found there.

Reference alignments

For the construction of reference alignments we used "seed" alignments from the Rfam database version 7.0 [24,23]. In most cases these alignments are hand-curated and thus of higher quality than Rfam's "full" alignments, which are generated automatically by the INFERNAL RNA profile package [40]. Alignments with fewer than 50 sequences were discarded to increase the number of sub-alignments that can be created (see below). The SCI (see below), which scores structural alignment quality, is based on a combination of thermodynamic and covariation measures. Thermodynamic structure prediction becomes increasingly inaccurate with increasing sequence length, e.g. due to kinetic effects, but is widely regarded as sufficiently accurate for sequences not exceeding 300 nt in length [41,42]. Thus we excluded alignments with an average sequence length above 300 nt to ensure proper thermodynamic scoring.

To each remaining seed alignment we applied a "naive" combinatorial approach that extracts sub-alignments with k ∈ {2, 3, 5, 7, 10, 15} sequences for a given average pairwise sequence identity range (APSI; a measure of sequence homology computed with ALISTAT from the squid package [43]). To this end we computed identities for all sequence pairs from an alignment and selected those pairs possessing the desired APSI ± 10 %. From the remaining list of sequences we randomly picked k unique sequences. Additionally we dropped all alignments with an SCI below 0.6 to assure the structural quality of the alignments and to make sure that the SCI can be applied later to score the test alignments.
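The extraction procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scripts distributed with the database: the function names are hypothetical, the alignment is assumed to be a dictionary of equal-length gapped rows, and the identity measure (matches over columns in which both rows carry a residue) is a simplification that need not agree with ALISTAT's definition. The SCI filter is not included.

```python
import itertools
import random

GAP_CHARS = "-."  # assumed gap symbols


def pairwise_identity(row_a, row_b):
    """Fraction of identical residues over columns where neither row has a gap.

    Simplified measure for illustration; not necessarily ALISTAT's definition.
    """
    matches = compared = 0
    for x, y in zip(row_a, row_b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return matches / compared if compared else 0.0


def candidate_sequences(alignment, target, tolerance=0.10):
    """Names of sequences that occur in at least one pair whose identity
    lies within target +/- tolerance."""
    keep = set()
    for (n1, s1), (n2, s2) in itertools.combinations(alignment.items(), 2):
        if abs(pairwise_identity(s1, s2) - target) <= tolerance:
            keep.update((n1, n2))
    return sorted(keep)


def pick_subalignment(alignment, k, target, tolerance=0.10, rng=None):
    """Randomly pick k distinct candidate sequences, or None if too few remain."""
    rng = rng or random.Random()
    names = candidate_sequences(alignment, target, tolerance)
    if len(names) < k:
        return None
    return {name: alignment[name] for name in rng.sample(names, k)}
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when a data set has to be regenerated.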
This way we generated overall 18,990 reference alignments with an average SCI of 0.93; the data-set 1 used in [22] consists of only 388 alignments with an average SCI of 0.89. For further details see Tables 1 and 6.

Table 6. Number of reference alignments for each RNA family

RNA family          k2    k3    k5    k7   k10   k15      ∑
5S_rRNA           1162   568   288   150    90    50   2308
5_8S_rRNA           76    45    17     5     3     0    146
Cobalamin          188    61    15     4     0     0    268
Entero_5_CRE        48    32    19    10     8     5    122
Entero_CRE          65    38    20    13     8     4    148
Entero_OriR         49    31    17    11     8     4    120
gcvT               167    67    22    12     3     1    272
Hammerhead_1        53    32     9     1     0     0     95
Hammerhead_3       126    99    52    32    17    12    338
HCV_SLIV            98    63    36    26    16    10    249
HCV_SLVII           51    33    19    13    10     7    133
HepC_CRE            45    29    18    11     7     3    113
Histone3            84    59    27    11     7     6    194
HIV_FE             733   408   227   147    98    56   1669
HIV_GSL3           786   464   246   151    95    61   1803
HIV_PBS            188   124    76    55    38    25    506
Intron_gpII        181    82    35    22    11     4    335
IRES_HCV           764   403   205   146    83    47   1648
IRES_Picorna       181   117    75    53    35    25    486
K_chan_RES         124    40     2     0     0     0    166
Lysine              80    48    30    17     7     3    185
Retroviral_psi      89    57    34    24    17    11    232
SECIS              114    67    33    16    11     6    247
sno_14q_I_II        44    14     1     0     0     0     59
SRP_bact           114    76    39    19    12     7    267
SRP_euk_arch       122    94    42    21    12     6    297
S_box               91    51    25    12     7     2    188
T-box               18     8     0     0     0     0     26
TAR                286   165    92    62    42    28    675
THI                321   144    69    32    17     5    588
tRNA              2039  1012   461   267   143   100   4022
U1                  82    65    26    16     6     0    195
U2                 112    83    38    22    14     7    276
U6                  30    21    14     7     1     0     73
UnaL2              138    71    43    20     7     0    279
yybP-ykoY          127    64    33    18    12     8    262
∑                 8976  4835  2405  1426   845   503  18990

Scores

Just as in the previous BRAliBase II benchmark [22] we used the SCI [44] to score the structural conservation in alignments. The SCI is defined as the quotient of the consensus minimum free energy plus a covariance-like term (calculated by RNAALIFOLD; see [45]) to the mean minimum free energy of each individual sequence in the alignment.
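Written as code, the definition above reduces to a single division. The sketch below assumes that the consensus energy (already including the covariance-like term) and the minimum free energies of the individual sequences have been computed beforehand by suitable folding programs; the function and argument names are illustrative and are not taken from any of the programs cited here.

```python
def structure_conservation_index(consensus_energy, single_energies):
    """Quotient of the consensus energy to the mean single-sequence energy.

    consensus_energy: consensus minimum free energy plus covariance-like
                      term, in kcal/mol (assumed to be precomputed).
    single_energies:  minimum free energies of the individual sequences,
                      in kcal/mol (assumed to be precomputed).
    """
    if not single_energies:
        raise ValueError("at least one single-sequence energy is required")
    mean_single = sum(single_energies) / len(single_energies)
    if mean_single == 0:
        raise ValueError("mean single-sequence energy is zero; SCI undefined")
    return consensus_energy / mean_single
```

For example, a consensus energy of -20.0 kcal/mol with single-sequence energies of -22.0, -18.0 and -20.0 kcal/mol gives a quotient of 1.0.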
An SCI ≈ 0 indicates that RNAALIFOLD does not find a consensus structure, whereas a set of perfectly conserved structures has SCI = 1; an SCI > 1 indicates a perfectly conserved secondary structure, which is, in addition, supported by compensatory and/or consistent mutations. The SCI can, for example, be computed by means of RNAZ [44]. To speed up the SCI calculation we implemented a program, SCIF, which is based upon RNAZ but computes only the SCI. SCIF was linked against RNAlib version 1.5 [46,47].

In [22] we used the BALISCORE, which computes the fraction of identities between a trusted reference alignment and a test alignment, where identity is defined as the averaged sequence identity over all aligned pairs of sequences. Because the original BALISCORE program has certain limitations and peculiarities, e.g. it skips all alignment columns with more than 20 % gaps, we instead used a modified version of COMPALIGN [43] called COMPALIGNP, which also calculates the fractional sequence identity between a trusted alignment and a test alignment. The score curves obtained with BALISCORE and COMPALIGNP differ only by a marginal shift. The COMPALIGNP score is called SPS' throughout the manuscript. As the SCI and SPS' complement each other and are correlated, we use the product of both throughout this work and term this new score BRALISCORE.

Statistical methods

The software package R [48] offers numerous methods for statistical and graphical data interpretation. We used R version 2.2.0 to carry out the statistical analyses and visualizations of program performance. For a given APSI value, the scores of the alignments are distributed over a wide range (see, for example, Figure 3, where the BRALISCOREs range from 0.0 to 1.2 at APSI = 0.45). Furthermore, the alignments are not evenly spaced on the APSI axis. Thus we used the non-parametric lowess function (locally weighted scatter plot smooth) of R to fit a curve through the data points.
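To illustrate the kind of smoothing involved, the following sketch implements a single pass of locally weighted linear regression with tricube weights. It is a simplified stand-in and not R's lowess, which additionally performs robustness iterations; the function names and the parameter `frac` (the fraction of data points used for each local fit) are illustrative assumptions.

```python
import math


def tricube(u):
    """Tricube weight for a scaled distance u; zero at and beyond u = 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_smooth(xs, ys, frac=0.3):
    """Single-pass locally weighted linear regression (no robustness steps).

    For every x value, a straight line is fitted to the nearest
    ceil(frac * n) points, weighted by their distance on the x axis,
    and evaluated at that x value.
    """
    n = len(xs)
    span = min(n, max(2, math.ceil(frac * n)))
    smoothed = []
    for x0 in xs:
        # Indices of the `span` points closest to x0.
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:span]
        width = max(abs(xs[i] - x0) for i in nearest) or 1.0
        weights = [tricube(abs(xs[i] - x0) / width) for i in nearest]
        sw = sum(weights)
        mx = sum(w * xs[i] for w, i in zip(weights, nearest)) / sw
        my = sum(w * ys[i] for w, i in zip(weights, nearest)) / sw
        sxx = sum(w * (xs[i] - mx) ** 2 for w, i in zip(weights, nearest))
        sxy = sum(w * (xs[i] - mx) * (ys[i] - my)
                  for w, i in zip(weights, nearest))
        # Fall back to the weighted mean if the local fit is degenerate.
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed
```

Because each fit is a weighted straight line, data that are already exactly linear are returned unchanged, which makes a convenient sanity check.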
The lowess function is a locally weighted linear regression, which also takes horizontally neighbouring values into consideration when smoothing a data point. The range in which data points are considered is defined by the smoothing factor. The curve in Figure 3 was computed with a smoothing factor of 0.3, which means that 30 % of all data points, namely those surrounding the value to be smoothed, are involved.

Figure 3. Lowess smoothing. The plot shows the scattered data points, each corresponding to one alignment, exemplified by the performance of PROALIGN with k = 7 sequences per alignment. The curve is the result of a lowess smoothing with a smoothing factor of 0.3.

For statistical analyses we computed the BRALISCORE for each alignment. To rate the alignment programs or program options, we ranked these scores after averaging over all datasets. Because the score distributions cannot be assumed to be either normal or symmetric, we used two non-parametric tests, the Friedman rank sum test and the Wilcoxon signed rank test. R's Friedman test was adapted to calculate the ranking. Afterwards the Wilcoxon test determined which pairs of programs or options differ significantly. As already shown in [22], programs generally perform equally well above a sequence similarity of about 80 %; that is, at such a similarity level the alignment problem becomes almost trivial. To avoid introducing a bias due to the large number of high-homology alignments with a reference APSI > 80 %, we only used alignments with a reference APSI ≤ 80 % for the statistical analyses.
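The rank-based quantities underlying the two tests can be sketched as follows. This is a minimal illustration of the textbook statistics, not the adapted R code used for the benchmark: it assumes scores are given per program as equally long lists (one entry per data set), ranks programs within each data set with rank 1 for the highest score, computes the Friedman statistic without tie correction, and returns the Wilcoxon rank sums without p-values. All function names are hypothetical; in R the corresponding tests are available as `friedman.test` and `wilcox.test`.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def friedman_mean_ranks(scores):
    """Mean within-data-set rank per program (rank 1 = highest score)."""
    programs = list(scores)
    n = len(scores[programs[0]])
    totals = dict.fromkeys(programs, 0.0)
    for d in range(n):
        # Negate the scores so that the best program receives rank 1.
        ranks = average_ranks([-scores[p][d] for p in programs])
        for p, r in zip(programs, ranks):
            totals[p] += r
    return {p: totals[p] / n for p in programs}


def friedman_statistic(scores):
    """Friedman chi-square statistic, without correction for ties."""
    k = len(scores)
    n = len(next(iter(scores.values())))
    rank_sums = [m * n for m in friedman_mean_ranks(scores).values()]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)


def wilcoxon_rank_sums(a, b):
    """Sums of ranks of positive and negative paired differences (a - b).

    Zero differences are dropped before ranking.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus
```

Sorting the programs by their mean rank gives the ranking; the pairwise rank sums then indicate which of two programs tends to score higher on the same data sets.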
Programs and options

The following program versions and options were used:

ClustalW: version 1.83 [27]
    default:    -type=dna -align
    gap-opt:    -type=dna -align -pwgapopen=GO -gapopen=GO -pwgapext=GE -gapext=GE
    subst-mat.: -type=dna -align -dnamatrix=MATRIX -pwdnamatrix=MATRIX
MAFFT: version 5.667 [25]
    default: fftns
    default: ginsi
    old:     fftns --op 0.51 --ep 0.041
    old:     ginsi --op 0.51 --ep 0.041
MUSCLE: version 3.6 [16,26]
    -seqtype rna
PCMA: version 2.0 [49]
POA: version 2 [50]
    -do_global -do_progressive MATRIX
prank: version 270705b – 1508b [29]
    -gaprate=GR -gapext=GE
ProAlign: version 0.5a3 [51]
    java -Xmx256m -bwidth = 400 -jar ProAlign_0.5a3.jar
ProbConsRNA: version 1.10 [33]
Prrn: version 3.0 (package scc) [52]