Prediction performance
ExAlt's prediction accuracy was measured at the exon level. An exon was counted correct when both its predicted left and right boundaries matched those of a test exon. Internal exons begin with an acceptor site and end with a donor site. Initial exons begin with a transcription or translation start site and end with a donor site. Terminal exons begin with an acceptor site and end with a transcription or translation stop site. Single exons begin with a transcription or translation start site and end with a transcription or translation stop site. (All single exons in the test set were of the intron retention splicing type.) Performance was measured by sensitivity (the percentage of test exons correctly detected) and specificity (the percentage of predicted exons that match the test set). Table 2 shows ExAlt's performance on the held-out set when given an initial known exon as input. It is compared with the union of that input exon and the exons from different publicly available gene predictions. This tests the ability to improve an existing annotation in which one exon and its reading frame are already known. Because many test sequences contained multiple overlapping exons, one exon was chosen at random as input. Each experiment was repeated 10 times and the results averaged. These results are listed in Table 2 as ExAlt-Exon, the ExAlt predictions informed by cross-species sequence conservation. Exon sensitivity and specificity are high because the input exon is itself a test exon, so at least one predicted exon always matches. For example, consider a multiple splice site exon with two overlapping isoforms. A "naive" program that predicts only the input exon would achieve 50% sensitivity and 100% specificity. When only a single exon isoform exists, the same naive program achieves 100% sensitivity and 100% specificity. Table 2 should therefore be read by weighing two effects against each other: the loss of specificity relative to the naive method where only a single exon isoform occurs, and the gain in sensitivity where multiple overlapping exons occur.
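The exon-level scoring described above can be sketched as follows. This is a minimal illustration that assumes each exon is identified by a sequence identifier and two boundary coordinates. The `Exon` class and `exon_accuracy` function are hypothetical names, not part of ExAlt. The sketch pools counts over all exons passed in and reproduces the naive-program arithmetic for a two-isoform multiple splice site case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record: a sequence ID plus left/right boundaries."""
    seq_id: str
    left: int
    right: int


def exon_accuracy(predicted, test):
    """Return exon-level (sensitivity, specificity) in percent.

    A predicted exon is correct only if both boundaries match a test exon
    exactly, so the comparison reduces to a set intersection.
    """
    pred_set, test_set = set(predicted), set(test)
    correct = pred_set & test_set
    sensitivity = 100.0 * len(correct) / len(test_set) if test_set else 0.0
    specificity = 100.0 * len(correct) / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


# Hypothetical multiple splice site case: two overlapping test exons that
# share an acceptor but use alternative donor sites.
test_exons = [Exon("seq1", 1000, 1150), Exon("seq1", 1000, 1210)]
input_exon = test_exons[0]

# A "naive" program that returns only the input exon.
print(exon_accuracy([input_exon], test_exons))   # (50.0, 100.0)

# With a single exon isoform, the naive program scores 100% on both.
print(exon_accuracy([input_exon], [input_exon]))  # (100.0, 100.0)
```

Passing every test sequence's exons in one call pools the counts, which is why the sequence identifier is part of each exon's identity.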
Two ab initio single isoform gene finders, Augustus [34] and SNAP [35], were included in the comparison. Also included was the single isoform gene finder N-SCAN [36], which uses cross-species conservation with Drosophila yakuba, Drosophila pseudoobscura, and Anopheles gambiae [37].

Table 2. Prediction performance of ExAlt. Sensitivity (Sens) and specificity (Spec) are shown for exons.

Method                        Constitutive      Cassette          Multiple Splice   Intron Retention  All Exons
                              Sens  Spec        Sens  Spec        Sens  Spec        Sens  Spec        Sens  Spec
ExAlt-Exon                    100   96          100   89          67    94          61    89          84    94
N-SCAN-Exon Union             100   86          100   89          65    79          55    78          82    84
Augustus-Exon Union           100   82          100   81          63    77          52    73          81    79
SNAP-Exon Union               100   77          100   79          64    74          51    76          81    77
SNAP+N-SCAN-Exon Union        100   73          100   74          69    68          57    70          83    72
Augustus+N-SCAN-Exon Union    100   79          100   77          68    73          57    68          83    76
Aug.+SNAP+N-SCAN-Exon Union   100   70          100   69          71    64          60    64          84    67

Columns are organized by exon type: Constitutive, Cassette, Multiple Splice, Intron Retention, and all exons counted together (All Exons). Row 1 shows ExAlt performance using an input exon and default parameter settings (ExAlt-Exon). Rows 2–7 show the union of different combinations of the three gene finders (N-SCAN, SNAP, and Augustus) plus the input exon. (Aug. = Augustus)

The coordinates of the start and stop codons were given as input to ExAlt but withheld from the gene finders. This potentially made it harder for the gene finders to predict initial, terminal, and single exons accurately. A gene finder therefore only had to predict the donor site correctly for an initial exon to be counted correct. For a terminal exon, it only had to predict the acceptor site correctly. For a single exon, a predicted overlap with the known single exon was sufficient. The gene finders were also run on longer stretches of genomic sequence than ExAlt, and faced the additional challenge of determining gene boundaries.
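The relaxed matching rules applied to the gene finders, together with the union with the input exon, might look like the sketch below. It makes several assumptions:

- Coordinates are on the forward strand, so the donor of an initial exon is its right boundary and the acceptor of a terminal exon is its left boundary.
- A test exon counts as detected when any prediction matches it.
- A prediction counts as correct when it matches any test exon.

All names (`Exon`, `gene_finder_match`, `union_predictions`, `union_accuracy`) and the toy coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    seq_id: str
    left: int
    right: int
    kind: str = "internal"  # "internal", "initial", "terminal", or "single"


def gene_finder_match(pred, test):
    """Relaxed match used for gene finder predictions (forward strand assumed)."""
    if pred.seq_id != test.seq_id:
        return False
    if test.kind == "initial":   # only the donor (right boundary) must match
        return pred.right == test.right
    if test.kind == "terminal":  # only the acceptor (left boundary) must match
        return pred.left == test.left
    if test.kind == "single":    # any overlap with the known single exon
        return pred.left <= test.right and test.left <= pred.right
    # Internal exons still require both boundaries to match exactly.
    return (pred.left, pred.right) == (test.left, test.right)


def union_predictions(input_exon, *finder_outputs):
    """Union of the input exon and each gene finder's exons, deduplicated by coordinates."""
    union = {(input_exon.seq_id, input_exon.left, input_exon.right): input_exon}
    for output in finder_outputs:
        for exon in output:
            union.setdefault((exon.seq_id, exon.left, exon.right), exon)
    return list(union.values())


def union_accuracy(predicted, test):
    """(sensitivity, specificity) in percent under the relaxed matching rule."""
    detected = sum(any(gene_finder_match(p, t) for p in predicted) for t in test)
    correct = sum(any(gene_finder_match(p, t) for t in test) for p in predicted)
    return 100.0 * detected / len(test), 100.0 * correct / len(predicted)


# Toy example: two overlapping internal test exons; the first is the input.
test = [Exon("seq1", 1000, 1150), Exon("seq1", 1000, 1210)]
finder_a = [Exon("seq1", 1000, 1210), Exon("seq1", 1500, 1600)]
finder_b = [Exon("seq1", 1000, 1150)]  # duplicates the input exon
union = union_predictions(test[0], finder_a, finder_b)
sens, spec = union_accuracy(union, test)
print(f"sensitivity {sens:.1f}%, specificity {spec:.1f}%")  # sensitivity 100.0%, specificity 66.7%
```

The toy run shows the pattern visible in Table 2. Adding gene finder output can recover the second isoform, raising sensitivity, while any unmatched extra prediction lowers specificity.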
A gene finder may predict an initial, terminal, or single exon that overlaps an internal exon in the test set; such a prediction is counted as incorrect. If the start and stop codon information were integrated into the gene finders' prediction process, their individual performance would likely improve. However, considerable effort has gone into training and tuning these gene finders for annotating long stretches of genomic sequence. The current predictions therefore serve as a reasonable baseline for measuring differences in prediction performance. Using the input exon plus the union of all three single isoform gene finders recovers more of the correct multiple splice site exons (71% versus ExAlt's 67%). This comes at the cost of a large reduction in specificity (64% versus ExAlt's 94%). In the other exon categories, however, ExAlt matches or improves on the performance of the union of multiple gene finders.

Table 3 compares the prediction performance of ExAlt-Exon from Table 2 with ExAlt predictions made under different parameter settings. The effect of providing gene structure information as input (ExAlt-Exon) was compared with two alternatives, shown in Table 3 as ExAlt-Frame and ExAlt-Default. ExAlt-Frame makes predictions without exon coordinates as input, but is limited to predicting exons that maintain reading frame consistency with the rest of the known gene. ExAlt-Default is given no gene structure information. It checks all three possible reading frames and then selects the exons from the highest scoring frame. As expected, starting with a known exon improved overall performance. Even when no gene structure information was provided, however, a majority of the exon coordinates were correctly recovered (67% overall for ExAlt-Default).

Table 3. Exon prediction accuracy using different ExAlt parameter settings.
Method                        Constitutive      Cassette          Multiple Splice   Intron Retention  All Exons
                              Sens  Spec        Sens  Spec        Sens  Spec        Sens  Spec        Sens  Spec
ExAlt-Exon                    100   96          100   89          67    94          61    89          84    94
ExAlt-Exon-ab initio          100   88          100   85          69    83          70    87          87    84
ExAlt-Frame                   96    95          70    80          53    87          48    82          72    89
ExAlt-Frame-ab initio         97    87          72    74          56    76          48    80          74    82
ExAlt-Frame-Single            96    97          69    85          45    92          31    92          66    94
ExAlt-Default                 89    84          58    63          49    77          43    74          67    79
ExAlt-Default-ab initio       89    75          58    55          50    67          36    58          65    69
ExAlt-Default-Single          89    90          56    66          41    84          28    83          61    85
N-SCAN                        87    84          51    80          33    66          31    66          57    78
Augustus                      75    77          27    53          27    59          26    57          47    69
SNAP                          76    72          42    61          29    56          27    62          50    67

Columns are organized by exon type: Constitutive, Cassette, Multiple Splice, Intron Retention, and all exons counted together (All Exons). Rows 1–2 show ExAlt performance using an input exon, with the default parameters from Table 2 (ExAlt-Exon) and with no informant species (ExAlt-Exon-ab initio). Rows 3–5 show ExAlt performance using an input coding frame in three variants:

- with default parameters (ExAlt-Frame)
- with no informant species (ExAlt-Frame-ab initio)
- with at most 1 exon predicted per test sequence (ExAlt-Frame-Single)

Rows 6–8 show ExAlt performance using no gene structure information in three variants:

- with default parameters (ExAlt-Default)
- with no informant species (ExAlt-Default-ab initio)
- with at most 1 exon predicted per test sequence (ExAlt-Default-Single)

Rows 9–11 show output from the three single isoform gene finders N-SCAN, Augustus, and SNAP.

ExAlt-Exon, ExAlt-Frame, and ExAlt-Default were each compared with their ab initio equivalents: ExAlt-Exon-ab initio, ExAlt-Frame-ab initio, and ExAlt-Default-ab initio. Each ab initio version is the GHMM equivalent of the PGHMM, taking only the target D. melanogaster sequence as input. In all cases, the multi-species versions of ExAlt produced fewer false positive predictions than the equivalent ab initio version, with little or no reduction in overall sensitivity.
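The multi-species versus ab initio comparison can be made concrete with a little arithmetic on the All Exons columns of Table 3. The dictionary below simply copies those values. `multi_vs_ab_initio` is a hypothetical helper. Positive numbers mean the multi-species version scores higher.

```python
# (sensitivity, specificity) for "All Exons", copied from Table 3.
table3_all_exons = {
    "ExAlt-Exon": (84, 94),
    "ExAlt-Exon-ab initio": (87, 84),
    "ExAlt-Frame": (72, 89),
    "ExAlt-Frame-ab initio": (74, 82),
    "ExAlt-Default": (67, 79),
    "ExAlt-Default-ab initio": (65, 69),
}


def multi_vs_ab_initio(table, setting):
    """Percentage-point difference (multi-species minus ab initio)."""
    multi_sens, multi_spec = table[setting]
    ab_sens, ab_spec = table[f"{setting}-ab initio"]
    return multi_sens - ab_sens, multi_spec - ab_spec


for setting in ("ExAlt-Exon", "ExAlt-Frame", "ExAlt-Default"):
    d_sens, d_spec = multi_vs_ab_initio(table3_all_exons, setting)
    print(f"{setting}: sensitivity {d_sens:+d}, specificity {d_spec:+d} points")
# ExAlt-Exon: sensitivity -3, specificity +10 points
# ExAlt-Frame: sensitivity -2, specificity +7 points
# ExAlt-Default: sensitivity +2, specificity +10 points
```

The same helper applies to the per-category columns, where the differences vary more. For intron retention exons, for example, ExAlt-Exon's sensitivity (61%) is 9 points below that of ExAlt-Exon-ab initio (70%).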
Finally, the trade-off between predicting multiple overlapping exons and predicting at most one exon per test sequence was measured. The held-out set was composed of 57% constitutive exons, 18% multiple splice site (MS) exons, 17% cassette (SE) exons, and 9% intron retention (IR) exons. Both single exon prediction versions of ExAlt (ExAlt-Frame-Single and ExAlt-Default-Single) therefore captured a large percentage of the exons simply by correctly predicting one exon per sequence. When ExAlt is given the coding frame and restricted to predicting at most one exon, the predicted exon is correct in 94% of the sequences (ExAlt-Frame-Single in Table 3). Allowing ExAlt to predict overlapping exons (ExAlt-Frame in Table 3) lowered specificity from 94% to 89%. In exchange, it raised sensitivity, the percentage of correctly annotated exons, from 66% to 72%. The last three rows of Table 3 show single isoform gene finding performance for N-SCAN, Augustus, and SNAP. These provide an additional point of reference for how well conventional gene finders perform in the evaluated gene regions.

Table 4. ExAlt results on the initial training and testing set, in percentages.

Method                        All Exons
                              Sens    Spec
ExAlt-Exon                    82/-2   94/-1
ExAlt-Exon-ab initio          84/-3   86/+2
N-SCAN-Exon Union             82/0    82/-2
Augustus-Exon Union           81/0    81/+2
SNAP-Exon Union               81/0    77/0
SNAP+N-SCAN-Exon Union        83/0    72/0
Augustus+N-SCAN-Exon Union    83/0    75/-1
Aug.+SNAP+N-SCAN-Exon Union   84/0    68/+1
ExAlt-Frame                   70/-2   87/-2
ExAlt-Frame-ab initio         72/-2   79/-3
ExAlt-Frame-Single            65/-1   91/-3
ExAlt-Default                 65/0    78/-1
ExAlt-Default-ab initio       65/0    66/-3
ExAlt-Default-Single          60/-1   84/-1
N-SCAN                        56/-1   76/-2
Augustus                      47/0    71/+2
SNAP                          49/-1   67/0

Each cell gives the measurement followed by its difference, in percentage points, from the corresponding held-out set result in Table 2 or Table 3. (Aug. = Augustus)

Prediction performance was much higher for constitutive exons than for the alternatively spliced exon categories. Lack of sequence conservation partly explained the decrease in specificity.
In the training set, the highest specificity levels occurred when both informant species, D. yakuba and D. erecta, were available. In the held-out set, 86% of the constitutive exons matched D. yakuba and D. erecta sequences, compared with 75% of the alternatively spliced exons. (The remaining exons matched some other combination of one or two informant species.) Thus, in some cases ExAlt could not make optimal use of sequence conservation to limit false positive predictions. The ab initio versions of ExAlt (ExAlt-Frame-ab initio and ExAlt-Default-ab initio) also got a smaller percentage of cassette exons correct than of constitutive exons. Many of the cassette exons were shorter than 100 bases. Single species ExAlt (ExAlt-Frame-ab initio in Table 3) correctly identified a majority of these short exons (62%). However, short exons (under 100 bases) made up 69% of the cases in which single species ExAlt did not predict both splice sites exactly. In contrast, all but one of the short constitutive exons were correctly identified. The held-out results should reflect performance that can be expected on similarly distributed random data sets. To check this, performance on the original training set was also measured using 10-fold cross validation; the results are shown in Table 4. Sensitivity and specificity differ between the two test sets by at most 3 percentage points.
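The "at most 3 percentage points" statement can be checked directly against the differences reported in Table 4. The sketch below copies those (sensitivity, specificity) differences into a dictionary and finds the largest absolute value. The variable names are illustrative only.

```python
# (sensitivity difference, specificity difference) in percentage points,
# copied from Table 4 (cross-validated training set minus held-out set).
table4_diffs = {
    "ExAlt-Exon": (-2, -1),
    "ExAlt-Exon-ab initio": (-3, +2),
    "N-SCAN-Exon Union": (0, -2),
    "Augustus-Exon Union": (0, +2),
    "SNAP-Exon Union": (0, 0),
    "SNAP+N-SCAN-Exon Union": (0, 0),
    "Augustus+N-SCAN-Exon Union": (0, -1),
    "Aug.+SNAP+N-SCAN-Exon Union": (0, +1),
    "ExAlt-Frame": (-2, -2),
    "ExAlt-Frame-ab initio": (-2, -3),
    "ExAlt-Frame-Single": (-1, -3),
    "ExAlt-Default": (0, -1),
    "ExAlt-Default-ab initio": (0, -3),
    "ExAlt-Default-Single": (-1, -1),
    "N-SCAN": (-1, -2),
    "Augustus": (0, +2),
    "SNAP": (-1, 0),
}

# Largest absolute change across both measures and all methods.
max_abs_diff = max(abs(d) for pair in table4_diffs.values() for d in pair)

# Methods whose sensitivity or specificity shifted by that maximum.
largest = [name for name, pair in table4_diffs.items()
           if max(abs(d) for d in pair) == max_abs_diff]

print(max_abs_diff)  # 3
print(largest)
```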