Prediction performance
ExAlt's prediction accuracy was measured at the exon level: a predicted exon was counted as correct only when both its left and right boundaries matched those of a test exon. Internal exons begin with an acceptor site and end with a donor site. Initial exons begin with a transcription or translation start site and end with a donor site. Terminal exons begin with an acceptor site and end with a transcription or translation stop site. Single exons begin with a transcription or translation start site and end with a transcription or translation stop site. (Single exons in the test set were of the intron retention splicing type.) Sensitivity (the percentage of test exons correctly detected) and specificity (the percentage of predicted exons that match the test set) were used to measure performance.
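The two measures defined above can be illustrated with a minimal sketch. It is not ExAlt's evaluation code: the function name, the tuple representation of an exon, and the coordinates are assumptions made for illustration, and a real evaluation over many test sequences would also need a sequence identifier in each exon key.

```python
from typing import Iterable, Set, Tuple

# (left boundary, right boundary); the coordinate convention is an assumption.
Exon = Tuple[int, int]


def exon_sensitivity_specificity(
    predicted: Iterable[Exon], test: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) in percent using exact boundary matching."""
    predicted_set: Set[Exon] = set(predicted)
    test_set: Set[Exon] = set(test)
    # An exon is correct only if both boundaries match a test exon exactly.
    correct = predicted_set & test_set
    sensitivity = 100.0 * len(correct) / len(test_set) if test_set else 0.0
    specificity = 100.0 * len(correct) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Made-up coordinates: three test exons, four predictions, two exact matches.
test_exons = [(100, 250), (100, 310), (500, 620)]
predicted_exons = [(100, 250), (500, 620), (700, 800), (100, 300)]
sens, spec = exon_sensitivity_specificity(predicted_exons, test_exons)
print(f"sensitivity={sens:.0f}% specificity={spec:.0f}%")
# prints: sensitivity=67% specificity=50%
```

Note that the prediction (100, 300) shares its left boundary with two test exons but counts as a false positive, because the right boundary does not match either of them.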
Table 2 shows ExAlt's performance on the hold out set compared to the union of different publicly available gene predictions and an initial known exon given as input. This tests the ability to improve an existing annotation, where an initial exon and reading frame are known. Since many test sequences contained multiple overlapping exons, one exon was chosen at random and used as input. Experiments were repeated 10 times and the average taken. Results are listed in Table 2 as ExAlt-Exon for the ExAlt predictions informed by cross-species sequence conservation. Exon sensitivity and specificity are high because the input exon guarantees that at least one predicted exon matches a test exon. For example, in the case of multiple splice site exons with two overlapping exons, a "naive" program predicting only the input exon would achieve 50% sensitivity and 100% specificity. When only a single exon isoform exists, the naive program achieves 100% sensitivity and 100% specificity. For the results in Table 2 it was therefore important to weigh the decrease in specificity relative to the naive method, in cases where only a single exon isoform occurs, against the gains in sensitivity when multiple overlapping exons occur. Two ab initio single isoform gene finders were included in the comparison, Augustus [34] and SNAP [35]. Also included is the single isoform gene finder N-SCAN [36], which uses cross-species conservation with Drosophila yakuba, Drosophila pseudoobscura, and Anopheles gambiae [37].
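The arithmetic behind the naive baseline, and the way a union of gene finder predictions trades specificity for sensitivity, can be sketched as follows. This is an illustration only: the function names, the stand-in finder outputs, and all coordinates are hypothetical and are not taken from the ExAlt evaluation.

```python
import random
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (left boundary, right boundary); made-up coordinates below


def naive_baseline(test_exons: List[Exon], rng: random.Random) -> Set[Exon]:
    """Choose one test exon at random as the input exon and predict only that exon."""
    return {rng.choice(test_exons)}


def percent_scores(predicted: Set[Exon], test: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) in percent; both sets must be non-empty."""
    correct = predicted & test
    return 100.0 * len(correct) / len(test), 100.0 * len(correct) / len(predicted)


def union_with_input(input_exon: Exon, *finder_outputs: Set[Exon]) -> Set[Exon]:
    """Pool the input exon with the exon sets predicted by each gene finder."""
    return {input_exon}.union(*finder_outputs)


rng = random.Random(0)
overlapping = [(100, 250), (100, 310)]  # two overlapping isoforms
single = [(500, 620)]                   # a single exon isoform

print(percent_scores(naive_baseline(overlapping, rng), set(overlapping)))  # (50.0, 100.0)
print(percent_scores(naive_baseline(single, rng), set(single)))            # (100.0, 100.0)

# Hypothetical outputs of two gene finders, each with one correct and one wrong exon.
finder_a = {(100, 310), (90, 250)}
finder_b = {(100, 250), (100, 330)}
pooled = union_with_input(overlapping[0], finder_a, finder_b)
print(percent_scores(pooled, set(overlapping)))                            # (100.0, 50.0)
```

In this toy case the pooled predictions recover both isoforms, but every additional gene finder also contributes its false positives, which is the pattern visible in the lower rows of Table 2.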
Table 2  Prediction performance of ExAlt. Sensitivity (Sens) and Specificity (Spec) are shown for exons.
                              Constitutive  Cassette      Multiple Splice  Intron Retention  All Exons
                              Sens  Spec    Sens  Spec    Sens  Spec       Sens  Spec        Sens  Spec
ExAlt-Exon                    100   96      100   89      67    94         61    89          84    94
N-SCAN-Exon Union             100   86      100   89      65    79         55    78          82    84
Augustus-Exon Union           100   82      100   81      63    77         52    73          81    79
SNAP-Exon Union               100   77      100   79      64    74         51    76          81    77
SNAP+N-SCAN-Exon Union        100   73      100   74      69    68         57    70          83    72
Augustus+N-SCAN-Exon Union    100   79      100   77      68    73         57    68          83    76
Aug.+SNAP+N-SCAN-Exon Union   100   70      100   69      71    64         60    64          84    67
Columns are organized by exon type: Constitutive, Cassette, Multiple Splice, Intron Retention, and all exons counted together (All Exons). Row 1 shows ExAlt performance using an input exon and default parameter settings (ExAlt-Exon). Rows 2–7 show the union of different combinations of three gene finders (N-SCAN, SNAP, and Augustus) plus the input exon (Aug. = Augustus). The coordinates for start and stop codons were included as input to ExAlt but were excluded from input to the gene finders, making it potentially more difficult for the gene finders to accurately predict initial, terminal and single exons. Therefore, for an initial exon to be counted correct, a gene finder was only required to correctly predict the donor site; for a terminal exon, only the acceptor site; and for a single exon, only an overlap with the known single exon. The gene finders were run on longer stretches of genomic sequence than ExAlt and have the added challenge of determining gene boundaries. A gene finder may predict an initial, terminal or single exon that overlaps an internal exon in the test set, which would be counted as an incorrect exon prediction. If the start and stop codon information were integrated into the gene finder prediction process, individual prediction performance for the respective gene finders would likely improve. However, since considerable effort has been taken to carefully train and tune the gene finders for annotating long stretches of genomic sequence, the current predictions serve as a reasonable baseline for measuring differences in prediction performance.
Using the input exon plus the union of all three single isoform gene finders yields more of the correct multiple splice site exons (71% versus ExAlt's 67%) but at the cost of a large reduction in specificity (64% versus ExAlt's 94%). In the other cases, however, ExAlt matches or improves on the performance of the union of multiple gene finders.
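The relaxed matching rules that the Table 2 legend describes for the gene finders can be written down as a small sketch. The function names, the exon type labels, and the convention that coordinates run in transcript orientation (so that the donor site is the end of an initial exon and the acceptor site is the start of a terminal exon) are assumptions for illustration, not details taken from the evaluation scripts.

```python
from typing import Tuple

# (start, end) in transcript orientation, inclusive; this convention is an assumption.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def gene_finder_match(predicted: Exon, test: Exon, test_exon_type: str) -> bool:
    """Decide whether a gene finder prediction counts as correct for a test exon."""
    if test_exon_type == "internal":
        return predicted == test            # acceptor and donor must both match
    if test_exon_type == "initial":
        return predicted[1] == test[1]      # donor site only
    if test_exon_type == "terminal":
        return predicted[0] == test[0]      # acceptor site only
    if test_exon_type == "single":
        return overlaps(predicted, test)    # any overlap with the known single exon
    raise ValueError(f"unknown exon type: {test_exon_type}")


# A prediction that misses the start site but has the right donor site.
print(gene_finder_match((120, 250), (100, 250), "initial"))   # True
print(gene_finder_match((120, 250), (100, 250), "internal"))  # False
```

Under these rules a predicted initial, terminal or single exon that merely overlaps an internal test exon still fails the exact comparison, which is consistent with the legend counting such predictions as incorrect.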
Table 3 compares the prediction performance of ExAlt-Exon in Table 2 to ExAlt predictions using different parameter settings. The impact of using the gene structure information as input (ExAlt-Exon) was compared to the alternatives shown in Table 3 as ExAlt-Frame and ExAlt-Default. ExAlt-Frame makes predictions without using exon coordinates as input but is limited to predicting exons that maintain reading frame consistency with the rest of the known gene. ExAlt-Default is given no gene structure information and checks all three possible reading frames before selecting the exons from the highest scoring reading frame. As expected, starting with an initial known exon improved overall performance, but even when gene structure information was withheld from the input, a majority of the exon coordinates were correctly recovered (67% overall).
Table 3  Exon prediction accuracy using different ExAlt parameter settings.
                              Constitutive  Cassette      Multiple Splice  Intron Retention  All Exons
                              Sens  Spec    Sens  Spec    Sens  Spec       Sens  Spec        Sens  Spec
ExAlt-Exon                    100   96      100   89      67    94         61    89          84    94
ExAlt-Exon-ab initio          100   88      100   85      69    83         70    87          87    84
ExAlt-Frame                   96    95      70    80      53    87         48    82          72    89
ExAlt-Frame-ab initio         97    87      72    74      56    76         48    80          74    82
ExAlt-Frame-Single            96    97      69    85      45    92         31    92          66    94
ExAlt-Default                 89    84      58    63      49    77         43    74          67    79
ExAlt-Default-ab initio       89    75      58    55      50    67         36    58          65    69
ExAlt-Default-Single          89    90      56    66      41    84         28    83          61    85
N-SCAN                        87    84      51    80      33    66         31    66          57    78
Augustus                      75    77      27    53      27    59         26    57          47    69
SNAP                          76    72      42    61      29    56         27    62          50    67
Columns are organized by exon type: Constitutive, Cassette, Multiple Splice, Intron Retention, and all exons counted together (All Exons). Rows 1–2 show ExAlt performance using an input exon with default parameters from Table 2 (ExAlt-Exon) and with no informant species (ExAlt-Exon-ab initio). Rows 3–5 show ExAlt performance using an input coding frame with default parameters (ExAlt-Frame), no informant species (ExAlt-Frame-ab initio), and at most 1 exon predicted per test sequence (ExAlt-Frame-Single). Rows 6–8 show ExAlt performance using no gene structure information with default parameters (ExAlt-Default), no informant species (ExAlt-Default-ab initio), and at most 1 exon predicted per test sequence (ExAlt-Default-Single). Rows 9–11 show output for the three single isoform gene finders N-SCAN, Augustus, and SNAP.
ExAlt-Exon, ExAlt-Frame, and ExAlt-Default were compared to their respective ab initio equivalents: ExAlt-Exon-ab initio, ExAlt-Frame-ab initio, and ExAlt-Default-ab initio. Each ab initio version is the GHMM equivalent of the PGHMM, using only the target D. melanogaster sequence as input. In all cases the multi-species versions of ExAlt reduced the number of false positive predictions relative to the equivalent ab initio version, with little or no reduction in sensitivity.
Finally, the trade-off between predicting multiple overlapping exons and predicting at most one exon per test sequence was measured. With the hold out set composed of 57% constitutive exons, 18% MS exons, 17% SE exons, and 9% IR exons, both single exon prediction versions of ExAlt (ExAlt-Frame-Single and ExAlt-Default-Single) captured a large percentage of the exons simply by correctly predicting one exon per sequence. When ExAlt is given the coding frame and restricted to predicting at most one exon, an exon is correctly predicted in 94% of the sequences (the specificity of ExAlt-Frame-Single in Table 3). Allowing ExAlt to predict overlapping exons (ExAlt-Frame in Table 3) lowered specificity to 89% but raised the percentage of correctly annotated exons from 66% to 72%. The last three rows show single isoform gene finding performance for N-SCAN, Augustus, and SNAP, which provided an additional point of reference for how well conventional gene finders performed in the evaluated gene regions.
Table 4  ExAlt results on the initial training and testing set in percentages.
                              All Exons
                              Sens    Spec
ExAlt-Exon                    82/-2   94/-1
ExAlt-Exon-ab initio          84/-3   86/+2
N-SCAN-Exon Union             82/0    82/-2
Augustus-Exon Union           81/0    81/+2
SNAP-Exon Union               81/0    77/0
SNAP+N-SCAN-Exon Union        83/0    72/0
Augustus+N-SCAN-Exon Union    83/0    75/-1
Aug.+SNAP+N-SCAN-Exon Union   84/0    68/+1
ExAlt-Frame                   70/-2   87/-2
ExAlt-Frame-ab initio         72/-2   79/-3
ExAlt-Frame-Single            65/-1   91/-3
ExAlt-Default                 65/0    78/-1
ExAlt-Default-ab initio       65/0    66/-3
ExAlt-Default-Single          60/-1   84/-1
N-SCAN                        56/-1   76/-2
Augustus                      47/0    71/+2
SNAP                          49/-1   67/0
Included next to each measurement is the difference in percentage points compared to performance in the hold out set in Table 2 and Table 3. (Aug. = Augustus)
Prediction performance was much higher for constitutive exons than for the other categories of alternatively spliced exons. Lack of sequence conservation partly explained the decrease in specificity. The highest specificity levels in the training set were found to occur when the two informant species D. yakuba and D. erecta were available. In the hold out set, 86% of the constitutive exons matched to D. yakuba and D. erecta sequences compared with 75% of the alternatively spliced exons. (In the remaining cases some other combination of one or two informant species was found.) Thus, in some cases ExAlt could not optimally use evidence of sequence conservation to limit false positive predictions. The ab initio versions of ExAlt (ExAlt-Frame-ab initio and ExAlt-Default-ab initio) also got a smaller percentage of cassette exons correct compared to constitutive exons. Many of the cassette exons were less than 100 bases long, and the single species ExAlt (ExAlt-Frame-ab initio in Table 3) correctly identified a majority of these exons (62%). However, short exons (less than 100 bases) made up 69% of the cases where single species ExAlt did not get both splice sites exactly correct. In contrast, all but 1 of the short constitutive exons were correctly identified.
To ensure that the results in the hold out set represent performance that can be expected to repeat on similarly randomly distributed data sets, performance numbers for the original training set were examined using 10-fold cross validation and are shown in Table 4. Sensitivity and specificity differ between the two sets by at most 3 percentage points.
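As a rough illustration of the procedure, a 10-fold partition and the percentage point difference reported in Table 4 could be computed as below. The source does not describe how the folds were built, so the shuffling, the interleaved fold assignment, and all names here are assumptions rather than the authors' protocol.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(
    items: Sequence[T], k: int = 10, seed: int = 0
) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs in which each item is in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # interleaved, near-equal fold sizes
    splits = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


def percentage_point_diff(cross_val: float, hold_out: float) -> int:
    """Difference in percentage points, cross validation minus hold out."""
    return round(cross_val - hold_out)


sequences = [f"seq{i}" for i in range(25)]  # placeholder sequence identifiers
splits = k_fold_splits(sequences, k=10)
print(len(splits), sorted(len(test) for _, test in splits))

# ExAlt-Exon sensitivity: 82% under cross validation versus 84% on the hold out set.
print(percentage_point_diff(82, 84))  # -2
```

Each model would be trained on the nine training folds and scored on the remaining fold, with the per-fold scores averaged to give figures comparable to those in Table 4.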