Sequence conservation

The training set confirmed that splice sites and protein-coding sequence were conserved between D. melanogaster and each of the three informant species. Of the constitutive dinucleotide splice sites (AG and GT) annotated in D. melanogaster, 99% were found in the matching aligned informant sequence. Alternative splice sites were only slightly less conserved: over 95% were found in the matched informant species. Table 1 shows, by exon type, the percentage of exons with matches to each informant species that were missing a splice site. Exons with multiple functional splice sites (MS and IR exons) less frequently shared all splice sites with the informant species. In D. simulans, for example, 12% of the multiple splice site exons (MS in Table 1) and 8% of the exons with retained introns (IR in Table 1) were missing a splice site. In nearly every case, the lack of conservation in alternative splicing affects only one exon isoform, leaving another shared exon isoform in place. In the vast majority of cases, the lack of observed conservation is not due to misalignment or missing sequence, although a small percentage of cases are affected by these problems.

Table 1. Percentage of D. melanogaster annotated exons missing at least one splice site in D. simulans, D. yakuba and D. erecta.

Exon type   D. simulans   D. yakuba   D. erecta
CS          2             1           1
CE          1             4           2
MS          10, 1         9, 0        11, 0
IR          8, 0          16, 2       20, 3

Percentages are organized by exon type: constitutive exons (CS), cassette exons (CE), exons with multiple splice sites (MS), and exons with intron retention (IR). For the MS and IR rows, the second number is the percentage of exons in which the non-conserved splice site is constitutive (used in all isoforms).
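
The kind of tally reported in Table 1 can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes each exon comes with an informant sequence already projected onto D. melanogaster coordinates (gaps as '-'), with a single acceptor (AG) and donor (GT) per exon; the function names and the record layout are hypothetical, and MS and IR exons with several sites would need one check per site.

```python
from collections import defaultdict

ACCEPTOR = "AG"  # last two intron bases immediately upstream of the exon
DONOR = "GT"     # first two intron bases immediately downstream of the exon


def has_conserved_sites(informant_aligned, exon_start, exon_end):
    """Return True if both canonical dinucleotides flank the exon.

    informant_aligned: informant sequence in D. melanogaster coordinates
        (hypothetical input; gaps are '-').
    exon_start, exon_end: 0-based exon bounds, end exclusive.
    """
    # max() avoids negative indices wrapping around to the end of the string.
    acceptor = informant_aligned[max(0, exon_start - 2):exon_start].upper()
    donor = informant_aligned[exon_end:exon_end + 2].upper()
    return acceptor == ACCEPTOR and donor == DONOR


def percent_missing_by_type(records):
    """Percentage of exons per type that lack at least one splice site.

    records: iterable of (exon_type, informant_aligned, exon_start, exon_end).
    """
    total = defaultdict(int)
    missing = defaultdict(int)
    for exon_type, seq, start, end in records:
        total[exon_type] += 1
        if not has_conserved_sites(seq, start, end):
            missing[exon_type] += 1
    return {t: 100.0 * missing[t] / total[t] for t in total}


# Toy data only: the exon ATGC occupies positions 4..7 in each sequence.
conserved = "ttAGATGCGTaa"    # AG upstream, GT downstream
donor_lost = "ttAGATGCGCaa"   # donor GT replaced by GC
acceptor_gap = "ttA-ATGCGTaa" # alignment gap inside the acceptor

toy_records = [
    ("CS", conserved, 4, 8),
    ("CS", conserved, 4, 8),
    ("CS", conserved, 4, 8),
    ("CS", donor_lost, 4, 8),
    ("MS", conserved, 4, 8),
    ("MS", acceptor_gap, 4, 8),
]
toy_result = percent_missing_by_type(toy_records)
```

A gap or missing sequence at a site counts as "missing" in this sketch, so distinguishing true loss from alignment problems, as discussed above, would require a separate check.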