An initial set of 606 regions annotated with alternative splicing was searched for duplicate sequences. WU-blastn 2.0 [39] was used for an 'all against all' search, and redundant exons were removed whenever two exons matched with an E-value < 10⁻²¹, leaving 600 exon regions. Each remaining non-redundant exon region was extracted from its originating genomic location with up to 400 bases of flanking intron sequence (or the length of the adjacent intron, whichever is shorter). The 400-base cutoff was chosen to limit the potential for aligning long stretches of poorly conserved intron sequence while retaining stretches of sequence long enough to predict alternative splicing patterns. WU-blastn 2.0 was then used to find potential homologs in three informant species: D. simulans, D. yakuba, and D. erecta. The D. simulans genome was downloaded from the UCSC Genome Browser [40], and the D. yakuba and D. erecta genomes were downloaded from [41]. (The D. simulans and D. yakuba sequences were generated by the Genome Sequencing Center, WUSTL School of Medicine, and the D. erecta sequence was generated by Agencourt.) The best-matching sequence in each informant genome with an E-value < 10⁻¹⁹ was retained, along with 50 bases of flanking sequence, as input to the multiple sequence alignment program muscle [42]. Each aligned sequence was required to differ in length from the query D. melanogaster sequence by at most 10%. N-SCAN and Augustus predictions were downloaded from the UCSC Genome Browser [43,44], SNAP predictions were downloaded from [45], and branch lengths were obtained from [46].
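The selection rules described above can be summarized in a short sketch. Only the numeric thresholds (E-value cutoffs of 10⁻²¹ and 10⁻¹⁹, the 400-base flank cap, and the 10% length tolerance) come from the text. The `Exon` and `Hit` records and all function names are hypothetical, BLAST hits are assumed to be already parsed (no real WU-blastn output format is implied), and keeping the first-seen exon of each redundant pair is an assumption, since the text does not say which member was discarded.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; everything else is illustrative.
MAX_FLANK = 400          # max intron bases kept beside each exon
DEDUP_EVALUE = 1e-21     # all-against-all redundancy cutoff
HOMOLOG_EVALUE = 1e-19   # informant-species homolog cutoff
MAX_LENGTH_DIFF = 0.10   # allowed relative length difference vs. the query


@dataclass
class Exon:
    """Hypothetical exon record; coordinates are 0-based, half-open."""
    name: str
    start: int
    end: int
    upstream_intron_len: int
    downstream_intron_len: int


@dataclass
class Hit:
    """Hypothetical, already-parsed BLAST hit."""
    query: str
    subject: str
    evalue: float


def remove_redundant(exon_names, hits):
    """Drop an exon if it matches an already-kept exon with E < DEDUP_EVALUE.

    Keeping the first-seen exon of each matching pair is an assumption.
    """
    similar = {}
    for h in hits:
        if h.query != h.subject and h.evalue < DEDUP_EVALUE:
            similar.setdefault(h.query, set()).add(h.subject)
            similar.setdefault(h.subject, set()).add(h.query)
    kept, kept_set = [], set()
    for name in exon_names:
        if not (similar.get(name, set()) & kept_set):
            kept.append(name)
            kept_set.add(name)
    return kept


def extraction_window(exon):
    """Exon plus up to MAX_FLANK intron bases, capped by each adjacent intron."""
    left = min(MAX_FLANK, exon.upstream_intron_len)
    right = min(MAX_FLANK, exon.downstream_intron_len)
    return exon.start - left, exon.end + right


def best_homolog(hits) -> Optional[Hit]:
    """Lowest-E-value hit below HOMOLOG_EVALUE in one informant genome, or None."""
    passing = [h for h in hits if h.evalue < HOMOLOG_EVALUE]
    return min(passing, key=lambda h: h.evalue) if passing else None


def length_ok(query_len, other_len):
    """True if other_len differs from query_len by at most MAX_LENGTH_DIFF."""
    return abs(other_len - query_len) <= MAX_LENGTH_DIFF * query_len


if __name__ == "__main__":
    exon = Exon("exA", start=1000, end=1150,
                upstream_intron_len=250, downstream_intron_len=2000)
    print(extraction_window(exon))  # (750, 1550): short upstream intron caps the flank
    hits = [Hit("e1", "e2", 1e-30), Hit("e1", "e3", 1e-5)]
    print(remove_redundant(["e1", "e2", "e3"], hits))  # ['e1', 'e3']
```

Capping each flank at the adjacent intron length, rather than always taking 400 bases, keeps the window from running into the neighbouring exon when an intron is short.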