Figure 1  Schematic of the Altrans Algorithm
(A) Overlapping exons are grouped into exon groups, where identical exons belonging to multiple transcripts are treated as one unique entity. Two transcripts, shown as connected brown and green boxes, result in two exon groups and three exons, shown as blue boxes. Next, the unique regions of each exon are identified; these are depicted as light blue boxes and labeled with a subscript u followed by the level of the exon. Because E2 has a region that is not shared by any other exon, it is assigned a “level” of 1, and the reads aligning to E2u,1 can be unambiguously assigned to E2. E1 does not have a unique portion. Therefore the level 1 exon, E2, is removed from the exon group, and the whole of E1 becomes a unique portion with a level of 2, shown as an empty blue box. These unique regions are used when assigning mate pairs to links, as shown by the red lines, where the solid portions of each line are the sequenced mates and the dashed portion represents the inferred insert.
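The level assignment described in (A) can be illustrated with a minimal Python sketch. It is not the Altrans implementation: the function name `assign_levels`, the inclusive coordinate convention, and the example coordinates are assumptions made for illustration, and the early exit for groups in which no remaining exon has a unique position is likewise an assumption, since the caption does not cover that case.

```python
def assign_levels(exons):
    """Assign a level and a unique region to each exon of one exon group.

    exons: dict mapping exon name -> (start, end), inclusive coordinates
           (hypothetical input format chosen for this sketch).
    Returns a dict mapping exon name -> (level, set of unique positions).
    """
    # Represent every exon as the set of positions it covers.
    remaining = {name: set(range(s, e + 1)) for name, (s, e) in exons.items()}
    result = {}
    level = 1
    while remaining:
        unique = {}
        for name, positions in remaining.items():
            # Positions covered by any other exon still in the group.
            others = set().union(
                *(p for n, p in remaining.items() if n != name)
            )
            own = positions - others
            if own:
                unique[name] = own
        if not unique:
            # Assumption: stop if no remaining exon can be told apart.
            break
        # Exons with a unique region get the current level and are removed,
        # so that the next round can expose unique regions in the rest.
        for name, own in unique.items():
            result[name] = (level, own)
            del remaining[name]
        level += 1
    return result


# Hypothetical coordinates mimicking the figure: E1 lies entirely within E2.
levels = assign_levels({"E1": (10, 20), "E2": (10, 30)})
```

With these example coordinates, E2 receives level 1 with positions 21 to 30 as its unique region, and after E2 is removed the whole of E1 becomes unique at level 2.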
(B) The default method for calculating link coverage. Link coverage is necessary to normalize the observed counts for the length of the unique portions being linked and for the insert size. The theoretical minimum and maximum insert sizes linking the two unique portions, represented as brown and green lines, respectively, are calculated. Given the empirically determined insert size distribution, the area under the curve between the minimum and maximum insert sizes is then estimated. The link coverage equals the number of mate pairs linking the two unique portions divided by the ratio of this area to the area of the whole insert size distribution.
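As a rough illustration of the normalization in (B), the sketch below treats the empirical insert size distribution as a histogram and takes the minimum and maximum insert sizes as given inputs, because the caption does not state how they are derived. The function name, the histogram format, and the choice to return 0.0 when no part of the distribution falls in the range are assumptions, not details of Altrans.

```python
def link_coverage_default(n_pairs, insert_hist, min_insert, max_insert):
    """Normalize a mate pair count by the fraction of the insert size
    distribution lying between min_insert and max_insert (inclusive).

    insert_hist: dict mapping insert size -> observed count
                 (hypothetical representation of the empirical distribution).
    """
    total_area = sum(insert_hist.values())
    area_in_range = sum(
        count
        for size, count in insert_hist.items()
        if min_insert <= size <= max_insert
    )
    if total_area == 0 or area_in_range == 0:
        # Assumption: report zero coverage when no insert size can span the link.
        return 0.0
    fraction = area_in_range / total_area
    return n_pairs / fraction


# Made-up histogram: half of all inserts fall between 150 and 200 bases.
hist = {100: 10, 150: 20, 200: 30, 250: 40}
coverage = link_coverage_default(10, hist, 150, 200)
```

In this example, only half of the distribution can produce mate pairs spanning the link, so the 10 observed pairs are scaled up to a coverage of 20.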
(C) The degrees of freedom method for determining link coverage. Here, given a read length and an insert size of 3 bases each and two exons that are 6 and 5 bases long, there are three mate pair alignments that can link these two exons. The degrees of freedom therefore refer to the theoretical number of positions on the mRNA, shown as black lines, at which a mate pair (in this case from a fragment of 3 + 3 + 3 = 9 bp) can link these exons. The link coverage is the number of mate pairs linking the exons divided by the degrees of freedom.
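The count in (C) can be reproduced by enumerating fragment start positions. The sketch below assumes that the two exons are directly adjacent on the mRNA and that each exon is unique along its whole length; the function names and the zero return value for links with no valid placement are illustrative choices, not part of the original method description.

```python
def degrees_of_freedom(exon1_len, exon2_len, read_len, insert_len):
    """Count fragment placements on the spliced mRNA in which the first mate
    lies entirely in exon 1 and the second mate lies entirely in exon 2."""
    fragment_len = 2 * read_len + insert_len
    mrna_len = exon1_len + exon2_len
    count = 0
    # The range bound keeps the whole fragment, and so mate 2, on the mRNA.
    for start in range(mrna_len - fragment_len + 1):
        mate1_end = start + read_len                 # exclusive end of mate 1
        mate2_start = start + read_len + insert_len  # first base of mate 2
        if mate1_end <= exon1_len and mate2_start >= exon1_len:
            count += 1
    return count


def link_coverage_dof(n_pairs, dof):
    """Mate pair count divided by the degrees of freedom."""
    # Assumption: zero coverage when no placement can link the exons.
    return n_pairs / dof if dof > 0 else 0.0


# The example from the caption: exons of 6 and 5 bases, reads and insert of 3.
dof = degrees_of_freedom(6, 5, 3, 3)
```

For the values in the caption, the 9 bp fragment fits at three start positions on the 11-base mRNA, and all three place one mate in each exon.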
(D) The equation used to calculate the F value for a link.
(E) A worked example of the calculation of the F values. First, the coverage of the E2 to E3 link (CE2 − E3) is determined from the level 1 unique regions (CE2u,1 − E3u,1). This value is then subtracted from the coverage obtained from the pseudo-unique E1 to E3 link (CE1u,2 − E3u,1) in order to calculate the true E1 − E3 coverage (CE1 − E3). In the forward direction, E1 and E2 are the primary exons, whereas in the reverse direction E3 is the primary exon; the corresponding F values are calculated as shown.
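The subtraction step in (E) can be sketched as follows. The coverage values are made up, and the function names are hypothetical. The equation from panel (D) is not reproduced in this caption, so the sketch assumes that the F value of a link is its coverage divided by the summed coverage of all links made by the same primary exon; this should be checked against the equation in the figure.

```python
def true_coverages(c_e2u1_e3u1, c_e1u2_e3u1):
    """Derive the true link coverages from the unique-region coverages.

    The E2-E3 coverage comes directly from the level 1 unique regions; the
    pseudo-unique E1 region is shared with E2, so the E2-E3 coverage is
    subtracted to obtain the true E1-E3 coverage.
    """
    c_e2_e3 = c_e2u1_e3u1
    c_e1_e3 = c_e1u2_e3u1 - c_e2_e3
    return {("E1", "E3"): c_e1_e3, ("E2", "E3"): c_e2_e3}


def f_values(coverages, forward=True):
    """F value per link. ASSUMPTION: coverage of the link divided by the sum
    of coverages of all links sharing its primary exon."""
    # Forward: the upstream exon is primary; reverse: the downstream exon.
    idx = 0 if forward else 1
    totals = {}
    for link, cov in coverages.items():
        totals[link[idx]] = totals.get(link[idx], 0.0) + cov
    return {
        link: (cov / totals[link[idx]] if totals[link[idx]] > 0 else 0.0)
        for link, cov in coverages.items()
    }


# Made-up coverages for illustration only.
cov = true_coverages(c_e2u1_e3u1=4.0, c_e1u2_e3u1=10.0)
f_forward = f_values(cov, forward=True)
f_reverse = f_values(cov, forward=False)
```

Under this assumption, each forward link in the example has an F value of 1 because E1 and E2 each make a single link, while in the reverse direction the two links share E3 as the primary exon and split its total coverage in proportion to their own.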