Identification and Removal of False Positives

In the process of experimentally validating our predictions, we encountered three main sources of false positives (FPs) from robust regression. First, we identified genes whose probeset signal estimates were poorly correlated and were therefore not amenable to our method. As an example, the median probeset signal estimates in hESCs and hCNS-SCns of the FIP1L1 gene (gene identifiers BC011543, AL136910) had a Pearson correlation coefficient of 0.38, and the distribution of points was not amenable to robust regression (Figure 4A). To avoid applying REAP inappropriately and generating false predictions, we empirically determined that a gene had to meet a Pearson correlation coefficient cutoff of 0.6 to be amenable to REAP analysis.

Next, we addressed two additional sources of FPs, namely “high-leverage” and “high-influence” points, which we identified by computing the following metrics. For every point, we computed (i) the studentized residual (as described above), (ii) the influence, and (iii) the leverage (see Materials and Methods for more details). Leverage assessed how far a value of the independent variable was from the mean value; the farther the observation was from the mean, the more leverage it had. The influence of a point was related to its covariance ratio: a covariance ratio larger (or smaller) than 1 implied that the point was closer to (or farther from) the regression line than was typical, so removing it would hurt (or help) the accuracy of the line and would increase (or decrease) the error term variance. Influence was computed as the absolute difference between the covariance ratio and unity.
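As a rough illustration of the correlation filter and the three per-point metrics, the following Python sketch computes a Pearson correlation coefficient together with the leverage, studentized residual, covariance ratio, and influence of each point. It rests on several assumptions that are not taken from this study: the line is fitted by ordinary least squares rather than by the robust regression used in REAP, the covariance ratio follows the standard textbook (COVRATIO) formula, and the function names `pearson` and `point_diagnostics` are hypothetical. The definitions actually used are given in Materials and Methods.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_diagnostics(x, y):
    """Per-point diagnostics for a straight-line fit y = intercept + slope * x.

    Assumption: an ordinary least-squares fit stands in for the robust
    regression described in the text.
    """
    n, p = len(x), 2  # p = number of fitted parameters (intercept and slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - p)  # error variance estimated from all points

    diagnostics = []
    for xi, e in zip(x, resid):
        # Leverage: grows with the distance of x_i from the mean of x.
        h = 1.0 / n + (xi - mx) ** 2 / sxx
        # Error variance re-estimated with point i left out.
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)
        # Externally studentized residual.
        t = e / math.sqrt(s2_del * (1.0 - h))
        # Covariance ratio (textbook COVRATIO) and its distance from unity.
        covratio = (s2_del / s2) ** p / (1.0 - h)
        diagnostics.append({
            "leverage": h,
            "studentized_residual": t,
            "covratio": covratio,
            "influence": abs(covratio - 1.0),
        })
    return diagnostics
```

In this sketch, `x` and `y` would hold the probeset signal estimates of one gene in the two cell types, and a gene would be passed to `point_diagnostics` only if `pearson(x, y)` met the 0.6 cutoff.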
To illustrate further, a point was classified as an “outlier” if it had a large studentized residual (p < 0.01) and low leverage (boxed point “a”); as a “high-leverage” point if it had a low studentized residual and high leverage (boxed point “b”); and as a “high-influence” point if it had a high studentized residual, high leverage, and high influence (boxed point “c”; Figure 4B). Points that resembled boxed point “a” were designated as potential AS events. For example, four of the five boxed points in Figure 3C were “outliers,” and RT-PCR validation indicated that the exon represented by the probeset was indeed skipped in hCNS-SCns (EHBP1, Figure 7B). Points that were “high-leverage,” such as the five points in the CLCN2 gene, were experimentally verified to be FPs (Figure 4C; unpublished data). Points that were “high-influence,” such as four of the five boxed points in the ABCA3 gene, were also experimentally verified to be FPs (Figure 4D; unpublished data). In conclusion, to reduce the FP rate, all points were evaluated according to the metrics described, and only points that were significant “outliers” were considered putative AS events.

Figure 4. Sources of False Positives
(A) Scatter plot of points for the FIP1L1 gene and the line representing the robust regression estimate.
(B) Boxed point “a” represents a significant “outlier” (with a significantly different studentized residual and low leverage). Boxed point “b” represents a “high-leverage” point (low studentized residual and high leverage). Boxed point “c” represents a “high-influence” point (high studentized residual, high leverage, and high influence).
(C) Scatter plot of points for the CLCN2 gene. Boxed points represent “high-leverage” points.
(D) Scatter plot of points for the ABCA3 gene. Boxed points represent “high-influence” points.
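The classification of points into “outliers,” “high-leverage” points, and “high-influence” points described in this section can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `classify_point` is hypothetical, the p < 0.01 threshold is the one given in the text, and the leverage and influence cutoffs (2p/n and 3p/n, where p is the number of fitted parameters and n the number of points) are common textbook rules of thumb, not necessarily the cutoffs used in this study. The labels “typical” and “unclassified” are also assumptions, introduced for combinations the text does not name.

```python
def classify_point(residual_p, leverage, influence, n_points,
                   n_params=2, alpha=0.01):
    """Label one point from its diagnostics.

    residual_p : p-value of the point's studentized residual
    leverage   : leverage of the point
    influence  : absolute difference between covariance ratio and unity
    n_points   : number of points used in the regression

    Assumption: the 2p/n and 3p/n cutoffs below are textbook rules of
    thumb, not values taken from the study.
    """
    leverage_cutoff = 2.0 * n_params / n_points
    influence_cutoff = 3.0 * n_params / n_points

    large_residual = residual_p < alpha
    high_leverage = leverage > leverage_cutoff
    high_influence = influence > influence_cutoff

    if large_residual and not high_leverage:
        return "outlier"          # candidate AS event
    if not large_residual and high_leverage:
        return "high-leverage"    # treated as a likely false positive
    if large_residual and high_leverage and high_influence:
        return "high-influence"   # treated as a likely false positive
    if not large_residual and not high_leverage:
        return "typical"          # assumption: label for unremarkable points
    return "unclassified"         # assumption: combination not named in the text
```

Under this sketch, only points labeled "outlier" would be carried forward as putative AS events.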