Results

Simulation Results

The general overview of the Altrans algorithm is provided in Figure 1. We first compared Altrans, Cufflinks, and MISO using simulations. We considered six scenarios: one in which the given annotation perfectly described the simulated transcripts, and five others in which 5%, 10%, 25%, 50%, and 75% of the transcripts were novel, that is, absent from the annotation (see Material and Methods). We then quantified the six simulated datasets using the known annotation in all cases; for MISO, we quantified transcript abundances. This was done to assess how the methods performed with complete versus incomplete transcriptome knowledge. The transcript quantifications generated by Cufflinks and MISO were transformed into link quantifications to make them comparable to those generated by Altrans. The results of the simulation analysis are shown in Figure 2. We observe that Cufflinks performs better than Altrans when the annotation is perfect, but as the percentage of novel transcripts in the simulations increases, Altrans performs better because it suffers less from the imperfect annotation used in the quantification. In comparison, MISO performs less well than both other methods. To produce a random null distribution for each method, we took the link quantifications for each gene and permuted them 100 times among the links of that gene. We then measured the correlation of these random assignments with the simulated values and found that Cufflinks and MISO fall to the level of random assignment of link quantifications as the proportion of novel transcripts in the simulations increases.
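The permutation null described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name, the flat array layout (one entry per link with a parallel gene label), and the use of Pearson correlation are assumptions, since the text does not state which correlation measure was used.

```python
import numpy as np

def permutation_null(estimated, simulated, gene_ids, n_perm=100, seed=0):
    """Correlate within-gene shuffled link quantifications with the simulated truth.

    estimated, simulated: one value per link (hypothetical flat layout).
    gene_ids: gene label of each link; shuffling never crosses gene boundaries.
    Returns an array of n_perm correlation coefficients (Pearson, by assumption).
    """
    estimated = np.asarray(estimated, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    gene_ids = np.asarray(gene_ids)
    rng = np.random.default_rng(seed)

    # Positions of the links that belong to each gene.
    groups = [np.flatnonzero(gene_ids == g) for g in np.unique(gene_ids)]

    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = estimated.copy()
        for idx in groups:
            # Reassign this gene's quantifications at random among its own links.
            shuffled[idx] = estimated[rng.permutation(idx)]
        null[i] = np.corrcoef(shuffled, simulated)[0, 1]
    return null

# Toy usage with made-up numbers: two genes with three and two links.
est = [5.0, 1.0, 3.0, 20.0, 8.0]
sim = [4.0, 2.0, 3.0, 18.0, 9.0]
genes = ["g1", "g1", "g1", "g2", "g2"]
null_corr = permutation_null(est, sim, genes)
observed = np.corrcoef(est, sim)[0, 1]
```

A method whose observed correlation with the simulated values is no higher than the bulk of `null_corr` is doing no better than assigning link quantifications at random within each gene.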
We estimated the proportion of novel transcripts by using split read mappings from a well-studied LCL transcriptome RNA-sequencing experiment7 and a less well-studied pancreatic beta cell transcriptome RNA-sequencing experiment.25 We observe that on average 25.8% (SD = 3.5%) of the junctions in the LCLs and 34.7% (SD = 9.3%) of the junctions in the beta cells are not found in the GENCODE v.12 annotation. We therefore conclude that in RNA-sequencing experiments where the annotation does not fully reflect the underlying isoform variety, Altrans is a sensitive method for quantifying exon junctions.

cis-Alternative Splicing QTL Discovery and Replication between Populations

The Geuvadis dataset comprises 373 European (EUR) and 89 African (YRI) samples, and the cis-asQTL discovery was conducted separately in each population as described in the Material and Methods section. At an FDR threshold of 1%, we find 1,472 and 1,737 asQTL genes in the European population with Altrans and Cufflinks, respectively. For the Africans these numbers are 166 and 304, respectively (Table 1). There is a significant overlap between the methods in the asQTL genes, with Altrans finding approximately 45% of the genes identified by Cufflinks in the Europeans and about 25% in the Africans (Table 1). The lower overlap between the methods in the African population is due to the smaller sample size, and hence lower power, of this cohort compared to the Europeans. When we plot the distances of the significant asQTLs from the transcription start site (TSS), we observe that for both methods the asQTLs that are shared between the two populations and the asQTLs with stronger effects tend to be closer to the TSS than population-specific and weaker asQTLs (Figure 3A).
As expected given the sample sizes of each population, the majority of the asQTL genes in Europeans at this FDR threshold are unique to this population (91% for Altrans and 86% for Cufflinks), whereas most of the African asQTL genes are also found in the Europeans (81% for Altrans and 82% for Cufflinks) (Figure 3B). Using a more sensitive π1 approach,21 we estimate that 72% of the Altrans asQTLs in Europeans are replicated in Africans and 94% of the African asQTLs are replicated in Europeans. In the case of Cufflinks, these estimates are 78% and 93%, respectively (Figure 3C). We took the correlation coefficient as a proxy for the effect size of an asQTL and compared the distributions of the absolute correlation coefficients of the significant asQTLs identified by each method in both populations (Figure S1). Cufflinks asQTLs have significantly higher effect sizes than Altrans asQTLs (Mann-Whitney p < 1.69 × 10⁻⁵), indicating that Altrans identifies associations with smaller effect sizes than Cufflinks; together with the difference in sample size, this contributes to the slightly lower replication of European Altrans asQTLs in Africans compared to Cufflinks. Of note, when we discover asQTLs in Africans (smaller sample size) and replicate in Europeans (larger sample size), both methods achieve very high levels of replication (94% and 93% for Altrans and Cufflinks, respectively). To test the replication of asQTLs by each method independently of sample size and population differences, we selected the 91 European individuals belonging to the CEU population and replicated the findings from this cohort in the remaining 282 European samples. When we calculate the π1 statistic in this analysis, we observe that both methods attain very similar levels of replication (π1 = 95% for both methods) (Figure S2).
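To make the π1 replication estimates above concrete, the following sketch computes π1 = 1 − π0 from the replication-cohort p values of the discovered asQTLs. It assumes the common fixed-threshold form of Storey's π0 estimator with a hypothetical tuning value λ = 0.5; the exact estimator settings used for the reported values are given by the cited approach and may differ (for example, by smoothing over a grid of λ values).

```python
import numpy as np

def pi1_estimate(pvalues, lam=0.5):
    """Estimate the proportion of true positives among tested associations.

    pi0 is estimated as the fraction of p values above `lam`, rescaled by the
    width of that interval (p values of true nulls are uniform on [0, 1]).
    `lam` is an assumed tuning parameter, not a value taken from the study.
    """
    p = np.asarray(pvalues, dtype=float)
    if p.size == 0:
        raise ValueError("at least one p value is required")
    pi0 = np.mean(p > lam) / (1.0 - lam)
    pi0 = min(1.0, pi0)  # the estimate can exceed 1 by chance; cap it
    return 1.0 - pi0

# Toy usage with made-up p values: discovered pairs re-tested in a second cohort.
replication_p = [1e-8, 3e-5, 0.002, 0.01, 0.04, 0.2, 0.6, 0.9]
pi1 = pi1_estimate(replication_p)
```

Because π1 uses the whole p value distribution rather than a significance cutoff in the replication cohort, it is less sensitive to the lower power of the smaller population than a direct overlap of significant genes.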
Differences between Methods

Given that both methods replicate at similar levels and Cufflinks finds more asQTLs, one could argue that Cufflinks should be the method of choice. However, almost half of the asQTLs discovered with Altrans are unique to Altrans. Although the methodology used to identify splicing QTLs in the original Geuvadis analysis differs significantly from the process described here, we also checked the gene-level overlap between the published lists of splicing QTLs7 and the asQTLs identified here (Figure S3). We find that Altrans detects 258 of the 620 asQTLs identified in the Europeans in the original study, and Cufflinks finds 348 overlapping asQTLs. The union of both methods used here identifies 395 of the 620 genes in the original discovery as significant asQTLs. In the African population, the overlap proportions are similar: Altrans finds 16 of the 83 asQTLs to be significant, Cufflinks finds 35 common genes, and the union of Altrans and Cufflinks overlaps with 38 asQTLs in the original study. This confirms the complementary nature of asQTL discovery methods.

We investigated the Altrans-specific asQTLs further. First, we find that the majority of the Altrans-specific asQTLs originate from links between exons that are not annotated in the GENCODE v.12 annotation and therefore were never tested by Cufflinks (89% and 83% not annotated for Europeans and Africans, respectively; Figure S3). Next, we assessed whether Altrans-specific discoveries replicate. To do so, we tested the Altrans-specific discoveries originating from the 91 CEU individuals in the remaining Europeans; these associations achieve a π1 statistic of 93%, indicating a high true positive rate among Altrans-specific asQTLs (Figure S4A). We also estimate that 63% of the Altrans-specific asQTLs in Europeans are replicated in Africans and 95% of the African Altrans-specific asQTLs are replicated in Europeans.
Moreover, we compared the types of splicing events found to be significant by each method (Figure S5) and observed differences between the two. The majority (66%) of the signal that Altrans captures is due to exon skipping events, followed by alternative 5′ and 3′ UTRs (15% and 11%, respectively). In comparison, Cufflinks has a more uniform distribution of significant event types, with the most common being alternative 5′ UTR (23%), followed by exon skipping (15%) and alternative first exons (14%). This difference in the types of significant splicing events found by each method highlights their relative merits in identifying different types of splicing events and is one of the reasons for method-specific significant results. We tested whether the exon skipping events identified by Altrans replicate between the CEU discovery set and the remaining Europeans, and across populations. We achieve high π1 values of 98% for CEU discoveries replicated in the remaining Europeans (Figure S4B), 70% for European discoveries replicated in Africans, and 96% for African discoveries replicated in Europeans, which confirms that these events are enriched for true positives.

Replication of Discoveries by One Method in the Other Method

We wanted to assess how the discoveries of one method compared to those of the other. For each variant-link pair found to be significant in one population by one method, we calculated the p value of the same variant-link pair in the same population based on quantifications by the other method. For this we had to select the links common to both methods, and therefore many genes are not tested for replication across methods. From these p value distributions, we calculated the π1 statistic, which indicates the proportion of true positives (Figure S6). We estimate that 94% of Altrans asQTLs in Europeans and 90% of Altrans asQTLs in Africans are replicated by Cufflinks quantifications in the corresponding population, for the links common to the two methods.
In contrast, replication in the other direction, Cufflinks asQTLs in Altrans, is lower: 57% and 51% for Europeans and Africans, respectively. When testing Altrans results in Cufflinks, we test 507 and 77 genes for Europeans and Africans, respectively, and when testing Cufflinks results in Altrans, these values are 1,260 and 230, respectively. We then multiply each π1 value by the corresponding number of genes tested to estimate the number of genes that replicate across methods, and divide this by the corresponding number of asQTL genes found in the original discovery (e.g., for European Cufflinks in Altrans: 1,260 × 0.57 / 1,737 = 41%). In doing so we estimate the percentage of genes that are “discoverable” by the other method. This percentage is similar across methods: in the Europeans it is 33% and 41% for Altrans and Cufflinks, respectively, and in the Africans it is 42% and 39%. This is due to the different space of alternative splicing that each method is best at quantifying and is another confirmation of the complementary nature of these methods.

Functional Relevance of asQTLs

In the absence of a known and true set of asQTLs, we can use the functional annotation of the human genome generated by the ENCODE project to assess whether the asQTLs discovered are likely to be biologically active. If the identified asQTLs are “real,” then we would expect them to lie in biochemically functional regions of the genome more often than expected by chance. We tested this by overlapping asQTLs with functional annotations provided by the Ensembl Regulatory Build24 and comparing this overlap to that of a random set of non-asQTL variants, which were matched to the asQTLs based on relative distance from the TSS and allele frequency (Material and Methods).
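As an illustration of how such a matched null set could be drawn, the sketch below bins variants by distance from the TSS and by allele frequency, then samples one non-asQTL variant from the same bin for each asQTL. The bin widths, the tuple layout, and the function names are hypothetical; the actual matching procedure is the one described in Material and Methods and may differ.

```python
import random
from collections import defaultdict

def bin_key(dist_to_tss, maf, dist_bin=10_000, maf_bin=0.05):
    """Assign a variant to a (distance, allele frequency) bin; widths are assumed."""
    return (int(dist_to_tss // dist_bin), int(maf // maf_bin))

def sample_matched_null(asqtls, background, seed=0):
    """Draw one matched non-asQTL variant per asQTL.

    asqtls, background: lists of (variant_id, dist_to_tss, maf) tuples.
    Variants that are themselves asQTLs are excluded from the pool.
    asQTLs whose bin has no candidate are skipped; sampling is with replacement.
    """
    rng = random.Random(seed)
    asqtl_ids = {variant_id for variant_id, _, _ in asqtls}

    # Group the eligible background variants by bin.
    pool = defaultdict(list)
    for variant_id, dist, maf in background:
        if variant_id not in asqtl_ids:
            pool[bin_key(dist, maf)].append(variant_id)

    matched = []
    for _, dist, maf in asqtls:
        candidates = pool.get(bin_key(dist, maf), [])
        if candidates:
            matched.append(rng.choice(candidates))
    return matched

# Toy usage with made-up variants.
asqtls = [("rs_a", 1_500, 0.12), ("rs_b", -25_000, 0.31)]
background = [("rs_c", 3_000, 0.13), ("rs_d", -21_000, 0.33),
              ("rs_e", 500_000, 0.45), ("rs_a", 1_500, 0.12)]
null_set = sample_matched_null(asqtls, background)
```

Matching on these two properties keeps the comparison from being driven by the tendency of asQTLs to lie near the TSS or to have particular allele frequencies.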
We find significant enrichments for many transcription factor peaks (median 5.2×, median p = 4.41 × 10⁻⁸ for Altrans and median 4.4×, median p = 2.26 × 10⁻⁷ for Cufflinks), DNase1 hypersensitive sites (4.2×, p = 1.53 × 10⁻⁴⁶ for Altrans and 2.9×, p = 5.05 × 10⁻²³ for Cufflinks), chromatin marks for active promoters (median 4.3×, median p = 1.51 × 10⁻⁵¹ for Altrans and median 4.0×, median p = 5.15 × 10⁻⁵⁴ for Cufflinks), as well as strong enhancer marks (median 3.9×, median p = 3.46 × 10⁻⁴⁰ for Altrans and median 3.4×, median p = 2.76 × 10⁻³⁵ for Cufflinks) in asQTLs identified by both methods (Figure 4). We also observe a significant depletion in repressor marks (3.3×, p = 3.51 × 10⁻¹⁷ for Altrans and 5.0×, p = 2.28 × 10⁻²⁶ for Cufflinks). Altogether, these results confirm the functional relevance of asQTLs and indicate that we are capturing true biological signal. Furthermore, we observe strong, significant enrichments for variants in splice acceptor sites (33.3×, p = 8.36 × 10⁻⁹ for Altrans and 55×, p = 2.25 × 10⁻¹⁰ for Cufflinks) and splice donor sites (10×, p = 0.01 for Altrans and 30×, p = 1.34 × 10⁻⁵ for Cufflinks), as well as for variants in splice regions (12.3×, p = 4.83 × 10⁻²⁸ for Altrans and 12.8×, p = 2.31 × 10⁻³⁷ for Cufflinks), which also indicates that we are capturing variants involved in the splicing machinery.
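The fold enrichments and p values reported above compare how often asQTLs and matched null variants fall in a given annotation. A minimal sketch of that kind of calculation follows. It assumes a one-sided Fisher's exact test, which this section does not confirm as the test used, and the counts in the usage example are made up for illustration.

```python
from math import comb

def fold_enrichment(hits_asqtl, n_asqtl, hits_null, n_null):
    """Ratio of the overlap rate in asQTLs to the overlap rate in null variants."""
    return (hits_asqtl / n_asqtl) / (hits_null / n_null)

def fisher_one_sided(hits_asqtl, n_asqtl, hits_null, n_null):
    """One-sided Fisher's exact p value for enrichment (assumed test).

    Probability, under the hypergeometric null, of seeing at least
    `hits_asqtl` annotated variants among the asQTLs.
    """
    total = n_asqtl + n_null
    total_hits = hits_asqtl + hits_null
    upper = min(total_hits, n_asqtl)
    # Sum the hypergeometric probabilities from the observed count upwards.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_asqtl - k)
        for k in range(hits_asqtl, upper + 1)
    )
    return tail / comb(total, n_asqtl)

# Toy usage with hypothetical counts: 42 of 1,000 asQTLs versus
# 10 of 1,000 matched null variants overlap an annotation.
fold = fold_enrichment(42, 1000, 10, 1000)
p_value = fisher_one_sided(42, 1000, 10, 1000)
```

A depletion, such as the one reported for repressor marks, corresponds to a fold value below 1 in this formulation and would be tested with the opposite tail.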