PMC:514539 / 12441-16920
Annotations
2_test
{"project":"2_test","denotations":[{"id":"15307894-16323966-8142008","span":{"begin":3641,"end":3643},"obj":"16323966"},{"id":"15307894-12702575-8142009","span":{"begin":4076,"end":4078},"obj":"12702575"},{"id":"15307894-12937144-8142010","span":{"begin":4079,"end":4081},"obj":"12937144"},{"id":"15307894-12702575-8142011","span":{"begin":4172,"end":4174},"obj":"12702575"},{"id":"15307894-12937144-8142012","span":{"begin":4301,"end":4303},"obj":"12937144"}],"text":"Methods\nWe have developed a test we call the Permutation Percentile Separability Test (PPST), which attempts to refute a null hypothesis that is slightly different from A = B, but which is capable of detecting AB, BA, ABA and BAB patterns. Under this test, we are interested in the question \"are there are statistically significant number of samples in group A (e.g., tumor) that exhibit expression intensities beyond a particular percentile of the observed expression intensities in group B (e.g., normal)?\" and vice versa. By 'statistically significant' we mean that the number of samples that exhibit apparent overexpression (or underexpression) exceeds that expected under the null distribution.\nTo test these hypotheses, we count the number of samples in both groups that are found beyond the nth percentile of the samples in the opposite group. This provides two scores, s1, and s2, for each gene (PPST scores). s1 is the number of samples in group A that are beyond the upper percentile (say, 95th) of group B plus the number of samples in group B that are below the lower 95th percentile of group A. This measure will tend to be large when all samples in both groups are significantly distinct from the alternate group in the same way (comparisons consistent with A \u003e B). It can also be significant when a surprising number of samples in only one group varies from the expression levels in the alternate group. 
s2 is the number of samples with correspondingly opposite pattern (comparisons consistent with B \u003e A). Sample class label permutations are used to generate an arbitrarily large number of permuted data sets. These scores s1 and s2 are calculated in each permuted data set to produce unique null distributions for each gene. For the sake of convenience of interpretation, we use -s2 when reporting s2 to denote underexpression. Genes with values of s1 beyond the specified acceptable Type 1 error risk (e.g., α = 5%) are determined to be significantly overexpressed in sample group A relative to B. Individuals in sample group A with expression intensity values over the 95th percentile of sample group B for a given gene may be considered overexpressed. Similarly, genes with values of s2 beyond the specified Type 1 risk for s2 are deemed underexpressed in sample group B relative to A. Varying the percentile threshold allows direct control over the false discovery rate.\n\nTest for ABA patterns (ABA Test)\nGenes that exhibit both significant s1 and s2 scores in this comparison may be considered 'ABA pattern genes' (Fig. 1); however, for stronger inference, permutation tests are also used to calculate s3, to determine, for a given gene, the number of samples from one group (A) that can expected to be distributed both in the upper and lower nth percentile tails of the intensity distribution of that gene in the other group (B); i.e., in the ABA (s3) or BAB (s4) pattern. These scores are not redundant to but rather allow for exploration of distribution-wise (upper and lower) false discovery rates. The application of the PPST test to find ABA patterns is called the 'ABA' test. Under the ABA test, differential expression of a gene may be deemed to be significant in both directions at once, i.e., simultaneously significantly over-expressed and under-expressed in a surprising number of patients in the case population. 
Both the PPST test and the ABA test will perform optimally when the variation in expression intensities in the normal sample population is well characterized.\nA collection of published microarray data sets we have placed 'on-tap' in the caGEDA (Gene Expression Data Analysis) web application [51] were subjected to the PPST test and the ABA test. To avoid idiosyncrasies that can result from the study of extreme values, we ran the tests at a fairly relaxed Type 1 error risk (α = 0.05 in both tails, or α = 0.10 overall). To compare the self-consistency of the parametric t-test, the nonparametric t-test, the PPST test and the ABA test, we re-analyzed two published data sets from independent astrocytoma progression studies [52,53]. Details of these studies are available in the original papers. In brief, Khatua et al. [52] studied global gene expression profiles from 6 early-stage and 7 late-stage astrocytoma patients, while van den Boom et al. [53] studied global gene expression profiles from 8 early-stage and 8 late-stage astrocytoma patients. We calculated the overlap in the gene lists using our online Overlap tool .\n"}