PMC:2644708 / 15965-23195 JSONTXT


    2_test

    {"project":"2_test","denotations":[{"id":"19055840-16646809-8161605","span":{"begin":2860,"end":2862},"obj":"16646809"}],"text":"Determining global rank-invariant set size\nWhen selecting the GRiS, we aim to minimize the rank order variation among transcripts in the set. We do not attempt to select a set with no rank variation because this would normally result in too few transcripts to define a smooth calibration curve. Therefore, when choosing the size of the global rank-invariant set, we must balance the desire for rank-invariant transcripts with the need for a sufficient number of calibration points. The effect of selecting too few transcripts is demonstrated in Fig. 3A. Here, multiple calibration curves (using different numbers of approximately global rank-invariant transcripts) are graphed for a single sample. The five red curves represent calibration curves generated using GRiS sizes of 100, 200, 300, 400, and 500. At this size range, the curves are erratic and segmented. The five green curves represent sizes of 2000, 4000, 6000, 8000, and 10,000. At this size range, the calibration curves smooth out and become more consistent. We conclude that GRiS sizes in the range of 100 to 500 are insufficient, but that sizes in the range of 2000 to 10,000 appear to be adequate.\nFigure 3 Selection of the global rank-invariant set size. A. The effect of selecting different sized Global Rank-invariant Sets (GRiS) on the calibration curves for a given sample. Each curve shows the GRSN calculated adjustment value as a function of expression value (for a given sample). Red curves are from GRiS sizes of 100, 200, 300, 400, and 500. Green curves are from GRiS sizes of 2000, 4000, 6000, 8000, and 10000. The blue curve is from the default GRiS size of 5000. GB dataset comparing Fanconi vs. Normal using MAS 5.0 processed data on left and RMA processed data on the right (notice different y-axis scales). B. 
The effect of changing the GRiS size on the selection of significantly regulated genes. Each bar represents the difference in the lists of significant genes found due to a change in the size of the GRiS. The red bar represents the default size of 5000 vs. none (not using GRSN) and is meant to show the magnitude of the effect of applying GRSN (a reference point for the other bars). The three blue bars represent, in order from left to right, the difference of 5000 to 10000, 10000 to 15000, and 15000 to 20000. Top Row – GB dataset comparing Fanconi vs. Normal using MAS 5.0 processed data on left and RMA processed data on the right. On left, GSE6802 dataset comparing R vs. C, using RMA. On right, 339RS dataset comparing TA vs. C using RMA. Next, we look at the effect of different GRiS sizes on the detection of statistically significant genes. For each candidate rank-invariant set size, we apply the GRSN method followed by a statistical analysis to identify lists of up and down regulated genes. We used the eBayes and topTable functions from the limma [14] package in BioConductor with a FC cutoff of 1.5 and a False Discovery Rate (FDR) cutoff of 0.05 (5%) to select statistically significant genes. The FDR method [15,16] applies to multiple hypothesis testing. It uses calculated P-values to control the rate of false positives expected from a set of statistical tests. To compare two candidate sizes for the GRiS, we compare these lists of genes. For the two up regulated lists, we count the number of genes that are in one or the other list, but not both. We do the same for the down regulated lists and add the results. This gives us the number of genes affected by the change in the rank-invariant set size. In Fig. 3B, we use a bar graph to report the numbers of affected genes when different rank-invariant set sizes are compared. As a reference point, we compare GRSN with a GRiS size of 5,000 to no GRSN normalization (red bar). 
This serves to quantify the effect of the GRSN method itself. To quantify the \"stability\" at reasonably sized rank-invariant sets, we compare 5 K to 10 K, 10 K to 15 K and 15 K to 20 K (blue bars). The effect of applying GRSN (red bar) is large, while the effect of changing the rank-invariant set size above 5 K (blue bars) is small. In summary, the size of the rank-invariant set does not seem to be critical. Any value in the range of 5 K to 20 K should work equally well on the current high-density arrays. However, given that we want to minimize rank variance in our selected GRiS, we use a default size of 5 K (5000) for high-density arrays with more than 20,000 probe sets.\nThe choice of the smoother span supplied to the lowess function (see Methods section) can also affect the calibration curves. We have evaluated a range of values for this parameter (data not shown) and have chosen 0.25 as the default for GRSN. However, in a few cases, this default value may not be optimal. For example, some datasets (such as the simulation study presented below) produce a GRiS that is not evenly distributed along the full transcript expression range. In these cases, a larger smoother span may be needed to produce a smooth calibration curve. In the case of the simulation study described below, we chose 0.50 for the smoother span (see Fig. 4A). The tradeoff is that increasing the smoother span can lead to calibration curves that do not properly track the GRiS at the extreme ends of the transcript expression range. We recommend starting with the default value of 0.25, but checking the calibration curves plotted by the GRSN method for continuity with the GRiS (see Fig. 2, column 2).\nFigure 4 Simulation study showing the performance of GRSN. Differential gene expression was simulated starting with a dataset containing 10 biological replicates. 
The simulation was repeated 100 times and each time the 10 samples were divided into two equal groups, gene expression was simulated in the second group, and a randomly generated, non-linear artifact (skew) was applied to each sample. In each simulation, GRSN was applied to correct the simulated skew. A. First row – sample 8 of 10 from simulation 100 of 100. Left hand panel shows M vs. A plot before simulated skew. Second panel from left shows the introduction of simulated skew. Third panel from left shows GRiS and calibration curve from GRSN process. Right hand panel shows data after GRSN correction of simulated skew. Second row – same as first row, but for simulation 81 of 100. B. GRSN improves gene selection results. Left panel shows true positive gene selection results for simulated up and down genes as indicated before simulated skew is added, after simulated skew is added, and after GRSN is used to correct the simulated skew. Critical portions of the y-axis scale are expanded at the top and bottom of the graph. Middle panel shows false negatives with the bottom portion of the y-axis scale expanded at the bottom. Right panel shows false positive results. Data is represented using standard Tukey box plots. C. Average Fold Change (FC) variation. The average FC for each simulated FC value is plotted showing the variation over all simulated genes and 100 simulations. The left third of the graph shows values for simulated data before skew is introduced, the middle third shows values with skew added, and the right third shows values after GRSN correction of skew as indicated. Box plots are shown.\n\nG"}
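
The gene-list comparison described in the text for Fig. 3B (count the genes that appear in one up-regulated list or the other but not both, do the same for the down-regulated lists, and add the two counts) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `count_affected_genes` and the gene identifiers are hypothetical, and the lists are assumed to have already been produced by a separate significance analysis.

```python
def count_affected_genes(up_a, down_a, up_b, down_b):
    """Count genes whose significance call differs between two GRiS sizes.

    Each argument is an iterable of gene identifiers. For each direction
    (up, down), a gene counts as affected if it is in exactly one of the
    two lists; the two counts are then summed.
    """
    up_changed = set(up_a) ^ set(up_b)        # symmetric difference, up lists
    down_changed = set(down_a) ^ set(down_b)  # symmetric difference, down lists
    return len(up_changed) + len(down_changed)


# Hypothetical gene lists for two candidate GRiS sizes (illustrative only).
up_5k, down_5k = ["g1", "g2", "g3"], ["g7", "g8"]
up_10k, down_10k = ["g2", "g3", "g4"], ["g8"]

affected = count_affected_genes(up_5k, down_5k, up_10k, down_10k)
print(affected)
```

Using sets makes the count independent of list order and of duplicate entries, which matches the description of counting genes rather than rows.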