PMC:2644708 / 49711-55770 JSONTXT

Annnotations TAB JSON ListView MergeView

    2_test

    {"project":"2_test","denotations":[{"id":"19055840-16199517-8161630","span":{"begin":550,"end":552},"obj":"16199517"},{"id":"19055840-16646809-8161631","span":{"begin":4208,"end":4210},"obj":"16646809"}],"text":"Methods\n\nSoftware\nAll statistical analysis, method implementation, and graphing for figures, was performed using the open-source programming language R, Versions 2.0.1 – 2.4.0 [26], and Bioconductor packages . Affymetrix® CEL files were processed using the MAS 5.0 and RMA algorithms implemented in the BioConductor package: affy Version 1.5.8 [27] (based on the R programming language) and by dChip® 2006 (Build date: Feb. 16, 2006). Gene Set Enrichment Analysis (GSEA) was performed using the author's R language implementation version GSEA.1.0.R [25].\n\nGRSN Implementation\nAt the heart of the GRSN method is the selection of a global rank-invariant set. Microarray data is passed to GRSN as a matrix of expression values with rows representing transcripts (probe sets) and columns representing samples. The Affymetrix® control transcripts are excluded. Non log scaled data is expected. If necessary, a fixed value is added to all matrix entries to ensure that no entry is less than 0.25. This is to ensure that when the data is log base 2 (log2) scaled, no values will be less than negative 2.0. The data is then log2 scaled. Next, a rank matrix of the same dimension as the input matrix is calculated. For each column of the input matrix, the rank (order in a sorted list) of each transcript in that column is calculated and used to populate the rank matrix. The variance of each row of the rank matrix is calculated and the rows are then ranked based on their variance. The rows with the highest variance are discarded. A subset of the original log2 scaled matrix is produced, containing only the rows corresponding to the remaining rows in the rank matrix. Using this new matrix, another rank matrix is calculated. This process is repeated in an iterative manner four (4) times. (The number of rows to discard per iteration is calculated by subtracting the selected size (default is 5000) for the Global Rank-invariant Set (GRiS) from the original row count and dividing the result by four.) The remaining transcripts are considered the GRiS. Next, reference values for each of the transcripts in the GRiS are obtained by calculating the trimmed mean from all samples for each of the respective transcripts. A calibration curve is then calculated for each sample by comparing the GRiS values from the given sample to the GRiS reference values using an M vs. A plot. The plotted points representing the GRiS should be centered about the horizontal line where M = 0. The calibration curve is obtained by drawing a smoothed line through these points using lowess [13] as implemented in R (the default smoother span used with GSRN is 0.25, but this value may be adjusted if needed). The vertical displacement between the smoothed calibration curve and the horizontal line (M = 0) is used as an intensity dependant adjustment to be applied to all transcripts in the given sample. This adjustment is subtracted from the log2 scaled data (this is equivalent to applying an intensity dependant scaling factor to non log scaled data). It should be noted that the lowess smoothing and the calculation of the intensity dependant adjustment values are done on the data after it has been transformed for the M vs. A plot. After the intensity dependant adjustment is applied, the M vs. A transformation is reversed. Our implementation of GRSN as an \"R\" script with directions is available [see Additional file 3].\n\nGRSN Graphical Visualization\nQuality control graphs are plotted for each sample as it is normalized. For each sample, three M vs. A scatter plots are generated: The first is pre GRSN, and compares the given sample to the virtual reference sample (a pseudo sample with each transcript set to the trimmed mean value for that transcript from all samples in a dataset). The second shows the GRiS transcripts only, both before and after adjustment, and includes the lowess smoothed calibration curve. The third shows the given sample, after adjustment, compared to the virtual reference sample.\n\nData simulation study\nGRSN was evaluated using simulated data produced by adding simulated gene expression and simulated non-linear artifacts (skew) to a cell culture dataset containing 10 replicated controls [14]. The 10 samples were first processed using the RMA method. Next, all possible permutations for dividing these 10 replicate samples into two groups of 5 (set A and set B) were calculated. Out of the 252 possible permutations, we randomly selected 100, thus ensuring that we had 100 randomly selected, yet distinct partitions of the original data. For each of these 100 distinct partitions (hereafter called a simulation), we added simulated Fold Change (FC) by applying a multiplication factor to selected transcripts (genes) in all samples in set B. For each simulation, we randomly selected 2500 genes. The first 1000 (out of the 2500 selected) were multiplied by +1.5 to simulate a FC of 1.5 up. A factor of +2 was applied to the next 500, a factor of +4 to the next 300, a factor of +8 to the next 200, a factor of 2/3 to the next 200, a factor of 1/2 to the next 200, and a factor of 1/4 to the final 100 genes. Thus, up regulation with FC in the range of 1.5 to 8 was simulated for 2000 genes and down regulation with FC in the range of -1.5 to -4 was simulated in 500 genes. The next step in each simulation was to add distinct, randomly generated skew to each sample. This was done by randomly generating curves similar to the calibration curves produced by GRSN when normalizing RMA processed data and using these as \"calibration curves\" to add intensity-dependent skew to the simulated samples. The final step in each simulation was to apply GRSN to correct the introduced skew. The simulated data was analyzed at each of the three stages in the simulation process just described, 1) after simulated FC was introduced, 2) after simulated skew was added, and 3) after GRSN was applied to correct the simulated skew. The results of these analyses were tabulated and/or averaged over all 100 simulations and reported in the results section.\n"}