PMC:2644708 / 11116-15966


    2_test

    {"project":"2_test","denotations":[{"id":"19055840-11532216-8161604","span":{"begin":124,"end":125},"obj":"11532216"}],"text":"Global Rank-invariant Set Normalization\nGRSN is based on the general idea of rank-invariant genes presented by Li and Wong [4]. We extend this idea to select a single, globally rank-invariant set of endogenous genes to be used to normalize all samples in a dataset. These are genes believed to be consistently expressed in all samples within a given dataset and should appear in roughly the same rank order in each sample when sorted by expression level. Importantly, this ordering, or rank, should not be affected by the types of non-linear artifacts that this normalization method is designed to correct.\nAn overview of the GRSN method is shown in Figure 1B. Briefly, all transcripts (representing endogenous genes) are ranked in each sample of a dataset based on expression (as calculated by summarizing probe sets using established methods such as RMA or MAS 5.0). The variance of the rank order for each transcript is then calculated across all of the samples. Transcripts with the highest rank variance are discarded. The remaining transcripts are again ranked and the process is repeated in an iterative fashion. This iteration cycle is important because, for datasets with unbalanced numbers of up and down regulated transcripts, there can be a global shift of transcript rank order caused by the most differentially regulated transcripts. This global shift of the rank order will disappear as the most differentially regulated transcripts (with the highest rank variance) are discarded during the first few iteration cycles. Note that if we require the global rank-invariant set to have rank variance of zero for all transcripts, we will not typically have enough transcripts for an effective calibration curve, i.e. the set of transcripts with exactly the same rank order in all samples is too small. 
Therefore, the iteration cycle is terminated when a reasonable number of approximately rank-invariant transcripts remain (5000 by default). These probe sets are considered the \"Global Rank-invariant Set\" (GRiS).\nA single virtual reference sample is then created by taking the trimmed mean (mean after removing 25% of the values from the top and bottom of the range) expression value (over the entire dataset) for each summarized probe set (transcript), and M vs. A plots are generated comparing each sample to this virtual reference. This provides a visualization of the effect of applying GRSN. Fig. 2, column 1 shows the M vs. A plots comparing sample (N3) to the virtual reference of the GB dataset after the data is summarized using MAS 5.0 (first row), RMA (second row), or dChip (last row). We then generate a M vs. A plot of the identified GRiS transcripts only, comparing expression values from a given sample to expression values from the virtual reference sample (Fig. 2, column 2, blue points). We use lowess [13] to fit a smooth curve through these points (green line). This smoothed curve is used as the calibration curve for this sample. We then calculate an intensity-dependent adjustment for transcripts in each sample which, when applied, will center the sample's GRiS on the horizontal line of the M vs. A plot (at M = 0). Fig. 2, column 2, also shows the GRiS after calibration of this sample (red dots). Fig. 2, column 3, then shows all transcripts after calibration of the sample compared to the virtual reference sample (red dots). This process is repeated for each sample, with a different calibration curve generated each time (using the same GRiS). Using the trimmed mean values of the GRiS as the reference for normalization provides a robust average across all samples so that the linearity of the normalized data is not affected by a few samples with anomalous non-linear artifacts. 
Note that these intensity-dependent adjustments are applied additively to the log scaled data and that this is equivalent to an intensity-dependent scaling of the original, non-log scaled data.\nFigure 2 GRSN corrects non-linear distortions apparent in different summary methods. M vs. A plots demonstrating the GRSN method applied to the MAS 5.0, RMA, and dChip® probe set summary methods. Column 1 shows M vs. A plots comparing one selected sample to the virtual reference sample created by taking the trimmed mean expression value of each probe set in that dataset. Column 2 shows the global rank-invariant set (GRiS) of 5,000 probe sets before GRSN normalization in blue and after normalization in red (note change in y-axis scale). The smoothed curve through the rank-invariant set is shown in green. This is the calibration curve used to normalize the selected sample. Column 3 shows all probe sets after GRSN normalization of the selected sample compared to the virtual reference sample. The sample shown is N3 from the GB dataset. The probe set summary methods used are (from top to bottom): MAS 5.0, RMA, and dChip®.\n\nD"}
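
    Illustrative sketch (not part of the annotation record)

The selection of the Global Rank-invariant Set and the construction of the virtual reference sample, as described in the annotated text above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the `drop_fraction` parameter (the share of transcripts discarded per iteration), the positional tie-breaking of ranks, and the use of population variance are all hypothetical choices made here. The text's "removing 25% of the values from the top and bottom" is read as trimming 25% from each end, which is also an assumption. The lowess calibration step is not shown.

```python
from statistics import pvariance


def ranks(values):
    """Return the 0-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def select_gris(expr, target_size, drop_fraction=0.25):
    """Iteratively discard the transcripts with the highest rank variance.

    expr maps a transcript id to its expression values, one per sample.
    drop_fraction is an assumed parameter; the source text only says that
    the highest-variance transcripts are discarded in each cycle.
    """
    keep = list(expr)
    n_samples = len(next(iter(expr.values())))
    while len(keep) > target_size:
        # Re-rank the surviving transcripts within each sample.
        per_sample = [ranks([expr[t][s] for t in keep])
                      for s in range(n_samples)]
        # Variance of each transcript's rank across all samples.
        rank_var = {t: pvariance([per_sample[s][i] for s in range(n_samples)])
                    for i, t in enumerate(keep)}
        # Drop the highest-variance transcripts, never below target_size.
        n_drop = max(1, int(len(keep) * drop_fraction))
        n_drop = min(n_drop, len(keep) - target_size)
        keep.sort(key=lambda t: rank_var[t])
        keep = keep[:len(keep) - n_drop]
    return set(keep)


def trimmed_mean(values, trim=0.25):
    """Mean after removing a fraction `trim` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k] or ordered
    return sum(core) / len(core)


def virtual_reference(expr, trim=0.25):
    """Trimmed mean expression of every transcript across all samples."""
    return {t: trimmed_mean(v, trim) for t, v in expr.items()}


# Toy data: "x" and "y" swap between the bottom and top of the ranking,
# while "a" to "d" keep their relative order in every sample.
toy = {
    "a": [1.0, 1.1, 0.9, 1.0],
    "b": [2.0, 2.1, 1.9, 2.0],
    "c": [3.0, 3.1, 2.9, 3.0],
    "d": [4.0, 4.1, 3.9, 4.0],
    "x": [0.5, 4.5, 0.4, 4.6],
    "y": [4.5, 0.5, 4.4, 0.2],
}
gris = select_gris(toy, target_size=4)  # gris == {"a", "b", "c", "d"}
reference = virtual_reference(toy)
```

In the toy data, removing "y" in the first cycle shifts the ranks of "a" to "d" by one position in half of the samples, so their rank variance rises from 0 to 0.25 in the second cycle. This is a small-scale version of the global rank shift that the text gives as the reason for re-ranking after every discard. The real default target of 5000 transcripts would simply be passed as `target_size`.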

    NEUROSES

    {"project":"NEUROSES","denotations":[{"id":"T148","span":{"begin":3944,"end":3950},"obj":"PATO_0001199"}],"text":"Global Rank-invariant Set Normalization\nGRSN is based on the general idea of rank-invariant genes presented by Li and Wong [4]. We extend this idea to select a single, globally rank-invariant set of endogenous genes to be used to normalize all samples in a dataset. These are genes believed to be consistently expressed in all samples within a given dataset and should appear in roughly the same rank order in each sample when sorted by expression level. Importantly, this ordering, or rank, should not be affected by the types of non-linear artifacts that this normalization method is designed to correct.\nAn overview of the GRSN method is shown in Figure 1B. Briefly, all transcripts (representing endogenous genes) are ranked in each sample of a dataset based on expression (as calculated by summarizing probe sets using established methods such as RMA or MAS 5.0). The variance of the rank order for each transcript is then calculated across all of the samples. Transcripts with the highest rank variance are discarded. The remaining transcripts are again ranked and the process is repeated in an iterative fashion. This iteration cycle is important because, for datasets with unbalanced numbers of up and down regulated transcripts, there can be a global shift of transcript rank order caused by the most differentially regulated transcripts. This global shift of the rank order will disappear as the most differentially regulated transcripts (with the highest rank variance) are discarded during the first few iteration cycles. Note that if we require the global rank-invariant set to have rank variance of zero for all transcripts, we will not typically have enough transcripts for an effective calibration curve, i.e. the set of transcripts with exactly the same rank order in all samples is too small. 
Therefore, the iteration cycle is terminated when a reasonable number of approximately rank-invariant transcripts remain (5000 by default). These probe sets are considered the \"Global Rank-invariant Set\" (GRiS).\nA single virtual reference sample is then created by taking the trimmed mean (mean after removing 25% of the values from the top and bottom of the range) expression value (over the entire dataset) for each summarized probe set (transcript), and M vs. A plots are generated comparing each sample to this virtual reference. This provides a visualization of the effect of applying GRSN. Fig. 2, column 1 shows the M vs. A plots comparing sample (N3) to the virtual reference of the GB dataset after the data is summarized using MAS 5.0 (first row), RMA (second row), or dChip (last row). We then generate a M vs. A plot of the identified GRiS transcripts only, comparing expression values from a given sample to expression values from the virtual reference sample (Fig. 2, column 2, blue points). We use lowess [13] to fit a smooth curve through these points (green line). This smoothed curve is used as the calibration curve for this sample. We then calculate an intensity-dependent adjustment for transcripts in each sample which, when applied, will center the sample's GRiS on the horizontal line of the M vs. A plot (at M = 0). Fig. 2, column 2, also shows the GRiS after calibration of this sample (red dots). Fig. 2, column 3, then shows all transcripts after calibration of the sample compared to the virtual reference sample (red dots). This process is repeated for each sample, with a different calibration curve generated each time (using the same GRiS). Using the trimmed mean values of the GRiS as the reference for normalization provides a robust average across all samples so that the linearity of the normalized data is not affected by a few samples with anomalous non-linear artifacts. 
Note that these intensity-dependent adjustments are applied additively to the log scaled data and that this is equivalent to an intensity-dependent scaling of the original, non-log scaled data.\nFigure 2 GRSN corrects non-linear distortions apparent in different summary methods. M vs. A plots demonstrating the GRSN method applied to the MAS 5.0, RMA, and dChip® probe set summary methods. Column 1 shows M vs. A plots comparing one selected sample to the virtual reference sample created by taking the trimmed mean expression value of each probe set in that dataset. Column 2 shows the global rank-invariant set (GRiS) of 5,000 probe sets before GRSN normalization in blue and after normalization in red (note change in y-axis scale). The smoothed curve through the rank-invariant set is shown in green. This is the calibration curve used to normalize the selected sample. Column 3 shows all probe sets after GRSN normalization of the selected sample compared to the virtual reference sample. The sample shown is N3 from the GB dataset. The probe set summary methods used are (from top to bottom): MAS 5.0, RMA, and dChip®.\n\nD"}