PMC:2644708 / 42994-46532
Discussion

With any data manipulation there is a risk of adding noise. This is a classic problem with background subtraction, where subtracting one noisy signal from another can actually double the noise variance. The goal of GRSN is to remove intensity-dependent technical variation from the data, but we must make sure that we do not increase the random noise at the same time. GRSN minimizes this risk in several ways. First, by using a global rank-invariant set, we ensure that any noise associated with the selection of the rank-invariant set is applied equally to all samples. This reduces both the risk of adding random variance among replicate samples and the risk of adding bias between sample groups. Second, the iterative method used to select the rank-invariant set ensures that the set is not biased by unbalanced numbers of up- or down-regulated genes, or by unequal degrees of up- or down-regulation. Third, a single robust virtual reference sample is generated by taking the trimmed mean value (across all samples) of each gene in the global rank-invariant set. This ensures that the linearity of the normalized data is not dictated by a few abnormal samples but is a robust reflection of the dataset as a whole. Fourth, the intensity-dependent calibration applied to each sample is calculated using a robust lowess smoothing algorithm fitted through many points. As a result, every sample is compared to the same "representative" virtual reference, and the calibration itself is calculated by rigorously averaging many reference and comparison data points.

An important distinction between GRSN and other normalization methods is simply the stage at which it is applied. We apply it to probe set level data, after the data have been processed and summarized by other methods. The advantage is that the summarized expression value for a given transcript should reflect its actual value more accurately than the individual probe values do. Other authors have also advocated applying additional normalization at the probe set level [10]. Although we have not investigated applying GRSN to probe level data, we have tried applying RMA-style quantile normalization to probe set level data. Interestingly, this yields results closer to those of GRSN than the standard approach of applying quantile normalization at the probe level does. We have also investigated substituting lowess normalization for the quantile normalization step of the RMA method (at the probe level) and found that it increased the non-linear skew. When GRSN is applied, this increased skew is significantly reduced (data not shown). This suggests that the quantile normalization step is not the cause of the skew, and that simply substituting lowess normalization is not the solution to it.

It is important to note that the degree of skew typically seen in datasets processed with the RMA method is relatively small in terms of absolute fold change. A typical fold change cutoff of 1.5 or 2.0 will therefore mask most of this effect and avoid most false positives. For example, applying a fold change cutoff of 2.0 to the data shown in Fig. 7C masks all but one of the reported false positive results, although this is not true for the data in Fig. 7A. However, even when the false positives are masked, they still affect the False Discovery Rate (FDR) calculation for all genes, leading to overly optimistic FDR values. This is probably a bigger problem than the errors in the fold change values of the selected genes.
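The background-subtraction remark at the start of the Discussion is a statement about variances. Assume, for illustration, that a measured signal S and its background estimate B each carry independent zero-mean noise with the same variance σ². Subtraction then removes the systematic background term but adds the two noise variances:

```latex
\begin{aligned}
S &= \mu_S + \varepsilon_S, \qquad B = \mu_B + \varepsilon_B, \qquad
  \varepsilon_S \perp \varepsilon_B, \quad
  \operatorname{Var}(\varepsilon_S) = \operatorname{Var}(\varepsilon_B) = \sigma^2 \\
S - B &= (\mu_S - \mu_B) + (\varepsilon_S - \varepsilon_B) \\
\operatorname{Var}(S - B) &= \operatorname{Var}(\varepsilon_S) + \operatorname{Var}(\varepsilon_B) = 2\sigma^2,
  \qquad \operatorname{SD}(S - B) = \sqrt{2}\,\sigma
\end{aligned}
```

The noise variance therefore doubles, while its standard deviation grows by a factor of √2.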
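The second point, iterative selection of the rank-invariant set, can be illustrated with a minimal sketch. The Discussion does not spell out the procedure, so what follows is an assumed scheme and not the GRSN implementation. It works in three steps:

1. Rank the genes within each sample.
2. Drop the genes whose ranks vary most across samples.
3. Re-rank the survivors before the next pass.

Because of step 3, strongly regulated genes removed in one pass no longer shift the ranks of the rest. The names global_rank_invariant_set, within_sample_ranks, target_size and drop_fraction, and the 25% drop rate, are hypothetical.

```python
# Hypothetical sketch of iterative rank-invariant set selection (not the GRSN code).
from statistics import pvariance


def within_sample_ranks(values):
    """Return the 0-based rank of each value within one sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def global_rank_invariant_set(data, target_size, drop_fraction=0.25):
    """data maps gene -> list of expression values, one per sample.

    Each pass re-ranks only the surviving genes, then discards the genes
    whose ranks vary most across samples, until target_size genes remain.
    """
    genes = sorted(data)
    n_samples = len(data[genes[0]])
    while len(genes) > target_size:
        # Ranks are recomputed among survivors only, so removed genes
        # can no longer push the remaining genes up or down.
        sample_ranks = [within_sample_ranks([data[g][s] for g in genes])
                        for s in range(n_samples)]
        rank_var = {g: pvariance([sample_ranks[s][k] for s in range(n_samples)])
                    for k, g in enumerate(genes)}
        n_drop = max(1, int(len(genes) * drop_fraction))
        n_drop = min(n_drop, len(genes) - target_size)  # never overshoot the target
        # Most rank-variable genes first; ties broken by name for determinism.
        dropped = set(sorted(genes, key=lambda g: (-rank_var[g], g))[:n_drop])
        genes = [g for g in genes if g not in dropped]
    return genes


# Toy data (made-up values): three samples; "up" is strongly up in the second one.
expr = {
    "g1": [1.0, 1.1, 0.9],
    "g2": [2.0, 2.1, 1.9],
    "g3": [3.0, 3.1, 2.9],
    "g4": [4.0, 4.2, 3.9],
    "up": [1.5, 9.0, 1.4],
}
print(global_rank_invariant_set(expr, target_size=4))  # ['g1', 'g2', 'g3', 'g4']
```

In the toy data only "up" changes rank markedly: it jumps from second-lowest to highest in the second sample. It is therefore the gene discarded.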
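The third point, a virtual reference built from per-gene trimmed means, is short enough to show directly. This sketch assumes expression values stored in a dict keyed by gene. The 20% trim from each end is chosen only for illustration, since the Discussion does not state the trim fraction. trimmed_mean and virtual_reference are hypothetical helper names.

```python
# Hypothetical helpers for a trimmed-mean virtual reference (trim fraction assumed).
def trimmed_mean(values, trim=0.2):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    xs = sorted(values)
    k = int(len(xs) * trim)  # number of values cut from each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def virtual_reference(data, invariant_genes, trim=0.2):
    """One robust reference value per rank-invariant gene, taken across samples."""
    return {g: trimmed_mean(data[g], trim) for g in invariant_genes}


expr = {"g1": [8.0, 8.2, 7.9, 8.1, 14.0]}  # toy values; the fifth sample is abnormal
ref = virtual_reference(expr, ["g1"])
print(round(ref["g1"], 3))  # 8.1
```

With one abnormal sample (14.0) among five, the trimmed mean stays at 8.1. The plain mean would be pulled up to 9.24.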
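The fourth point relies on robust lowess smoothing through many rank-invariant genes. The sketch below is a minimal illustration under stated assumptions, not the GRSN code. Values are assumed to be on a log scale and stored in dicts keyed by gene. For each sample, the difference between the sample and the virtual reference is fitted against sample intensity with Cleveland-style lowess, using tricube distance weights and bisquare robustness re-weighting. The interpolated trend is then subtracted from every gene, including genes outside the invariant set. The function names and the frac and robust_iters defaults are hypothetical.

```python
# Hypothetical sketch of a robust-lowess calibration step (not the GRSN code).
from bisect import bisect_left
from statistics import median


def _tricube(u):
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def _bisquare(u):
    return (1 - u ** 2) ** 2 if u < 1 else 0.0


def lowess(xs, ys, frac=0.5, robust_iters=2):
    """Local weighted linear fits with tricube distance weights,
    re-weighted by bisquare weights on the residuals."""
    n = len(xs)
    r = min(n, max(2, round(frac * n)))  # points in each local neighbourhood
    robustness = [1.0] * n
    fitted = list(ys)
    for _ in range(robust_iters + 1):
        for i, x0 in enumerate(xs):
            h = sorted(abs(x - x0) for x in xs)[r - 1] or 1e-12  # bandwidth
            w = [_tricube(abs(x - x0) / h) * robustness[j] for j, x in enumerate(xs)]
            sw = sum(w)
            if sw == 0:
                continue  # every neighbour down-weighted; keep the previous fit
            mx = sum(wj * x for wj, x in zip(w, xs)) / sw
            my = sum(wj * y for wj, y in zip(w, ys)) / sw
            sxx = sum(wj * (x - mx) ** 2 for wj, x in zip(w, xs))
            sxy = sum(wj * (x - mx) * (y - my) for wj, x, y in zip(w, xs, ys))
            slope = sxy / sxx if sxx > 0 else 0.0
            fitted[i] = my + slope * (x0 - mx)
        residuals = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(e) for e in residuals)
        if s < 1e-12:
            break  # essentially exact fit; robustness weights would be meaningless
        robustness = [_bisquare(abs(e) / (6 * s)) for e in residuals]
    return fitted


def calibrate(sample, reference, invariant_genes, frac=0.5):
    """Fit (sample - reference) against sample intensity over the invariant
    genes, then subtract the fitted trend from every gene in the sample."""
    xs = [sample[g] for g in invariant_genes]
    diffs = [sample[g] - reference[g] for g in invariant_genes]
    curve = sorted(zip(xs, lowess(xs, diffs, frac=frac)))
    cx = [x for x, _ in curve]
    cy = [y for _, y in curve]

    def trend(x):
        # Piecewise-linear interpolation of the fit; held constant past the ends.
        if x <= cx[0]:
            return cy[0]
        if x >= cx[-1]:
            return cy[-1]
        k = bisect_left(cx, x)
        return cy[k - 1] + (cy[k] - cy[k - 1]) * (x - cx[k - 1]) / (cx[k] - cx[k - 1])

    return {g: v - trend(v) for g, v in sample.items()}


reference = {f"g{i}": float(i) for i in range(1, 9)}
sample = {g: 1.1 * v + 0.2 for g, v in reference.items()}  # toy linear distortion
sample["other"] = 1.1 * 4.5 + 0.2                          # gene outside the invariant set
calibrated = calibrate(sample, reference, list(reference))
print(round(calibrated["other"], 6))  # 4.5
```

In the example, the sample is a linear distortion of the reference (1.1 × reference + 0.2, made-up values). The local linear fits are therefore exact, and calibration recovers the reference values, including for the gene outside the invariant set. A real implementation would more likely call an existing lowess routine than this deliberately simple, slow loop.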
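For comparison, the RMA-type quantile normalization that the authors tried on probe set level data forces every sample onto one common distribution: the mean of the sorted samples. This minimal sketch assumes each inner list holds one sample's probe set values in a shared gene order. Ties are broken by position rather than averaged, a simplification that a full implementation would handle. quantile_normalize is a hypothetical name.

```python
# Hypothetical sketch of RMA-style quantile normalization applied to probe set values.
def quantile_normalize(samples):
    """Map every sample onto the same distribution: the mean, across samples,
    of the k-th smallest value, assigned back by within-sample rank.
    Ties are broken by position rather than averaged (a simplification)."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[k] for s in sorted_samples) / len(samples)
                  for k in range(n_genes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy probe set summaries for two samples (three probe sets each, made-up values).
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Unlike a calibration against a rank-invariant reference, this makes every sample end up with exactly the same set of values, here 1.5, 3.5 and 5.5.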
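The final point, that masked false positives still distort the FDR, can be seen with the Benjamini-Hochberg procedure. The Discussion does not say which FDR method was used, so Benjamini-Hochberg is an assumption here. The p-values and fold changes are toy numbers, not data from Fig. 7. Two skew-induced genes with small fold changes but small p-values are removed by a 2.0-fold cutoff, yet their presence still lowers the q-value reported for the genuine gene. bh_qvalues and select_genes are hypothetical names.

```python
# Hypothetical illustration: masked false positives still shift BH q-values.
import math


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest p-value first, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def select_genes(genes, log2_fc, qvalues, fc_cutoff=2.0, q_cutoff=0.05):
    """Keep genes passing both a fold change cutoff and a q-value cutoff."""
    min_lfc = math.log2(fc_cutoff)
    return [g for g, lfc, q in zip(genes, log2_fc, qvalues)
            if abs(lfc) >= min_lfc and q <= q_cutoff]


genes = ["real", "skew1", "skew2"] + [f"null{i}" for i in range(7)]
log2_fc = [1.5, 0.4, 0.4] + [0.1] * 7  # skewed genes change only about 1.3-fold
p_clean = [0.004, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
p_skewed = [0.004, 0.001, 0.002, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

q_clean = bh_qvalues(p_clean)
q_skewed = bh_qvalues(p_skewed)
print(select_genes(genes, log2_fc, q_clean))   # ['real']
print(select_genes(genes, log2_fc, q_skewed))  # ['real']
print(f"{q_clean[0]:.4f} {q_skewed[0]:.4f}")  # 0.0400 0.0133
```

The selected gene list is identical in both cases. However, the skewed genes push the genuine gene from rank 1 to rank 3 among the sorted p-values, which cuts its q-value from 0.04 to about 0.013. That overly optimistic estimate comes entirely from genes the fold change filter hid.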