PMC:2644708 / 23194-29960

    2_test

    {"project":"2_test","denotations":[{"id":"19055840-16273092-8161606","span":{"begin":1281,"end":1283},"obj":"16273092"}],"text":"GRSN improves statistical performance in simulated data\nWhen evaluating the performance of GRSN on a given microarray dataset, we are confronted with the typical problem of not knowing a priori which transcripts are truly regulated and by how much. Therefore, we have created simulated datasets where we have artificially introduced differential gene expression so that we do know a priori which genes are regulated and to what degree. We then introduce simulated, systematic, non-linear artifacts (skew) typical of what are seen in real world datasets. This data allows us to evaluate the ability of standard statistical methods to identify the correct up and down regulated genes before the simulated artifacts are introduced, after they are introduced, and after applying GRSN to correct the simulated artifacts. Thus, the performance of GRSN can be evaluated with respect to reducing unwanted variance and improving statistical gene selection performance.\nTo create a relatively realistic simulated dataset, we used a dataset from a cell culture model with 10 biological control replicates (run on Affymetrix® HG-U133_Plus_2 GeneChips® and processed using the RMA method) to obtain typical background variance (the non-linear artifacts for this dataset were relatively small) [17]. In the first stage of the simulation we randomly partitioned the samples in this dataset into two equal subsets, A and B. We then randomly selected unique subsets of genes and introduced simulated Fold Changes (FC) in the B samples. 1000 genes were set with a FC of 1.5 up, 500 2 fold up, 300 4 fold up, 200 8 fold up; and then 200 were set down 1.5 fold, 200 down 2 fold, and 100 down 4 fold. 
This gives a total of 2000 up regulated genes with FC in the range of 1.5 to 8 compared to only 500 down regulated genes with FC in the range of -1.5 to -4 so that both the number and degree of up and down regulation is heavily biased in the up direction. In the second stage of the simulation we added random non-linear skew to each sample. The third and final stage of the simulation was to apply GRSN to correct the skew just added. We have repeated this complete simulation, starting with the random partitioning of the original 10 control samples, 100 times (randomly selecting 100 unique permutations from the 252 possible permutations). Figure 4A shows M vs. A plots demonstrating typical skews introduced in a selected sample in two of the 100 different simulations. This figure shows a selected sample compared to the virtual reference sample both before and after the introduction of a simulated skew, and then shows the effect of applying GRSN to correct the simulated skew (compare these plots to Fig. 2).\nAt each stage of each simulation (after simulated FC is introduced, after simulated skew is added, and after GRSN is applied to correct the simulated skew), the Standard Deviation (SD) within replicates and the average FC between A and B sample subsets is calculated for each gene. A goal of GRSN is to reduce the SD among replicates. As shown in Table 1, the average SD among replicates is highest in the data with simulated skew and is substantially reduced when GRSN correction is applied. The SD after GRSN correction is almost identical to the SD for the original data before simulated skew is introduced (see Table 1). In addition to removing unwanted technical variation, it is important to preserve biologically relevant variation. In this simulation, the biologically relevant variation is the simulated FC introduced in sample set B. Here we calculate the average for all simulated FC ranges up or down across all simulations. 
In our study, the average FC value stays relatively constant (within 2–3%) at each stage of the simulation (see Table 1), demonstrating that GRSN does not adversely affect the relevant variation (also see Fig. 4C).\nTable 1 GRSN reduces standard deviation while preserving introduced fold change in a simulated data study. Average Standard Deviation (SD) among replicates and average Fold Change (FC) are reported for each stage of our simulated data study (see text for description). SD values for each gene are calculated separately for sample set A and sample set B and then averaged across both sample sets and all 100 simulations. FC values are calculated for each gene by taking the average for sample set B and dividing by the average for sample set A then averaging over all 100 simulations. The values reported in each column are (from left to right) 1) the average SD for all genes, 2) the average SD for the up and down regulated genes, 3) the average FC for up regulated genes, and 4) the average FC for down regulated genes. Values reported in the top row are for data with simulated FC only. Values in the middle row are for data with simulated skew added. The bottom row reports values after GRSN correction of the simulated skew. Next we evaluated the effects of the introduced skews and GRSN correction on statistical gene selection performance in our simulated datasets. Statistically significant genes were selected with eBayes using a FC cutoff of 1.2 and a FDR cutoff of 0.05. We evaluated the numbers of True Positive (TP) (genes with actual simulated FC), False Positive (FP), and False Negative (FN) genes found at each stage of the data simulation for each of the 100 simulations run. Figure 4B shows the results using box plots showing the range of gene selection results across all 100 simulations. 
The statistical results from the data with simulated artifacts, but no GRSN correction, vary widely from simulation to simulation, resulting in a substantial reduction in identified true positives (middle data set in left plot), and an abundance of false negatives and false positives (middle data sets in middle and right-hand plots). False negatives are more common than false positives due to the random nature of the introduced skew. However, GRSN corrects these issues and the results both before the simulated artifacts and after the simulated artifacts have been corrected with GRSN are very stable (Fig. 4B, compare left and right-hand data sets in each box plot).\nWe also evaluated the ability of GRSN to preserve the Fold Change (FC) values introduced in the above simulation. We tabulated the average FC for each range over all 100 simulations. This tabulation was done for each stage of the simulation: after simulated FC, after simulated skew, and after GRSN correction. Box plots were used to summarize the results for each FC range and each stage. As seen in Fig. 4C, the variation in FC for each simulated FC range is increased substantially by the simulated skew, but the application of GRSN restores both the mean FC and the variation in FC to values very close to the pre-skew values."}
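
The first simulation stage described in the text above (random split of the 10 control samples into subsets A and B, then fold changes introduced into unique gene subsets of the B samples) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the expression values are on the log2 scale (as RMA output is), so that a fold change is applied by adding or subtracting log2(FC); the function names and the data layout (one list of values per sample) are hypothetical. The count of 252 possible partitions quoted in the text is the number of ways to choose 5 of the 10 samples.

```python
import math
import random

# Fold-change design quoted in the text: (fold change, number of genes).
UP_DESIGN = [(1.5, 1000), (2.0, 500), (4.0, 300), (8.0, 200)]
DOWN_DESIGN = [(1.5, 200), (2.0, 200), (4.0, 100)]


def count_partitions(n_samples=10, subset_size=5):
    """Number of ways to choose which samples form subset B."""
    return math.comb(n_samples, subset_size)


def assign_fold_changes(n_genes, rng):
    """Map gene index -> log2 fold change for randomly chosen, unique genes.

    Requires n_genes to be at least the number of regulated genes (2500).
    """
    n_regulated = sum(n for _, n in UP_DESIGN + DOWN_DESIGN)
    chosen = rng.sample(range(n_genes), n_regulated)  # indices without repeats
    log2_fc = {}
    pos = 0
    for sign, design in ((1, UP_DESIGN), (-1, DOWN_DESIGN)):
        for fc, n in design:
            for gene in chosen[pos:pos + n]:
                log2_fc[gene] = sign * math.log2(fc)
            pos += n
    return log2_fc


def simulate_stage_one(data, rng):
    """Split samples into A and B and add simulated fold changes to B.

    data: list of per-sample lists of log2 expression values (assumed layout).
    Returns (a_idx, b_idx, new_data, log2_fc); the input is not modified.
    """
    n_samples = len(data)
    order = rng.sample(range(n_samples), n_samples)  # random sample order
    half = n_samples // 2
    a_idx, b_idx = sorted(order[:half]), sorted(order[half:])
    log2_fc = assign_fold_changes(len(data[0]), rng)
    new_data = [list(sample) for sample in data]
    for s in b_idx:
        for gene, delta in log2_fc.items():
            new_data[s][gene] += delta
    return a_idx, b_idx, new_data, log2_fc
```

Summing the design gives the 2000 up-regulated and 500 down-regulated genes stated in the text. The skew and GRSN stages are not sketched here because the excerpt does not specify how they are computed.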

    NEUROSES

    {"project":"NEUROSES","denotations":[{"id":"T103","span":{"begin":481,"end":487},"obj":"PATO_0001199"},{"id":"T104","span":{"begin":1223,"end":1229},"obj":"PATO_0001199"},{"id":"T105","span":{"begin":1273,"end":1278},"obj":"PATO_0000587"},{"id":"T106","span":{"begin":1434,"end":1440},"obj":"PATO_0000430"},{"id":"T107","span":{"begin":1853,"end":1859},"obj":"PATO_0001056"},{"id":"T108","span":{"begin":1853,"end":1859},"obj":"PATO_0001555"},{"id":"T109","span":{"begin":1853,"end":1859},"obj":"PATO_0000070"},{"id":"T110","span":{"begin":1886,"end":1896},"obj":"PATO_0000076"},{"id":"T111","span":{"begin":1925,"end":1934},"obj":"PATO_0000039"},{"id":"T112","span":{"begin":1994,"end":2000},"obj":"PATO_0001199"},{"id":"T113","span":{"begin":2267,"end":2273},"obj":"PATO_0000430"},{"id":"T114","span":{"begin":2868,"end":2877},"obj":"PATO_0002175"},{"id":"T115","span":{"begin":2909,"end":2916},"obj":"PATO_0000461"},{"id":"T116","span":{"begin":3058,"end":3065},"obj":"PATO_0000461"},{"id":"T117","span":{"begin":3564,"end":3571},"obj":"PATO_0000461"},{"id":"T118","span":{"begin":3653,"end":3660},"obj":"PATO_0000461"},{"id":"T119","span":{"begin":3150,"end":3157},"obj":"PATO_0001997"},{"id":"T120","span":{"begin":3664,"end":3669},"obj":"PATO_0000002"},{"id":"T121","span":{"begin":3687,"end":3695},"obj":"PATO_0000438"}],"text":"GRSN improves statistical performance in simulated data\nWhen evaluating the performance of GRSN on a given microarray dataset, we are confronted with the typical problem of not knowing a priori which transcripts are truly regulated and by how much. Therefore, we have created simulated datasets where we have artificially introduced differential gene expression so that we do know a priori which genes are regulated and to what degree. We then introduce simulated, systematic, non-linear artifacts (skew) typical of what are seen in real world datasets. 
This data allows us to evaluate the ability of standard statistical methods to identify the correct up and down regulated genes before the simulated artifacts are introduced, after they are introduced, and after applying GRSN to correct the simulated artifacts. Thus, the performance of GRSN can be evaluated with respect to reducing unwanted variance and improving statistical gene selection performance.\nTo create a relatively realistic simulated dataset, we used a dataset from a cell culture model with 10 biological control replicates (run on Affymetrix® HG-U133_Plus_2 GeneChips® and processed using the RMA method) to obtain typical background variance (the non-linear artifacts for this dataset were relatively small) [17]. In the first stage of the simulation we randomly partitioned the samples in this dataset into two equal subsets, A and B. We then randomly selected unique subsets of genes and introduced simulated Fold Changes (FC) in the B samples. 1000 genes were set with a FC of 1.5 up, 500 2 fold up, 300 4 fold up, 200 8 fold up; and then 200 were set down 1.5 fold, 200 down 2 fold, and 100 down 4 fold. This gives a total of 2000 up regulated genes with FC in the range of 1.5 to 8 compared to only 500 down regulated genes with FC in the range of -1.5 to -4 so that both the number and degree of up and down regulation is heavily biased in the up direction. In the second stage of the simulation we added random non-linear skew to each sample. The third and final stage of the simulation was to apply GRSN to correct the skew just added. We have repeated this complete simulation, starting with the random partitioning of the original 10 control samples, 100 times (randomly selecting 100 unique permutations from the 252 possible permutations). Figure 4A shows M vs. A plots demonstrating typical skews introduced in a selected sample in two of the 100 different simulations. 
This figure shows a selected sample compared to the virtual reference sample both before and after the introduction of a simulated skew, and then shows the effect of applying GRSN to correct the simulated skew (compare these plots to Fig. 2).\nAt each stage of each simulation (after simulated FC is introduced, after simulated skew is added, and after GRSN is applied to correct the simulated skew), the Standard Deviation (SD) within replicates and the average FC between A and B sample subsets is calculated for each gene. A goal of GRSN is to reduce the SD among replicates. As shown in Table 1, the average SD among replicates is highest in the data with simulated skew and is substantially reduced when GRSN correction is applied. The SD after GRSN correction is almost identical to the SD for the original data before simulated skew is introduced (see Table 1). In addition to removing unwanted technical variation, it is important to preserve biologically relevant variation. In this simulation, the biologically relevant variation is the simulated FC introduced in sample set B. Here we calculate the average for all simulated FC ranges up or down across all simulations. In our study, the average FC value stays relatively constant (within 2–3%) at each stage of the simulation (see Table 1), demonstrating that GRSN does not adversely affect the relevant variation (also see Fig. 4C).\nTable 1 GRSN reduces standard deviation while preserving introduced fold change in a simulated data study. Average Standard Deviation (SD) among replicates and average Fold Change (FC) are reported for each stage of our simulated data study (see text for description). SD values for each gene are calculated separately for sample set A and sample set B and then averaged across both sample sets and all 100 simulations. FC values are calculated for each gene by taking the average for sample set B and dividing by the average for sample set A then averaging over all 100 simulations. 
The values reported in each column are (from left to right) 1) the average SD for all genes, 2) the average SD for the up and down regulated genes, 3) the average FC for up regulated genes, and 4) the average FC for down regulated genes. Values reported in the top row are for data with simulated FC only. Values in the middle row are for data with simulated skew added. The bottom row reports values after GRSN correction of the simulated skew. Next we evaluated the effects of the introduced skews and GRSN correction on statistical gene selection performance in our simulated datasets. Statistically significant genes were selected with eBayes using a FC cutoff of 1.2 and a FDR cutoff of 0.05. We evaluated the numbers of True Positive (TP) (genes with actual simulated FC), False Positive (FP), and False Negative (FN) genes found at each stage of the data simulation for each of the 100 simulations run. Figure 4B shows the results using box plots showing the range of gene selection results across all 100 simulations. The statistical results from the data with simulated artifacts, but no GRSN correction, vary widely from simulation to simulation, resulting in a substantial reduction in identified true positives (middle data set in left plot), and an abundance of false negatives and false positives (middle data sets in middle and right-hand plots). False negatives are more common than false positives due to the random nature of the introduced skew. However, GRSN corrects these issues and the results both before the simulated artifacts and after the simulated artifacts have been corrected with GRSN are very stable (Fig. 4B, compare left and right-hand data sets in each box plot).\nWe also evaluated the ability of GRSN to preserve the Fold Change (FC) values introduced in the above simulation. We tabulated the average FC for each range over all 100 simulations. 
This tabulation was done for each stage of the simulation: after simulated FC, after simulated skew, and after GRSN correction. Box plots were used to summarize the results for each FC range and each stage. As seen in Fig. 4C, the variation in FC for each simulated FC range is increased substantially by the simulated skew, but the application of GRSN restores both the mean FC and the variation in FC to values very close to the pre-skew values."}
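
The per-gene summary statistics described in the Table 1 caption, and the true/false positive counts used for Fig. 4B, can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the sample standard deviation (n - 1 denominator) is assumed, the fold change is computed as mean(B) / mean(A) exactly as the caption words it (so the inputs are assumed to be on a linear scale), and all function names are hypothetical. The gene selection step itself (eBayes with FC and FDR cutoffs) is not reproduced here; `selected` stands for whatever set of genes that step returns.

```python
import statistics


def gene_sd(values_a, values_b):
    """SD within replicates, computed separately for A and B, then averaged."""
    return (statistics.stdev(values_a) + statistics.stdev(values_b)) / 2


def gene_fc(values_a, values_b):
    """Fold change: average of sample set B divided by average of sample set A."""
    return statistics.mean(values_b) / statistics.mean(values_a)


def selection_counts(selected, truly_regulated):
    """Count true positives, false positives and false negatives.

    selected: set of gene ids called significant (assumed input).
    truly_regulated: set of gene ids that received a simulated fold change.
    """
    tp = len(selected & truly_regulated)
    fp = len(selected - truly_regulated)
    fn = len(truly_regulated - selected)
    return tp, fp, fn
```

Applying these functions at each of the three stages (simulated FC only, simulated skew added, after GRSN correction) and averaging over the simulations would give values comparable in structure to the rows of Table 1.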