PMC:2644708 / 36535-40806

    NEUROSES

    {"project":"NEUROSES","denotations":[{"id":"T374","span":{"begin":299,"end":305},"obj":"PATO_0001199"},{"id":"T375","span":{"begin":642,"end":651},"obj":"PATO_0000152"},{"id":"T376","span":{"begin":686,"end":695},"obj":"PATO_0000152"},{"id":"T377","span":{"begin":1488,"end":1497},"obj":"PATO_0000152"},{"id":"T378","span":{"begin":681,"end":685},"obj":"PATO_0001470"},{"id":"T379","span":{"begin":681,"end":685},"obj":"PATO_0000161"},{"id":"T380","span":{"begin":840,"end":843},"obj":"CHEBI_40799"},{"id":"T381","span":{"begin":844,"end":849},"obj":"PATO_0000014"},{"id":"T382","span":{"begin":1138,"end":1143},"obj":"PATO_0000014"},{"id":"T383","span":{"begin":1289,"end":1293},"obj":"PATO_0000318"},{"id":"T384","span":{"begin":1450,"end":1454},"obj":"PATO_0000318"},{"id":"T385","span":{"begin":1368,"end":1371},"obj":"PATO_0000322"},{"id":"T386","span":{"begin":1438,"end":1444},"obj":"PATO_0000324"},{"id":"T387","span":{"begin":1455,"end":1465},"obj":"PATO_0001855"},{"id":"T388","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T389","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T390","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T391","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T392","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T393","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T394","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T395","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T396","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T397","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T398","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T399","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T400","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T401","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T402","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"
id":"T403","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T404","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T405","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T406","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T407","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T408","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T409","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T410","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T411","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T412","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T413","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T414","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T415","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T416","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T417","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T418","span":{"begin":1520,"end":1524},"obj":"PATO_0000318"},{"id":"T419","span":{"begin":1833,"end":1837},"obj":"PATO_0000318"},{"id":"T420","span":{"begin":1923,"end":1927},"obj":"PATO_0000318"},{"id":"T421","span":{"begin":3245,"end":3250},"obj":"PATO_0000586"},{"id":"T422","span":{"begin":3544,"end":3547},"obj":"PATO_0000322"},{"id":"T423","span":{"begin":3588,"end":3592},"obj":"PATO_0000318"},{"id":"T424","span":{"begin":3599,"end":3603},"obj":"CHEBI_25434"},{"id":"T425","span":{"begin":3632,"end":3637},"obj":"PATO_0000002"},{"id":"T426","span":{"begin":3720,"end":3725},"obj":"PATO_0000002"},{"id":"T427","span":{"begin":3646,"end":3652},"obj":"PATO_0000324"},{"id":"T428","span":{"begin":3751,"end":3760},"obj":"PATO_0000152"},{"id":"T429","span":{"begin":3816,"end":3825},"obj":"PATO_0000152"},{"id":"T430","span":{"begin":4100,"end":4109},"obj":"PATO_0000152"},{"id":"T431","span":{"begin":3857,"end":3868},"obj":"CHEBI_33232"}],"text":"Implications of GRSN 
for gene discovery\nAn important goal of many microarray-based studies is the identification of genes with statistically significant differential expression between experimental conditions. As seen with the simulated data study and as shown in real data sets in Fig. 5B, the non-linear skew seen in some datasets is likely to significantly impact standard statistical methods for selecting differentially regulated genes. To analyze this we compared statistical results before and after applying GRSN normalization to a number of datasets. We selected significant up and down regulated transcripts that pass a Fold Change threshold of 1.5 and a False Discovery Rate threshold of 0.05 (similar results are obtained using FDR values ranging from 0.01 to 0.20). In Fig. 7A, we use the same M vs. A plots as in Fig. 5B, but add color coding to visualize genes selected as statistically up or down regulated between two sample classes. The \"S\" shaped skew in the data affects both the calculation of statistical significance and the calculation of fold change. In Fig. 7A, genes from the GB dataset summarized with RMA are color coded based on meeting both a statistical and a fold change cutoff. Transcripts found significant only when GRSN is not applied are indicated in blue, transcripts found significant only when GRSN is applied are indicated in red, and transcripts found significant in both cases are indicated in yellow. The blue horizontal lines indicate the FC threshold applied to select the blue transcripts (left panel) and the red horizontal lines indicate the FC threshold applied to select the red transcripts (right panel). In this case, it can be seen that the skew is pushing large groups of genes in or out of the selected fold change range. In Fig. 7B–D the color coding is modified to show the blue genes only on the left and red genes only on the right and in Fig. 7B there are more blue genes lost with the application of GRSN than there are red genes gained. 
This result is misleading, because the median p-value for the yellow genes (genes found in both cases) has decreased (improved) from 0.00023 to 0.00021 with the application of GRSN. The reason less genes are found after applying GRSN is due to the FDR adjustment. The p-value required to meet the 0.05 FDR threshold before application of GRSN was 0.0015 while the required p-value after GRSN application was 0.0010. Therefore, the p-value threshold associated with the given FDR value became more stringent after applying GRSN. This is most likely due to the distribution of p-values for genes that did not make the FC cutoff. The cause of this is well illustrated in Fig. 7C with the SS dataset. Here, statistics alone, with the FDR threshold reduced to 0.10 and with no fold change cutoff, are used to select genes. In this case, it appears that there are large groups of genes that are detected as statistically significant due solely to the effect of the \"S\" shaped skew in the data. In fact, applying GRSN in this case reduces the total number of genes selected from approximately 1,800 to only 171 genes significantly up regulated and 295 genes significantly down regulated (1,344 significant genes were removed and only 20 added). These large numbers of false positive results will cause overly optimistic FDR calculations for all genes and removing these false positive results with the use of GRSN results in fewer genes passing the FDR cutoff even when the actual p-values have improved. In Fig. 7D there are a significant number of red genes added when GRSN is applied and no blue genes lost. In this case, the median p-value for the yellow genes improved significantly from 0.000076 to 0.000042 while the p-value required to meet the FDR threshold changed from 0.00020 to 0.00070. In this case, the FDR threshold became less stringent with the application of GRSN. 
Presumably the benefit from a decrease in variance among replicates outweighed any bias in the FDR calculation introduced by the removal of false positives (no false positives are shown because they are \"masked\" by the FC threshold). In summary, systematic distortions in microarray datasets are likely to adversely impact statistical calculations leading to unreliable gene selection results."}