PMC:4979052 / 29535-32003
Annotations
2_test
{"project":"2_test","denotations":[{"id":"27600351-12034852-69477565","span":{"begin":320,"end":322},"obj":"12034852"}],"text":"3.4. Sequence Effect Size\nNucleic acid folding and formation of DNA/DNA or DNA/RNA duplexes on surfaces are fundamental reactions for any microarray assay and largely depend on the hybridization conditions. For example, the temperature and ion composition of the hybridization reactions can affect nucleic acid binding [36]. Different hybridization conditions can lead to sequence-dependent variations in the probe intensity signals, which can propagate into the gene expression estimates and thus constitute potential artifacts.\nWe assess the sequence effect using the parameter log(Kdiff), which refers to the maximum pairwise difference between all position-dependent NN-sensitivity profiles (see Methods Section 2.3). For example, a value of log(Kdiff) = 5 for a particular hybridization means that, on average, two hypothetical probes (most likely with the sequences AAA…A and CCC…C) would differ in their intensity values by 5 orders of magnitude. log(Kdiff) can thus be thought of as the maximum strength, or impact, of the sequence effect.\nAs before, we computed log(Kdiff) for all 8131 samples of the HumanArraySet; Figure 4a shows the resulting density distribution. The qc-excluded samples tend to show a lower maximum sensitivity amplitude, which makes log(Kdiff) a potential marker for low-quality samples. Because barely any good-quality samples (\u003c0.1%) exhibit a smaller maximum sensitivity amplitude, we chose log(Kdiff) = 3 as a conservative threshold for selecting samples with a critically low sequence effect size. This threshold applies to 4.1% of all samples.\nFigure 4 Distribution of summary parameters related to sequence effects for qc-included/qc-excluded samples of the HumanExpressionAtlas. Panel (a) shows the parameter log(Kdiff) as a measure of the total strength of the sequence effect. The inset shows the corresponding sequence profiles for two selected samples (red and blue dots, drawn in the same colors as the profiles). Panel (b) shows the intensity increase due to the (GGG)1 motif, δI(GGG)1.\nAmong the first five principal components of the HumanExpressionAtlas, log(Kdiff) correlates most strongly (in absolute value) with the third principal component, with r = −0.17. Correlations with the remaining principal components are smaller than |r| = 0.11. In conclusion, the sequence effect size is not a technical variable with a large impact on the dominant patterns in the expression space."}
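The summary statistics described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each sample's position-dependent NN-sensitivity profiles are available as a NumPy array of log10 contributions (one row per profile, one column per probe position). It also approximates log(Kdiff) as the largest difference between position-summed profiles; the exact definition is given in Methods Section 2.3 and may differ. The helper names log_kdiff, flag_low_sequence_effect and pc_correlations are hypothetical. Only the threshold log(Kdiff) = 3 and the use of the first five principal components come from the text.

```python
import numpy as np

# Threshold taken from the text: samples with log(Kdiff) < 3 are flagged.
KDIFF_THRESHOLD = 3.0


def log_kdiff(profiles: np.ndarray) -> float:
    """Largest difference in summed log10 sensitivity between any two profiles.

    ASSUMPTION (hypothetical representation): `profiles` has shape
    (n_profiles, n_positions), one row per hypothetical probe type
    (e.g. AA...A, CC...C), holding position-dependent log10 contributions.
    Summing a row gives that probe's total log10 intensity offset, so the
    maximum pairwise difference equals max(row sums) - min(row sums).
    """
    totals = profiles.sum(axis=1)
    return float(totals.max() - totals.min())


def flag_low_sequence_effect(kdiff: np.ndarray,
                             threshold: float = KDIFF_THRESHOLD) -> np.ndarray:
    """Boolean mask of samples whose log(Kdiff) lies below the threshold."""
    return kdiff < threshold


def pc_correlations(expr: np.ndarray, kdiff: np.ndarray,
                    n_pcs: int = 5) -> np.ndarray:
    """Pearson r between log(Kdiff) and the scores of the first n_pcs PCs.

    `expr` is an (n_samples, n_genes) expression matrix. PCA is done via
    SVD of the gene-centered matrix; each PC's sign is arbitrary.
    """
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]  # per-sample PC scores
    return np.array([np.corrcoef(kdiff, scores[:, i])[0, 1]
                     for i in range(n_pcs)])


if __name__ == "__main__":
    # Toy profiles with row sums 1.0, 5.0 and 2.0.
    toy = np.array([[0.5, 0.5], [2.0, 3.0], [1.0, 1.0]])
    print(log_kdiff(toy))  # → 4.0
    print(flag_low_sequence_effect(np.array([2.5, 3.0, 4.2])).tolist())
    # → [True, False, False]

    # Synthetic data only: 50 samples x 20 genes.
    rng = np.random.default_rng(0)
    r = pc_correlations(rng.normal(size=(50, 20)), rng.normal(4.0, 1.0, 50))
    print(np.round(np.abs(r), 2))
```

Because the sign of a principal component obtained by SVD is arbitrary, only the magnitude |r| is comparable across PCA runs. This matches the use of absolute correlations above when log(Kdiff) is ranked against the first five components.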