PMC:4979052 / 6704-8728

    2_test

    {"project":"2_test","denotations":[{"id":"27600351-20379172-69477552","span":{"begin":134,"end":135},"obj":"20379172"},{"id":"27600351-19015125-69477553","span":{"begin":681,"end":682},"obj":"19015125"},{"id":"27600351-12538238-69477554","span":{"begin":1138,"end":1140},"obj":"12538238"}],"text":"2.1. Human Expression Data\nWe have downloaded the HumanExpressionAtlas data set (E-MTAB-62 on Array Express) compiled by Lukk et al. [7] consisting of 5372 (“qc-included”) samples hybridized to Affymetrix HG-U133a microarrays. This data set comprises in total 206 different studies from 163 different laboratories. Using text mining and manual curation, each sample was assigned one of 369 biological groups representing distinct human cell and tissue types, disease states and cell lines. The resulting expression space, the combined and processed gene expression data from this diverse collection of human samples, can be queried using the dedicated database ArrayExpress Atlas [8].\nThe 5372 samples have been selected from a larger data set of 8268 samples after application of strict quality control (qc). This was based on the quality measures scaling factor, average background, percentage of present calls, RNA degradation from whole array, Normalized Unscaled Standard Errors, and Relative Log Expression computed from the array data using Bioconductor [9]. Quality thresholds were selected based on the recommendations given in [10], adjusted to the distribution of the quality measures within this data set. Most of these metrics are typically the first choice for judging whether samples are of sufficient quality relative to the complete set, although they are quite unspecific for any particular technical effect.\nWe obtained a full list of the 8268 samples from the authors and downloaded the remaining 2896 (“qc-excluded”) samples from public databases. 
Of these, 137 samples could not be retained because they had been removed from the databases, leaving a total of 8268 − 137 = 8131 samples. The full set of 8268 unique samples represents virtually all HG-U133a data publicly available in 2006 in the two major public databases, GEO and ArrayExpress, with no restrictions on the type of samples. This HumanArraysSet is therefore a representative sample of available human microarray experiments."}
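
The sample bookkeeping described in the text can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original record: the counts are taken from the text, while the function and variable names are hypothetical and chosen only for illustration.

```python
# Counts as stated in the text; all names here are illustrative assumptions.
QC_INCLUDED = 5372          # samples that passed quality control
QC_EXCLUDED = 2896          # samples that failed quality control
NO_LONGER_AVAILABLE = 137   # samples removed from the public databases


def sample_counts(qc_included, qc_excluded, unavailable):
    """Return the size of the full sample list and of the retained set."""
    full_set = qc_included + qc_excluded
    retained = full_set - unavailable
    return {"full_set": full_set, "retained": retained}


counts = sample_counts(QC_INCLUDED, QC_EXCLUDED, NO_LONGER_AVAILABLE)
print(counts)
```

Under these counts the full list comes to 8268 samples and the retained set to 8131, in agreement with the figures given in the text.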