2. Experimental Section

A search of the ArrayExpress database for NSCLC, restricted to the organism Homo sapiens, yielded 223 datasets [19]. Refining the search to include only Affymetrix data reduced this to 115 datasets. From these, six datasets were chosen that met three criteria: the study looked only at NSCLC, the platform was a transcriptome profiling array, and the number of samples was above 40. The datasets used in the study are listed in Table 1. In total, the combined data comprise 669 arrays.

Table 1. Datasets from ArrayExpress used in the data analysis.

All of the analysis was carried out using R version 3.1.0 and Bioconductor version 2.14, with RStudio version 0.98 as a graphical user interface [25,26]. Quality checking of the arrays was not carried out, as the objective of the study was to test the robustness of the analysis. The data from HGU133aplus2 and HGFocus arrays were normalized using rma, gcrma and farms, as described previously [16]. E-GEOD-40725 is an exon dataset and could only be normalized using rma, while E-GEOD-43458 could only be normalized using rma and farms, because the feature tables required by the other normalization methods are not available for these array designs.

After normalization, differential expression was calculated on the complete dataset using limma, which was used for both the Bayesian fitting and the multiple testing correction [27]. Pathway analysis was carried out against the Reactome database using ReactomePA [28].
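
The multiple testing correction applied after the limma fit can be illustrated with a short, language-agnostic sketch. The text does not state which adjustment was used, so the example below assumes the Benjamini-Hochberg procedure, which is the default adjustment in limma's topTable. The function name benjamini_hochberg and the example p-values are hypothetical and are not taken from the study; the analysis itself was carried out in R, and this Python sketch only shows how the adjusted p-values are derived.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only because it is limma's default
    adjustment; the study does not name the method it applied.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest. Each raw value
    # is scaled by m / rank, and the running minimum keeps the adjusted
    # values monotone in the rank and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for five probesets (illustration only).
    example = [0.001, 0.008, 0.039, 0.041, 0.20]
    for p, q in zip(example, benjamini_hochberg(example)):
        print(f"p = {p:.3f}  adjusted = {q:.5f}")
```

A probeset would then be called differentially expressed when its adjusted p-value falls below a chosen false discovery rate threshold; the threshold itself is a study-specific choice that is not given in this section.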