3.2.1. Pathways that Are Differentially Expressed between Normal Lung Tissue and Tumors

Two of the datasets, E-GEOD-18842 and E-GEOD-19188, compare normal lung tissue to NSCLC and to tumor tissue, respectively. Of these, E-GEOD-18842 has a much better defined list of differentially expressed genes. Again, this is partly a result of the larger size of the dataset, which gives a much lower standard error in the expression levels of the comparison groups and increases the power of the experiment, but noise is also a factor, as the final list contains almost all of the genes on the array. A cut-off had to be used for E-GEOD-19188, and only the top 2000 genes were used for pathway analysis.

For E-GEOD-18842, the three normalization methods produce very similar pathway analyses. The rma and farms normalized data give identical pathway analyses and identify 16 significant pathways (Figure 3). The results for the gcrma normalized data contain 12 of these 16 pathways. The four absent pathways are deposition of new CENPA-containing nucleosomes at the centromere; telomere maintenance; unwinding of DNA; and nucleosome assembly. The pathways added are separation of sister chromatids; resolution of sister chromatid cohesion; E2F mediated regulation of DNA replication; mitotic metaphase and anaphase; mitotic anaphase; DNA strand elongation; and G2/M checkpoints.

The results from E-GEOD-19188 are less definitive (Figure 4). For the rma and gcrma normalized data, over 40 pathways are identified as significant. There is considerable overlap between these two pathway sets, with 37 of the top 40 pathways conserved between the two analyses.
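The two bookkeeping steps described above, applying a top-N cut-off to a gene list and counting how many pathways two analyses share, can be illustrated with a minimal Python sketch. The function names (`top_n_genes`, `pathway_overlap`), the use of p-values as the ranking criterion, and the toy inputs are all assumptions made for illustration. The text does not state how the top 2000 genes were ranked, and the sketch is not the pipeline used in the study.

```python
from typing import Dict, Iterable, List, Set


def top_n_genes(p_values: Dict[str, float], n: int = 2000) -> List[str]:
    """Return the n genes with the smallest p-values.

    Ranking by p-value is an assumption; ties are broken by gene name
    so that the result is deterministic.
    """
    ranked = sorted(p_values.items(), key=lambda item: (item[1], item[0]))
    return [gene for gene, _ in ranked[:n]]


def pathway_overlap(a: Iterable[str], b: Iterable[str]) -> Dict[str, Set[str]]:
    """Split two pathway lists into shared pathways and those unique to each."""
    set_a, set_b = set(a), set(b)
    return {
        "shared": set_a & set_b,
        "only_a": set_a - set_b,
        "only_b": set_b - set_a,
    }


# Toy, made-up inputs purely for illustration (not values from the study).
toy_p_values = {"geneA": 0.001, "geneB": 0.20, "geneC": 0.0005, "geneD": 0.03}
toy_rma = ["Mitotic anaphase", "G2/M checkpoints", "Telomere maintenance"]
toy_gcrma = ["Mitotic anaphase", "G2/M checkpoints", "DNA strand elongation"]

selected = top_n_genes(toy_p_values, n=2)
overlap = pathway_overlap(toy_rma, toy_gcrma)
print(selected)
print(sorted(overlap["shared"]))
```

Applied to two real lists of significant pathways, the size of `overlap["shared"]` would give the kind of count quoted in the text (for example, 37 of the top 40), while `only_a` and `only_b` would list the pathways specific to one normalization method.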
The farms normalized data yield only 29 significant pathways, and some of those identified are notably different from the other results, especially the erythrocyte-associated pathways that are absent from the other pathway analyses (O2/CO2 exchange in erythrocytes; hemostasis; erythrocytes take up oxygen and release carbon dioxide; erythrocytes take up carbon dioxide and release oxygen). Associations between hemostasis, platelets and cancer were proposed in the 1980s, but they have only recently become the focus of renewed attention in pathway analysis [29,30,31].

Figure 3 Pathway Analysis for E-GEOD-18842, Normal-NSCLC (left) After rma normalization; (right) After gcrma normalization.

The overlapping pathways between E-GEOD-18842 and E-GEOD-19188 contain many of the usual suspects, such as the control of mitosis and control of the cell cycle, including checkpoints. E2F mediated regulation of DNA replication has also been identified as important in a number of different cancers, as this transcription factor regulates cyclin E [32]. Two other pathways of interest are the Polo-like kinase mediated events and the activation of ATR in response to replication stress. Polo-like kinase 1 (PLK1) participates in the DNA damage checkpoint and has been a target for cancer therapy [33,34]. Ataxia-telangiectasia and Rad3-related protein (ATR) halts DNA replication when replication forks are stalled and need to be repaired. The absence of the ATR gene creates fragile sites in the chromosome. ATR has been targeted as a possible cancer preventative or as a means of boosting the effectiveness of existing therapies [35,36].

Figure 4 Pathway Analysis for E-GEOD-19188, Healthy-Tumor (left) After rma normalization; (right) After farms normalization.

3