4. Conclusions

The analysis shows a clear genetic demarcation between healthy lung tissue and NSCLC that persists across different datasets and normalization methods. The results for the different NSCLC phenotypes (adenocarcinoma and squamous cell carcinoma) are much less clear. Comparing the pathways that are overrepresented in each NSCLC phenotype relative to normal lung tissue reveals a clear core of altered cellular functions. However, when the NSCLC phenotypes are compared with each other, it is not possible to draw a clear and consistent delineation between their transcriptomes across all of the datasets. Most of the pathway differences that the NSCLC phenotypes share across datasets concern extracellular processes, and these are more likely to be associated with metastasis than with phenotype differentiation. A possible distinguishing feature is the set of lipid biosynthetic pathways identified in the E-GEOD-40275 dataset, but these pathways were not reproduced in the other datasets. The small number of pathways that seem to distinguish adenocarcinoma from squamous cell carcinoma suggests that the histological differences might not be as strongly represented at the genetic level. This result agrees with a previous study showing that there are distinct sub-groups within adenocarcinoma, and with a second study suggesting that gene expression might overlap between the adenocarcinoma and squamous cell carcinoma groups, such that alternative sub-classes to those identified by histology might exist at the genetic level [16,49].

Many of the pathways identified as targets in this study have previously been associated with cancer, either in lung cancer or in other tissues. These include the regulation of the cell cycle and mitosis. Regulation of rRNA expression by SIRT1 and of histone methylation by PRC2, as well as the WNT signaling pathways, are specific targets that distinguish healthy and tumor cells.
The amyloid pathway is an interesting additional target because of its association with other age-related diseases.

Future experiments need to include a larger number of participants in order to sample disease diversity more effectively. Current studies have also not included experimental replicates to reduce the effects of between-sample variation. Replicates are important when between-subject variability is high, because part of that variability might be due to problems with experimental precision. In addition to new experimental results, a complete meta-analysis of all of the available NSCLC data would improve the statistical power and reliability of predictions of differentially expressed genes and pathways. This analysis would include the large datasets that are available from The Cancer Genome Atlas, which contain data for another 1000 cases. However, a serious challenge in incorporating these data is that they come from a number of different platforms, which makes combining them difficult.