2.2. DNA Methylation Measurements and Pre-Processing

The dataset contains methylation data extracted using Illumina's more recent chip, the Infinium Human Methylation 450K BeadChip, which includes 485,577 probes (482,421 CpG sites, 3091 non-CpG sites and 65 random SNPs); see Table 2 for the genomic distribution of CpGs classified in different groups: promoter, body, 3′UTR and intergenic [26]. Methylation of each CpG site on this chip is measured from two channel intensities, I_Meth and I_Un-Meth, available for all probes, similarly to the two-channel arrays used in transcriptome analysis.

Two metrics are commonly used to quantify DNA methylation. The first is the β-value, which measures the proportion of methylation (ranging from 0 to 1) and is defined as β = I_Meth/(I_Meth + I_Un-Meth). The β-value has an intuitive, direct biological interpretation: it expresses roughly the amount of CpG methylation measured in the DNA extracted from a biological sample (a population of cells). The second is the M-value [38], widely known from its use in gene expression microarray analysis. It is the logarithmic ratio of the methylated versus the unmethylated signal intensities, M = log(I_Meth/I_Un-Meth). The M-value, used here to measure methylation of CpG sites, expresses the methylated signal of the probed epigenomic region relative to the signal of the part that has evaded methylation. It is also better suited than the β-value to differential and other statistical analyses, as it is approximately homoscedastic [38]. In agreement with the well-established strategy, widely applied in the processing of microarray data, of adopting transformations that render the data distributions symmetric, M-values were used to measure and correct DNA methylation signal intensities.
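As an illustration of the two metrics defined above, the following minimal Python sketch computes β- and M-values from a pair of channel intensities. The function names and the toy intensities are hypothetical, and the base-2 logarithm is an assumption (the text writes only "log"; base 2 is the convention commonly used for M-values). The final line checks the relation M = log2(β/(1 − β)), which follows directly from the two definitions.

```python
import numpy as np

def beta_value(i_meth, i_unmeth):
    """beta = I_Meth / (I_Meth + I_Un-Meth), a proportion in [0, 1]."""
    i_meth = np.asarray(i_meth, dtype=float)
    i_unmeth = np.asarray(i_unmeth, dtype=float)
    return i_meth / (i_meth + i_unmeth)

def m_value(i_meth, i_unmeth):
    """M = log(I_Meth / I_Un-Meth); base 2 is assumed here, not stated in the text."""
    i_meth = np.asarray(i_meth, dtype=float)
    i_unmeth = np.asarray(i_unmeth, dtype=float)
    return np.log2(i_meth / i_unmeth)

# Toy intensities for three probes (illustrative values, not real data).
i_meth = np.array([3000.0, 500.0, 1000.0])
i_unmeth = np.array([1000.0, 1500.0, 1000.0])

beta = beta_value(i_meth, i_unmeth)  # 0.75, 0.25, 0.5
m = m_value(i_meth, i_unmeth)        # log2(3), -log2(3), 0

# The two metrics are linked by a logit-type transform.
m_from_beta = np.log2(beta / (1.0 - beta))
```

Note that β is bounded while M is unbounded and symmetric around zero, which is consistent with the preference for M-values in the statistical analyses described here.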
The M-value correction was performed using a novel intensity-based method previously presented by the authors in [34], which additionally uses quality controls (QCs) corresponding to the same technical control sample embedded in the methylation arrays. The M-signal distribution is normalized taking into account the average intensity level of both channels, I = 0.5 × log(I_Meth × I_Un-Meth), and the variation of the QCs incorporated in each chip, which are used to derive error estimates.

Table 2. Genomic distribution of CpGs classified in different groups: promoter, body, 3′UTR and intergenic [26].

The normalization takes place in two successive steps: (i) within-chip; and (ii) across all probes. Within-chip normalization calculates an error estimator across all intensity levels after partitioning the intensity space I into percentiles. The probe estimates for all arrays are then updated: the corrected M-value of a probe is obtained by subtracting the error calculated for the corresponding intensity level. Normalization across all probes is applied next, exploiting the standard deviation of the M-values of each probe across all QCs. The M-values of each probe across all samples are then updated by subtracting the probe-based error estimate. Overall, the consecutive normalization steps mitigate the impact of technical bias on the signal estimates, as already shown in [34].
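The within-chip step described above can be sketched roughly as follows. This is not the implementation of [34]: the function names, the choice of the median as error estimator, the use of a QC residual (QC M-value minus a reference QC profile) as the error proxy, and the base-2 logarithm are all assumptions made for illustration. The across-probe step is left out, because this section does not detail how the probe-based error is derived from the QC standard deviation.

```python
import numpy as np

def average_intensity(i_meth, i_unmeth):
    """I = 0.5 * log(I_Meth * I_Un-Meth); base 2 is assumed here."""
    return 0.5 * np.log2(np.asarray(i_meth, float) * np.asarray(i_unmeth, float))

def within_chip_correction(m_samples, intensity_samples,
                           qc_residual, qc_intensity, n_bins=100):
    """Subtract an intensity-dependent error estimated from the chip's QC.

    m_samples, intensity_samples : (n_probes, n_arrays) arrays for one chip
    qc_residual  : (n_probes,) hypothetical error proxy for the chip's QC
    qc_intensity : (n_probes,) average intensity I of the chip's QC
    """
    # Percentile edges partition the intensity space into n_bins levels.
    edges = np.percentile(qc_intensity, np.linspace(0, 100, n_bins + 1))
    inner = edges[1:-1]  # n_bins - 1 cut points give bin ids 0..n_bins-1

    # Error estimator per intensity level (median is an assumed choice).
    qc_bin = np.digitize(qc_intensity, inner)
    bin_error = np.zeros(n_bins)
    for b in range(n_bins):
        mask = qc_bin == b
        if mask.any():
            bin_error[b] = np.median(qc_residual[mask])

    # Each probe is corrected by the error of its own intensity level.
    sample_bin = np.digitize(intensity_samples, inner)
    return m_samples - bin_error[sample_bin]

# Synthetic demonstration with a constant technical bias of 0.3 in the QC.
rng = np.random.default_rng(0)
n_probes, n_arrays = 1000, 3
qc_intensity = average_intensity(rng.uniform(200, 20000, n_probes),
                                 rng.uniform(200, 20000, n_probes))
qc_residual = np.full(n_probes, 0.3)
m_samples = rng.normal(0.0, 2.0, (n_probes, n_arrays))
intensity_samples = rng.uniform(qc_intensity.min(), qc_intensity.max(),
                                (n_probes, n_arrays))

corrected = within_chip_correction(m_samples, intensity_samples,
                                   qc_residual, qc_intensity, n_bins=10)
```

With a constant QC residual, every intensity level receives the same error estimate, so the corrected M-values are simply shifted by that constant; an intensity-dependent residual would yield a different shift per percentile bin.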