2.4.2. GORevenge-based Selection of CpG Sites

To apply a feature selection methodology orthogonal to the evolutionary one, we additionally selected CpG sites from the pool of genes corresponding to the initially pre-selected CpG sites, exploiting their putative functional role through GORevenge (available online: www.grissom.gr/gorevenge [33]). Starting from a list of genes or gene ontology (GO) terms, GORevenge exploits the functional information encoded in the semantics of the GO tree and outputs a series of functionally related genes/GO terms. The selected genes/GO terms are not necessarily included in the input list. The tool can thus help elucidate hidden functional regulatory effects among genes and promote a systems-level interpretation. GORevenge uses a stepwise mechanism, starting from the initial set of genes or GO terms. In the first phase, genes are collected not only when they are linked to a given GO term but also when they are linked to its neighboring terms, i.e., its parent and child GO terms. These genes are considered to belong to the same functional clique, which is defined using distance-based functional similarity criteria. For genes annotated with several terms, a pruning phase follows, in which GO terms are eliminated when the distance between them falls below a given similarity distance. GORevenge incorporates the Resnik semantic similarity metric [42] and can probe specific categories of the GO, i.e., molecular function (MF), biological process (BP), and cellular component (CC). Finally, a prioritized list of genes is exported based on the GO terms linked to them after the pruning stage. This measures the centrality of each gene in the functional mechanism defined by the initial genes/GO terms. Here, GORevenge was applied using as input the set of unique genes related to the CpG sites pre-selected on statistical grounds.
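The three phases described above (neighborhood-based gene collection, pruning of semantically redundant terms, and ranking of genes) can be illustrated with a minimal sketch. It is not GORevenge's implementation. The GO DAG and gene annotations are a hypothetical toy example, not real GO IDs. Information content is computed from annotation frequencies, as in the standard Resnik definition. The pruning rule is our own simplification: a term is dropped when its Resnik similarity to an already kept term exceeds a hypothetical threshold `max_sim`. Genes are then ranked by the number of surviving terms. The Bubble genes algorithm and the relaxation parameter are not modeled.

```python
import math

# Hypothetical toy GO DAG (child term -> parent terms); these are NOT real GO IDs.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical gene -> directly annotated GO terms.
ANNOTATIONS = {
    "g1": {"A1", "B1"},
    "g2": {"A1", "A2"},
    "g3": {"A2"},
    "g4": {"B1"},
}


def ancestors(term):
    """Return `term` together with all of its ancestors in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the fraction of annotations made to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    total = 0
    for terms in ANNOTATIONS.values():
        for term in terms:
            total += 1
            for anc in ancestors(term):  # propagate each annotation upwards
                counts[anc] += 1
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def collect_genes(term):
    """Phase 1: genes annotated to `term` or to its parent/child terms."""
    children = {c for c, ps in PARENTS.items() if term in ps}
    neighbourhood = {term} | PARENTS[term] | children
    return {g for g, terms in ANNOTATIONS.items() if terms & neighbourhood}


def prune(terms, ic, max_sim):
    """Phase 2 (simplified, hypothetical rule): keep a term only if it is not too similar to a kept one."""
    kept = []
    for term in sorted(terms, key=lambda t: (-ic.get(t, 0.0), t)):  # most specific first
        if all(resnik(term, k, ic) <= max_sim for k in kept):
            kept.append(term)
    return kept


def rank_genes(genes, ic, max_sim):
    """Phase 3 (simplified): rank genes by the number of terms surviving pruning."""
    scores = {g: len(prune(ANNOTATIONS[g], ic, max_sim)) for g in genes}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    ic = information_content()
    candidates = collect_genes("A1") | collect_genes("B1")
    print(rank_genes(candidates, ic, max_sim=0.3))
    # -> [('g1', 2), ('g2', 1), ('g4', 1)]
```

In this toy example, gene g2 loses term A2 because A1 and A2 share the informative ancestor A. Gene g1 keeps both of its terms because A1 and B1 meet only at the root, whose information content is zero. Processing the most specific terms first is a greedy design choice of this sketch, made so that the broader, redundant terms are the ones dropped.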
Specifically, a set of 3415 unique gene IDs was derived from the 5719 pre-selected CpG sites using the annotation of the Infinium Human Methylation 450K BeadChip. GORevenge was then applied to this set of 3415 genes. The algorithm parameters were the Resnik semantic similarity metric, the Bubble genes algorithm, and a relaxation of 0.15 (see [33] for details on the parameters). Using the BP and MF functional aspects, we retrieved 235 and 210 genes, respectively, from the submitted gene list. Together, these correspond to 249 unique gene IDs, which in turn map to 352 CpG sites according to the chip annotation. These features represent a small, concise set of genes that are both significantly differentiated and of high regulatory impact, i.e., they are linked to the highest number of distinct and largely non-overlapping biological processes. They may therefore be reliably used to train predictive classifiers.
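The bookkeeping described above can be sketched in a few steps. CpG sites are mapped to unique genes through the chip annotation. The genes returned for the BP and MF aspects are then merged, and the merged genes are mapped back to CpG sites. The annotation excerpt, probe IDs, gene names, and GORevenge outputs below are all hypothetical. The sketch assumes a semicolon-separated gene-name field, as in the 450K manifest's `UCSC_RefGene_Name` column. It also assumes that only genes from the submitted list are retained. The text does not say whether the mapping back covers the whole chip or only the pre-selected sites, so a hypothetical `restrict_to` argument exposes both variants.

```python
# Hypothetical excerpt of a 450K-style annotation: CpG probe ID -> semicolon-separated
# gene names (empty for intergenic probes). Probe IDs and gene names are illustrative only.
ANNOTATION = {
    "cg0001": "GENE1;GENE1",
    "cg0002": "GENE2",
    "cg0003": "",
    "cg0004": "GENE2;GENE3",
    "cg0005": "GENE4",
    "cg0006": "GENE3",
}


def parse_genes(field):
    """Split a semicolon-separated gene field into a set, dropping empty entries and duplicates."""
    return {g for g in field.split(";") if g}


def cpgs_to_genes(cpgs, annotation):
    """Map a set of CpG sites to the set of unique genes they are annotated to."""
    genes = set()
    for cpg in cpgs:
        genes |= parse_genes(annotation.get(cpg, ""))
    return genes


def genes_to_cpgs(genes, annotation, restrict_to=None):
    """Return every CpG whose annotation lists at least one of `genes`.
    If `restrict_to` is given (hypothetical option), keep only CpGs in that set."""
    hits = {cpg for cpg, field in annotation.items() if parse_genes(field) & genes}
    return hits if restrict_to is None else hits & restrict_to


if __name__ == "__main__":
    pre_selected = {"cg0001", "cg0002", "cg0003", "cg0004"}
    input_genes = cpgs_to_genes(pre_selected, ANNOTATION)
    bp_hits = {"GENE2", "GENE3"}  # hypothetical GORevenge output, BP aspect
    mf_hits = {"GENE3", "GENE9"}  # hypothetical GORevenge output, MF aspect
    selected = (bp_hits | mf_hits) & input_genes  # keep genes from the submitted list
    print(sorted(selected))  # -> ['GENE2', 'GENE3']
    print(sorted(genes_to_cpgs(selected, ANNOTATION)))  # -> ['cg0002', 'cg0004', 'cg0006']
    print(sorted(genes_to_cpgs(selected, ANNOTATION, restrict_to=pre_selected)))  # -> ['cg0002', 'cg0004']
```

The union of the BP and MF results explains why the unique gene count (249 in the text) is smaller than the sum of the per-aspect counts. A gene can also carry several CpG probes, which is why the final number of CpG sites can exceed the number of genes.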