2.4.1. Evolutionary Selection of CpG Sites

This feature selection method uses a genetic algorithm (GA) [40] to select the most robust features, which feed a weighted 12-nearest-neighbor (12-nn) classifier [41] used internally. The GA mimics the mechanisms of natural evolution and applies biologically inspired operators to a randomly chosen initial population of candidate solutions in order to optimize a fitness criterion. In the current study, an initial population of 500 random chromosomes is created during the first step of the GA. Each chromosome, representing a candidate solution to the feature selection task, is a binary mask of N digits (0 or 1), equal in number to the dimensionality of the complete feature set, each digit indicating whether the corresponding feature is used. Every chromosome is constrained so that the number of selected features lies within a predefined range.

The genetic operators of selection, crossover, and mutation are then applied to the initial generation of chromosomes. The selection operator chooses, through a roulette-wheel scheme, the chromosomes that participate in crossover and mutation, based on each chromosome's previously computed fitness value, i.e., the total accuracy (number of samples correctly classified) that the corresponding feature subset yields with the weighted 12-nn classifier under three-fold cross-validation. The crossover operator mates random pairs of the selected chromosomes with probability Pc = 0.5 using uniform crossover, while the mutation operator flips individual bits of a chromosome (from 0 to 1 or vice versa) with probability Pm = 1/N. The whole procedure is repeated until the stopping criterion of a maximum number of generations is reached; the best-performing chromosome of the last generation represents the finally selected feature subset.

The weighted k-nn classifier was used internally in the evolutionary algorithm because of its comparatively low computational cost relative to alternatives such as artificial neural networks, and because the classifier must be executed and evaluated over a large number of rounds within the feature selection algorithm. The setting k = 12 follows from the fact that the smallest number of samples in the testing sets used is 13, so that the classifier (optimized here and later applied to the testing sets) does not examine a neighborhood of samples larger in size than that of one class.
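The loop described above can be summarized in the following sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the data matrix X, labels y, the allowed feature-count range (n_min, n_max), the number of generations, and the omitted repair of out-of-range chromosomes are assumptions introduced for the example, while the population size of 500, Pc = 0.5, Pm = 1/N, the distance-weighted 12-nn classifier, and the three-fold cross-validation fitness follow the text.

```python
# Minimal sketch of GA-based feature selection with an internal weighted 12-nn
# classifier. Illustrative only; names and helper details are assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

def random_chromosome(n_features, n_min, n_max):
    """Binary mask whose number of selected features falls in [n_min, n_max]."""
    n_on = rng.integers(n_min, n_max + 1)
    mask = np.zeros(n_features, dtype=int)
    mask[rng.choice(n_features, size=n_on, replace=False)] = 1
    return mask

def fitness(mask, X, y):
    """Number of samples correctly classified by a distance-weighted 12-nn
    classifier under three-fold cross-validation on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=12, weights="distance")
    preds = cross_val_predict(clf, X[:, mask.astype(bool)], y, cv=3)
    return float((preds == y).sum())

def roulette_pairs(population, fits, n_pairs):
    """Fitness-proportionate (roulette-wheel) selection of parent pairs."""
    p = np.asarray(fits, dtype=float)
    p = p / p.sum() if p.sum() > 0 else np.full(len(p), 1.0 / len(p))
    idx = rng.choice(len(population), size=(n_pairs, 2), p=p)
    return [(population[i], population[j]) for i, j in idx]

def uniform_crossover(a, b):
    """Exchange each gene between the two parents independently."""
    swap = rng.random(a.size) < 0.5
    c1, c2 = a.copy(), b.copy()
    c1[swap], c2[swap] = b[swap], a[swap]
    return c1, c2

def mutate(mask, pm):
    """Flip each bit with probability pm (= 1/N in the text)."""
    flips = rng.random(mask.size) < pm
    return np.where(flips, 1 - mask, mask)

def ga_select_features(X, y, n_min, n_max, pop_size=500,
                       n_generations=50, pc=0.5):
    n = X.shape[1]
    population = [random_chromosome(n, n_min, n_max) for _ in range(pop_size)]
    for _ in range(n_generations):
        fits = [fitness(c, X, y) for c in population]
        offspring = []
        for a, b in roulette_pairs(population, fits, pop_size // 2):
            # Mate the selected pair with probability pc = 0.5 (uniform crossover).
            if rng.random() < pc:
                a, b = uniform_crossover(a, b)
            offspring.append(mutate(a, 1.0 / n))
            offspring.append(mutate(b, 1.0 / n))
        # NOTE: a repair step keeping each mask within [n_min, n_max] is
        # omitted here for brevity.
        population = offspring
    fits = [fitness(c, X, y) for c in population]
    return population[int(np.argmax(fits))]  # best chromosome of last generation
```

In this sketch the fitness is computed as the total count of correctly classified samples pooled over the three cross-validation folds, one reading of the "total accuracy" criterion stated above; each training fold must contain at least 12 samples for the 12-nn classifier to be applicable.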