3.2. Stage 2 Correlation is used in this project as a proxy for inhibitory or excitatory behavior between differences in the expression levels of two genes. As a first step, the linear correlation values between potential biomarkers are obtained. The following step was to arrange the correlation values in a matrix. To construct this matrix, first the differences between control and cancer tissues had to be calculated for each gene. Then, the absolute values of the correlation coefficients were calculated among each pair of genes based on these differences and stored in the said matrix. The absolute correlation values were consequently associated to the arcs in a fully connected graph with nodes representing potential biomarker genes. The resulting graph made possible the use of the formulation of the TSP. The optimal solution to this particular TSP is the tour among the genes of interest with the largest possible correlation, or similarly, the most correlated cyclic path as shown in Figure 5. It must be recalled at this point that there are a total of 28! ≈3.04 ×1029  ways in which a cyclic path can be drawn among the 28 genes. The TSP formulation allows a wide range of analyses. In this case, the idea was to test the stability of the TSP solutions. In order to do so, TSP solutions were obtained using increasing numbers of potential biomarkers in the list of genes presented in Table 1 following the increasing order of the efficient frontier in which these were found. Starting with five genes, each time five more genes were introduced until the list was depleted on each case. Path segments that persisted across both databases were identified. Furthermore, path segments that persisted along the entire study were deemed the most stable. The results of this study were then matched against known biological pathways publicly available in the Kyoto Encyclopedia of Genes and Genomes (KEGG) [49]. A python script was written to make this process more efficient. This script is provided in Appendix B. Table 2 summarizes the results for each progressive analysis that introduced five genes at a time. As shown in Table 2, (LRPPRC with MTHFD2) and (RPN1 with COX17) are adjacent in the correlated cyclic path when the optimal solution is obtained for 25 and 28 genes. In addition, gene S100A14 is adjacent to TSPO when the optimal solution for 5, 15, 20, and 25 genes is found. Figure 5 Highest Correlated Cyclic Path among the 28 genes identified in Stage 1. microarrays-04-00287-t002_Table 2 Table 2 Adjacent genes in the solutions for the correlated cyclic path found adding five genes at a time. A search for biological pathways in KEGG databases was conducted, however not every gene could be linked to a pathway. When comparing the known biological pathways with the obtained optimal solutions, for database GSE 7803 [9] and GSE 9750 [10], the only genes that appeared adjacent in the correlated cyclic path for both were COX17 with RPN1 and the only KEGG pathway common to both has the identifier 01100 that corresponds to the collection of Methabolic pathways. On the other hand, for database GSE 7803 [9], medium correlation was observed between genes HNRPA3 with BUD31, and both gene products are present in KEGG pathway 03040 that corresponds to the splisosome. For database GSE 9750 [10], PLCE1 is adjacent to PIK3R1, both gene products share the following KEGG pathways: 04012 that corresponds to the ErbB signaling pathway, 04015 Ras signaling pathway, 04015 Rap1 signaling pathway, 04066 HIF-1 signaling pathway, 04070 Phosphatidylinositol signaling system, 04370 VEGF signaling pathway, 04650 Natural killer cell mediated cytotoxicity, 04660 T cell receptor signaling pathway, 04664 Fc epsilon RI signaling pathway, 04666 Fc gamma R-mediated phagocytosis, 04670 Leukocyte transendothelial migration, 04722 Neurotrophin signaling pathway, 04750 Inflammatory mediator regulation of TRP channels, 04919 Thyroid hormone signaling pathway, 05169 Epstein-Barr virus infection, 05200 Pathways in cancer, 05200 Pathways in cancer, 05214 Glioma, 05223 Non-small cell lung cancer, and the 05231 KEGG pathway that corresponds to Choline metabolism in cancer. microarrays-04-00287-t003_Table 3 Table 3 Selected genes localization. In cancer there are chromosomal physical changes that produce gains or losses of certain genes. To explore if the position of the genes in the cyclic path could also provide information about these chromosomal changes, the location of each gene was consider (this information was obtained from [50]). This information is listed in Table 3. All chromosomes in Table 3 have been reported as having changes in cervical cancer, in regions close to the ones where the selected genes belong. It is interesting to note that some of the genes that are neighbors in the cyclic path are also neighbors in their genetic localization. HIPK1, NUCKS1, SMG7, and CRABP2 are all in chromosome 1, the first two genes of the list are adjacent in the cyclic path and the others are scatter through the cycle. Reported changes in chromosome 1 in cervical cancer include: gains in the 1p region [51,52,53], increment on the 1q32.1–32.2 genes expression [44], aneusomy of the chromosome [54] among others. Three genes are in chromosome 2, HNRNPA3, LRPPRC and MTHFD2. There are several changes in chromosome 2 related to cervical cancer, for example reduced expression of genes in 2p has been reported [55], it has also been reported that deletions of the 2q33–q37 are common in cervical carcinoma [56] as well as loss of heterozygosity at 2q35–q37.1 [57]. COX17, RNP1, MAP4, and SMC4 (separated by three genes from the group), and ATP2C (adjacent to SMC4) are all in chromosome 3. Changes in chromosome 3 have been extensively reported for cervical cancer. Gain of chromosome 3q has been reported in pre-cancer and cancer of the cervix (these are some of the reports: [58,59,60,61]) while loss of 3p12-p14 has also been observed [62] and loss of heterozygosity on chromosome 3p has been also reported in this cancer [55]. C5orf22 and PIK3R1 are both in chromosome 5. Chromosome 5 is known to have alterations in cervical cancer [61,63,64,65]. BUD31 and PRSS2 belong to chromosomes 7, there are known changes of this chromosome in cervical cancer [66,67,68]. SNTG1 and RAD21 are in chromosome 8, examples of reported changes in this chromosome can be found in: [69,70,71,72]. Genes SSX1 and GRIA3 are both in X chromosome. Examples of the association of changes in chromosome X in cervical cancer can be found in [73,74,75]. Genes PFDN4, CASP1, PLCE1, ICOSLG, GADD45G, SMARCA2, and TSPO are located in different chromosomes, and there are reports for changes in each one of these chromosomes in cervix cancer, for examples the reader is refer to: [10,61,76,77,78,79,80,81,82,83,84,85,86,87,88,89]. The results suggest that the chromosomal gains and losses known for cervical cancer could include bigger regions. It is clear that true experimental validation is critical to further support the results of the proposed pipeline analysis at this point. It is also important, however, to notice its potential for biological discovery. Every time that a biological pathway is discovered, it basically is a problem of selecting a path by systematically choosing pairs of genes with scientific basis. If a mathematical point of view is adopted, this practice implies that the solution is built heuristically as opposed to optimally. This insight has important implications for the adoption of optimization methods in Medicine and Biology.