PMC:4573573 / 13468-18602
Annnotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/4573573","sourcedb":"PMC","sourceid":"4573573","source_url":"https://www.ncbi.nlm.nih.gov/pmc/4573573","text":"2.3. Stage 2: Finding the Most Correlated Path\nIt is proposed that the most correlated path among the list of candidate genes identified in the previous stage can be found optimally. To this end, the optimization problem identified in the literature as the Travelling Salesman Problem (TSP), is introduced here as a viable model.\nThe TSP is generally stated as follows: a salesman needs to visit n cities and needs to minimize the travel distance starting and finishing in the city of origin. Each city must be visited only once. The solution, then, is a tour. In n cities, there is a total of n! tours. If a particular city of origin is selected a priori, then the number of tours is (n-1)!. In our case, the objective is to find the tour among n genes of interest that maximizes the sum of the absolute values of pairwise correlations. This tour would then be interpreted as a surrogate for a biological pathway, defined as “a series of actions among molecules in a cell” [6], and more specifically for a genetic signaling pathway. A biological pathway “can provide clues about what goes wrong when a disease strikes.” [6].\nAs a first approximation, it is proposed that the absolute values of linear correlation coefficients computed among a list of genes of potential biomarkers be used to construct networks such as the one presented in Figure 4, where the TSP can be readily applied. The idea of using a linear statistical correlation is, indeed, widely used in the literature as a means to uncover genetic coexpression. This information, in turn, should help cancer researchers in understanding the disease as well as look for targeted treatments. The paper by Kumari et al. [7] has studied different coexpression measurements, recommending to carry out a preliminary study to determine the most appropriate one for different objectives. It is, then, convenient at this point to resort to the use of the Pearson correlation coefficient as a starting point in this work.\nFigure 4 Representation of the many options for a cyclic path for 5 genes. The TSP can, indeed, be understood as an optimization problem. Consider that cij represents the cost of traveling from city i to city j and let yij be a binary variable, indicating whether or not the salesman travels from city i to city j. Additionally let us define flow variables xij on each arc (i,j) and assume that the salesman has n-1 units available at node 1, which is arbitrarily selected as a “source node”, and he must deliver 1 unit to each of the other nodes [7]. The optimization model is as follows: (5a) Minimize ∑(i,j)∈Acijyij (5b) ∑1≤j≤nyij=1 ∀i=1,2,…,n (5c) ∑1≤i≤nyij=1 ∀ j=1,2,…,n (5d) Nx=b (5e) xij≤(n−1)yij ∀(i,j)∈A (5f) xij≥0 ∀(i,j)∈A (5g) yij=0 or 1 ∀(i,j)∈A\nFollowing the description in [8], let A’ = {(i,j): yij =1} and let A’’ ={(i,j): xij \u003e0}. The constraints (5b) and (5c) imply that exactly one arc of A’ leaves and enters any node i; therefore, A’ is the union of node disjoint cycles containing all of the nodes of N. In general, any integer solution satisfying (5b) and (5c) will be a union of disjoint cycles; if any such solution contains more than once cycle; they are referred to as subtours, since they pass through only a subset of nodes.\nIn constraint (5d) N is an nxm matrix, called the node-arc incidence matrix of the minimum cost flow problem. Each column Nij in the matrix corresponds to the variable xij. The column Nij has a +1 in the ith row, a −1 in the jth row; the rest of its entries are zero. Constraint (5d) ensures that A” is connected since we need to send 1 unit of flow from node 1 to every other node via arcs in A”. The forcing constraints (5e) imply that A” is a subset A’. These conditions imply that the arc set A’ is connected and thus cannot contain subtours [8].\nThe TSP is known to be a hard problem to solve to optimality; however, with a manageable number of entities (nodes) optimality is well within reach. In our group’s experience it has been possible to obtain the optimal TSP tour with a list of up to 100 genes in less than 1 hour of computing time in a personal computer. The Branch and Bound—an exact algorithm—was used to this end, as coded in Matlab. An exact algorithm is defined as one capable to arrive to a global optimal solution—provided that one exists—with certainty. Although it is also possible to use heuristics to approach the TSP, it must be understood that a heuristic method by definition does not provide certainty on arriving to a global optimal solution.\nReferring back to Figure 4, it should be now apparent that in n genes associated to the nodes in the network, it is possible to obtain pairwise correlations to connect all genes among them resulting in a fully connected network. This network, in turn, can be mathematically translated into formulation (5a)–(5g) to identify the most correlated path. Thus, at the end of this stage, the most correlated path among all candidate genes from Stage 1, will be available as a proxy for a signalling path. The application of this two-stage analysis pipeline is demonstrated next in the context of cervix cancer.","divisions":[{"label":"Title","span":{"begin":0,"end":46}},{"label":"Figure caption","span":{"begin":1976,"end":2053}}],"tracks":[{"project":"2_test","denotations":[{"id":"26388997-23226279-69475717","span":{"begin":1682,"end":1683},"obj":"23226279"},{"id":"26388997-23226279-69475717","span":{"begin":1682,"end":1683},"obj":"23226279"},{"id":"26388997-23226279-118161786","span":{"begin":2527,"end":2528},"obj":"23226279"},{"id":"26388997-23226279-118161786","span":{"begin":2527,"end":2528},"obj":"23226279"},{"id":"T14627","span":{"begin":1682,"end":1683},"obj":"23226279"},{"id":"T41227","span":{"begin":1682,"end":1683},"obj":"23226279"},{"id":"26388997-23226279-69475718","span":{"begin":2527,"end":2528},"obj":"23226279"},{"id":"26388997-23226279-69475718","span":{"begin":2527,"end":2528},"obj":"23226279"}],"attributes":[{"subj":"26388997-23226279-69475717","pred":"source","obj":"2_test"},{"subj":"26388997-23226279-69475717","pred":"source","obj":"2_test"},{"subj":"26388997-23226279-118161786","pred":"source","obj":"2_test"},{"subj":"26388997-23226279-118161786","pred":"source","obj":"2_test"},{"subj":"T14627","pred":"source","obj":"2_test"},{"subj":"T41227","pred":"source","obj":"2_test"},{"subj":"26388997-23226279-69475718","pred":"source","obj":"2_test"},{"subj":"26388997-23226279-69475718","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#ec93e3","default":true}]}]}}