PMC:1570465 / 19299-32624 JSONTXT

Annnotations TAB JSON ListView MergeView

    2_test

    {"project":"2_test","denotations":[{"id":"16916460-10743561-1687081","span":{"begin":642,"end":644},"obj":"10743561"}],"text":"Graph pruning techniques\nIn this section, we introduce a number of successively more powerful optimality-preserving dead-end elimination (DEE) techniques for pruning graphs corresponding to motif finding problems. The basic idea is to discard vertices and/or edges that cannot possibly be part of the optimal solution.\n\nBasic clique-bounds DEE\nThe idea of our first pruning technique is as follows. Suppose there exists a clique of weight C* in G. Then a vertex u, whose participation in any possible clique in G reduces the weight of that clique below C*, is incompatible with the optimal alignment and can be safely eliminated (similar to [36]).\nFor u ∈ Vi define star(u) to be a selection of vertices from every graph part other than Vi. Let Fu be the value induced by the edge weights for a star(u) that form best pairwise alignments with u:\nF u = ∑ j ≠ i max ⁡ v ∈ V j   w u v       ( 1 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemyDauhabeaakiabg2da9maaqafabaWaaCbeaeaacyGGTbqBcqGGHbqycqGG4baEaSqaaiabdAha2jabgIGiolabdAfawnaaBaaameaacqWGQbGAaeqaaaWcbeaakiabbccaGaWcbaGaemOAaOMaeyiyIKRaemyAaKgabeqdcqGHris5aOGaem4DaC3aaSbaaSqaaiabdwha1jabdAha2bqabaGccaWLjaGaaCzcamaabmGabaGaeGymaedacaGLOaGaayzkaaaaaa@4A63@\nIf u were to participate in any clique in G, it cannot possibly contribute more than Fu to the weight of the clique. Similarly, let be the value of the best possible star(u) among all u ∈ Vi:\nF i ∗ = max ⁡ u ∈ V i F u       ( 2 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaOGaeyypa0ZaaCbeaeaacyGGTbqBcqGGHbqycqGG4baEaSqaaiabdwha1jabgIGiolabdAfawnaaBaaameaacqWGPbqAaeqaaaWcbeaakiabdAeagnaaBaaaleaacqWG1bqDaeqaaOGaaCzcaiaaxMaadaqadiqaaiabikdaYaGaayjkaiaawMcaaaaa@41EF@\nFi∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@ is an upper bound on what any vertex in Vi can contribute to any alignment.\nNow, if Fz, the most a vertex z ∈ Vk can contribute to a clique, assuming the best possible contributions from all other graph parts, is insufficient compared to the value C* of an existing clique, i.e. if\nF z \u003c 2 × C ∗ − ∑ i ≠ k F i ∗ ,       ( 3 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemOEaOhabeaakiabgYda8iabikdaYiabgEna0kabdoeadnaaCaaaleqabaGaey4fIOcaaOGaeyOeI0YaaabuaeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaqaaiabdMgaPjabgcMi5kabdUgaRbqab0GaeyyeIuoakiabcYcaSiaaxMaacaWLjaWaaeWaceaacqaIZaWmaiaawIcacaGLPaaaaaa@4575@\nz can be discarded. The clique value C* is used with a factor of 2 since two edges are accounted for between every pair of graph parts in the above inequality.\nIn fact, the values of Fi∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@ are further constrained by requiring a connection to z when z is under consideration. That is, when considering a node z ∈ Vk to eliminate, and calculating Fi∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@ according to Equation 2 among all possible u ∈ Vi, the Fu of Equation 1 is instead computed as:\nF u = w z u + ∑ j ≠ i , k max ⁡ v ∈ V j   w u v       ( 4 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemyDauhabeaakiabg2da9iabdEha3naaBaaaleaacqWG6bGEcqWG1bqDaeqaaOGaey4kaSYaaabuaeaadaWfqaqaaiGbc2gaTjabcggaHjabcIha4bWcbaGaemODayNaeyicI4SaemOvay1aaSbaaWqaaiabdQgaQbqabaaaleqaaaqaaiabdQgaQjabgcMi5kabdMgaPjabcYcaSiabdUgaRbqab0GaeyyeIuoakiabbccaGiabdEha3naaBaaaleaacqWG1bqDcqWG2bGDaeqaaOGaaCzcaiaaxMaadaqadiqaaiabisda0aGaayjkaiaawMcaaaaa@5212@\nThe value of C* can be computed from any \"good\" alignment. We use the weight of the clique imposed by the best overall star.\n\nTighter constraints for clique-bounds DEE\nFor a vertex u ∈ Vi and every other Vj, an edge has to connect u to some v ∈ Vj in any alignment. When calculating Fu, we can constrain its value by considering three-way alignments and requiring that the vertices in the best star(u) chosen as neighbors of u in graph parts other than Vj are also good matches to v. Performing this computation for every pair of u, Vj and considering every edge incident on u would be too costly. Therefore, we only consider such three-way alignments for every vertex u ∈ Vi and the next part Vi+1 of the graph (with the last and first parts paired). Essentially, this procedure shifts the emphasis onto edges, allowing better alignments and bounds, and yet eliminates vertices by considering the best edge incident on it.\nFor a given edge (u, v) with endpoints u ∈ Vi and v ∈ Vi+1 we consider an adjacent double star with two centers at u and v, and sharing all the endpoints xj in the other graph parts, denoted as dstar(u, v); the weight of such a dstar(u, v) is 2wuv+∑j≠ij≠i+1(wuxj+wvxj) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaIYaGmcqWG3bWDdaWgaaWcbaGaemyDauNaemODayhabeaakiabgUcaRmaaqadabaGaeiikaGIaem4DaC3aaSbaaSqaaiabdwha1jabdIha4naaBaaameaacqWGQbGAaeqaaaWcbeaakiabgUcaRiabdEha3naaBaaaleaacqWG2bGDcqWG4baEdaWgaaadbaGaemOAaOgabeaaaSqabaGccqGGPaqkaSqaauaabeqaceaaaeaacqWGQbGAcqGHGjsUcqWGPbqAaeaacqWGQbGAcqGHGjsUcqWGPbqAcqGHRaWkcqaIXaqmaaaabaaaniabggHiLdaaaa@4EE6@. Now consider a clique {u1 ∈ V1, ..., uN ∈ VN} of some value C*, and the sum of its double stars:\n∑ i = 1 N ( 2 w u i u i + 1 + ∑ j ≠ i j ≠ i + 1 ( w u j u i + w u j u i + 1 ) ) = 2 ∑ i = 1 N ∑ j ≠ i w u j u i       ( 5 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeWbqaaiabcIcaOiabikdaYiabdEha3naaBaaaleaacqWG1bqDdaWgaaadbaGaemyAaKgabeaaliabdwha1naaBaaameaacqWGPbqAcqGHRaWkcqaIXaqmaeqaaaWcbeaakiabgUcaRmaaqafabaGaeiikaGIaem4DaC3aaSbaaSqaaiabdwha1naaBaaameaacqWGQbGAaeqaaSGaemyDau3aaSbaaWqaaiabdMgaPbqabaaaleqaaOGaey4kaSIaem4DaC3aaSbaaSqaaiabdwha1naaBaaameaacqWGQbGAaeqaaSGaemyDau3aaSbaaWqaaiabdMgaPjabgUcaRiabigdaXaqabaaaleqaaOGaeiykaKIaeiykaKcaleaafaqabeGabaaabaGaemOAaOMaeyiyIKRaemyAaKgabaGaemOAaOMaeyiyIKRaemyAaKMaey4kaSIaeGymaedaaaqab0GaeyyeIuoaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6eaobqdcqGHris5aOGaeyypa0JaeGOmaiZaaabCaeaadaaeqbqaaiabdEha3naaBaaaleaacqWG1bqDdaWgaaadbaGaemOAaOgabeaaliabdwha1naaBaaameaacqWGPbqAaeqaaaWcbeaaaeaacqWGQbGAcqGHGjsUcqWGPbqAaeqaniabggHiLdaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGobGta0GaeyyeIuoakiaaxMaacaWLjaWaaeWaceaacqaI1aqnaiaawIcacaGLPaaaaaa@7C24@\nThis sum is equal to 4C*, as each edge (ui, uj) is counted four times. We define Fuv with for an edge (u, v) with endpoints u ∈ Vi and v ∈ Vi+1 as\nFuv can be viewed as the weight of the best dstar centered at the pair of vertices u, v (or edge (u, v)) and it is the best possible contribution to any alignment, if the edge (u, v) was required to be a part of the alignment. We define Fu for u ∈ Vi and Fi∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@ for part i similarly to the above definitions as\nF u = max ⁡ v ∈ V i + 1 F u v       ( 7 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemyDauhabeaakiabg2da9maaxababaGagiyBa0MaeiyyaeMaeiiEaGhaleaacqWG2bGDcqGHiiIZcqWGwbGvdaWgaaadbaGaemyAaKMaey4kaSIaeGymaedabeaaaSqabaGccqWGgbGrdaWgaaWcbaGaemyDauNaemODayhabeaakiaaxMaacaWLjaWaaeWaceaacqaI3aWnaiaawIcacaGLPaaaaaa@446A@\nF i ∗ = max ⁡ u ∈ V i F u       ( 8 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaOGaeyypa0ZaaCbeaeaacyGGTbqBcqGGHbqycqGG4baEaSqaaiabdwha1jabgIGiolabdAfawnaaBaaameaacqWGPbqAaeqaaaWcbeaakiabdAeagnaaBaaaleaacqWG1bqDaeqaaOGaaCzcaiaaxMaadaqadiqaaiabiIda4aGaayjkaiaawMcaaaaa@41FB@\nFu is the value of the best dstar centered on vertex u ∈ Vi and some vertex v ∈ Vi+1, and Fi∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@ is the value of the best dstar centered on any pair of vertices u ∈ Vi and v ∈ Vi+1.\nFor any clique {u1 ∈ V1, ..., uN ∈ VN} of value C* in the graph, by Equations 5–8 we have\n4 C ∗ ≤ ∑ i = 1... N F u i u i + 1 ≤ ∑ i = 1... N F u i ≤ ∑ i = 1... N F i ∗       ( 9 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaI0aancqWGdbWqdaahaaWcbeqaaiabgEHiQaaakiabgsMiJoaaqafabaGaemOray0aaSbaaSqaaiabdwha1naaBaaameaacqWGPbqAaeqaaSGaemyDau3aaSbaaWqaaiabdMgaPjabgUcaRiabigdaXaqabaaaleqaaaqaaiabdMgaPjabg2da9iabigdaXiabc6caUiabc6caUiabc6caUiabd6eaobqab0GaeyyeIuoakiabgsMiJoaaqafabaGaemOray0aaSbaaSqaaiabdwha1naaBaaameaacqWGPbqAaeqaaaWcbeaaaeaacqWGPbqAcqGH9aqpcqaIXaqmcqGGUaGlcqGGUaGlcqGGUaGlcqWGobGtaeqaniabggHiLdGccqGHKjYOdaaeqbqaaiabdAeagnaaDaaaleaacqWGPbqAaeaacqGHxiIkaaaabaGaemyAaKMaeyypa0JaeGymaeJaeiOla4IaeiOla4IaeiOla4IaemOta4eabeqdcqGHris5aOGaaCzcaiaaxMaadaqadiqaaiabiMda5aGaayjkaiaawMcaaaaa@6583@\nThen Equation 3, with 2C* replaced by 4C*, can be used to eliminate vertices in the same way as before, eliminating a vertex z in a particular graph part if Fz, the value of its best adjacent dstar, is insufficient considering best possible contributions from all other graph parts. For best pruning results the value of C* should be as high as possible; we choose C* as the clique weight induced by the best overall dstar.\n\nGraph decomposition\nWe also use a divide-and-conquer graph decomposition approach for pruning vertices. For every graph part i and vertex u ∈ Vi we consider induced subgraphs Gu = (Vu, Eu) in turn, where Vu = u ∪ V\\Vi. Application of the clique-bounds DEE technique to graphs Gu is very effective since one of the graph parts, Giu MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGhbWrdaqhaaWcbaGaemyAaKgabaGaemyDauhaaaaa@30BE@ contains only one vertex, u, and all the F and F* values that need to be recomputed for the new graph Gu are greatly constrained. The process of updating the F and F* values is efficient as the changes are localized to one part in the graph. Importantly, the best known clique value C* remains intact, since the clique of that larger value exists in the original graph and can be used for the decomposed one, helping to eliminate vertices. For some of the vertices u, iterative application of the DEE criterion and re-computation of the F and F* values causes Gu to become disconnected, implying that vertex u cannot be part of the optimal alignment. Such a vertex u is marked for deletion, and that information is propagated to all subsequently considered induced subgraphs, further constraining the corresponding F and F* values and helping to eliminate other vertices in turn."}