PMC:1592098 / 74741-86947
Annnotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1592098","sourcedb":"PMC","sourceid":"1592098","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1592098","text":"Calculating the shared leaf set sizes\nThe algorithms presented above all rely on O(1) time access to the size of the shared leaf set |F ∩ G| for any pair of subtrees F of T and G of T', where F and G each has size at least two, i.e. contains more than a single leaf. We will refer to |F ∩ G| as the intersection size of subtree F and G. In [9] and O(n2) time and space algorithm is presented for computing the intersection sizes of all pairs of subtrees F and G of two binary input trees. A straightforward generalization of this algorithm to two input trees T and T' of arbitrary degrees results in an O(n2dd') time and O(n2) space algorithm, which gives a worst case running time of O(n4).\nIn this section we will present an improved algorithm for computing the intersection sizes of all pairs of subtrees of T and T' which runs in time O(n + |V||V'|) and space O(|V||V'|). We will assume that the size of each subtree F of T and G of T', i.e. |F| and |G|, is available in time O(1). This can be achieved as presented in the next section. Our algorithm for computing all intersection sizes is as follows. Choose an arbitrary node r in T and an arbitrary node r' in T'. Rooting the trees in r and r' respectively gives rise to two rooted trees Tr and T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@, Fig. 10 shows an example of rooting a tree. Calculating the shared leaf set sizes of Tr and T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@ and all subtrees in both trees can be done using:\nFigure 10 An arbitrary node, r, is chosen as the root in the tree, leading to a rooted tree. | T r ∩ T ′ r | = ∑ i ∑ j | F i ∩ G j | = ∑ i | F i ∩ G | = ∑ j | F ∩ G j | , ( 15 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGG8baFcqWGubavdaWgaaWcbaGaemOCaihabeaakiabgMIihlqbdsfauzaafaWaaSbaaSqaaiabdkhaYbqabaGccqGG8baFcqGH9aqpdaaeqbqaamaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8jabg2da9maaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrcqGG8baFcqGH9aqpdaaeqbqaaiabcYha8jabdAeagjabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiilaWcaleaacqWGQbGAaeqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdaaleaacqWGQbGAaeqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdGccaWLjaGaaCzcamaabmGabaGaeGymaeJaeGynaudacaGLOaGaayzkaaaaaa@6853@\nwhere Fi are all subtrees of Tr and Gj are all subtrees of T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@. This can be calculated using dynamic programming in time O(n2):\n∑ v ∈ V ∑ v ′ ∈ V ′ d v d v ′ + ∑ v ∈ V ∑ l ′ ∈ L ′ d v + ∑ l ∈ L ∑ v ′ ∈ V ′ d v ′ + ∑ l ∈ L ∑ l ′ ∈ L ′ 1 = O ( n 2 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaGaemizaq2aaSbaaSqaaiabdAha2bqabaGccqWGKbazdaWgaaWcbaGafmODayNbauaaaeqaaOGaey4kaSYaaabuaeaadaaeqbqaaiabdsgaKnaaBaaaleaacqWG2bGDaeqaaOGaey4kaSYaaabuaeaadaaeqbqaaiabdsgaKnaaBaaaleaacuWG2bGDgaqbaaqabaaabaGafmODayNbauaacqGHiiIZcuWGwbGvgaqbaaqab0GaeyyeIuoaaSqaaiabdYgaSjabgIGiolabdYeambqab0GaeyyeIuoakiabgUcaRmaaqafabaWaaabuaeaacqaIXaqmcqGH9aqpcqWGpbWtdaqadiqaaiabd6gaUnaaCaaaleqabaGaeGOmaidaaaGccaGLOaGaayzkaaaaleaacuWGSbaBgaqbaiabgIGiolqbdYeamzaafaaabeqdcqGHris5aaWcbaGaemiBaWMaeyicI4SaemitaWeabeqdcqGHris5aaWcbaGafmiBaWMbauaacqGHiiIZcuWGmbatgaqbaaqab0GaeyyeIuoaaSqaaiabdAha2jabgIGiolabdAfawbqab0GaeyyeIuoaaSqaaiqbdAha2zaafaGaeyicI4SafmOvayLbauaaaeqaniabggHiLdaaleaacqWG2bGDcqGHiiIZcqWGwbGvaeqaniabggHiLdaaaa@74CD@\nExcept |Tr ∩ T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@|, the shared leafset sizes calculated by (15) are the shared leaf set sizes of all rooted subtrees of T and T' that do not contain the nodes r and r'. Assuming that the subtree F of T does not contain r, then the subtree F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@does contain r and similarly for r' and subtrees G and G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@ of T'. The shared leaf set sizes of these trees that do contain r and r' can be calculated from the intersection sizes that we have available using (16):\n| F ¯ ∩ G | = | G | − | F ∩ G | | F ∩ G ¯ | = | F | − | F ∩ G | | F ¯ ∩ G ¯ | = n − ( | G | + | F | − | F ∩ G | ) ( 16 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqaaeWadaaabaGaeiiFaWNafmOrayKbaebacqGHPiYXcqWGhbWrcqGG8baFaeaacqGH9aqpaeaacqGG8baFcqWGhbWrcqGG8baFcqGHsislcqGG8baFcqWGgbGrcqGHPiYXcqWGhbWrcqGG8baFaeaacqGG8baFcqWGgbGrcqGHPiYXcuWGhbWrgaqeaiabcYha8bqaaiabg2da9aqaaiabcYha8jabdAeagjabcYha8jabgkHiTiabcYha8jabdAeagjabgMIihlabdEeahjabcYha8bqaaiabcYha8jqbdAeagzaaraGaeyykICSafm4raCKbaebacqGG8baFaeaacqGH9aqpaeaacqWGUbGBcqGHsisldaqadiqaaiabcYha8jabdEeahjabcYha8jabgUcaRiabcYha8jabdAeagjabcYha8jabgkHiTiabcYha8jabdAeagjabgMIihlabdEeahjabcYha8bGaayjkaiaawMcaaaaacaWLjaGaaCzcamaabmGabaGaeGymaeJaeGOnaydacaGLOaGaayzkaaaaaa@7587@\nIn other words all shared leaf set sizes can be calculated in time O(n2). First the shared leaf set sizes for subtrees that to not contain an arbitrary node r and r' from each tree are calculated in time O(n2). Then the shared leaf set sizes of subtrees that do contain r or r' (or both) are calculated constant time for each shared leaf set. Since there are O(n2) shared leaf sets the total time usage is O(n2).\nThe reduction to time O(n + |V||V'|) and space O(|V||V'|) is done by handling the cases where F, Fi, G or Gj is a leaf in a special way. For each pair of nodes v, v' we let Leaf [v, v'] be the number of leaves directly connected to v that have the same label as a leaf directly connected to v'. Leaf [v, v'] is constructable in time O(n + |V|||V'|) in the following way: First, set Leaf [v, v'] = 0 for all pairs of nodes v, v'. Given a leaf number, x, there is a unique node, node(x), in T and a unique node, node'(x), in T'. For each leaf number, x, we increment Leaf [node(x), node'(x)]. There are n such numbers, and by assumption, the leaves can be found in constant time given a number. Thus Leaf [v, v'] is constructable in time O(n + |V||V'|). We choose r and r' in T and T' and create two rooted trees Tr and T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@. The non-leaf children of r in Tr are F1,..., Fx and the non-leaf children of r' in T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@ are G1,..., Gy. The intersection size of the two trees can be defined recursively as:\n| T r ∩ T ′ r | = L e a f [ r , r ′ ] + ∑ i = 1 x | F i ∩ T ′ r | + ∑ j = 1 y | T r ∩ G j | − ∑ i = 1 x ∑ j = 1 y | F i ∩ G j | ( 17 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGG8baFcqWGubavdaWgaaWcbaGaemOCaihabeaakiabgMIihlqbdsfauzaafaWaaSbaaSqaaiabdkhaYbqabaGccqGG8baFcqGH9aqpcqWGmbatcqWGLbqzcqWGHbqycqWGMbGzdaWadiqaaiabdkhaYjabcYcaSiqbdkhaYzaafaaacaGLBbGaayzxaaGaey4kaSYaaabCaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlqbdsfauzaafaWaaSbaaSqaaiabdkhaYbqabaGccqGG8baFaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdIha4bqdcqGHris5aOGaey4kaSYaaabCaeaacqGG8baFcqWGubavdaWgaaWcbaGaemOCaihabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeyOeI0YaaabCaeaadaaeWbqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabdMha5bqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemiEaGhaniabggHiLdaaleaacqWGQbGAcqGH9aqpcqaIXaqmaeaacqWG5bqEa0GaeyyeIuoakiaaxMaacaWLjaWaaeWaceaacqaIXaqmcqaI3aWnaiaawIcacaGLPaaaaaa@84B8@\nThe first term counts all leaves directly connected to both r and r'. The second term counts all leaves connected directly to r', that are also in Tr, but not directly connected to r. The third term counts all leaves connected directly to r, that are also in T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@, but not directly connected to r'. Summing these three terms counts all leaves present in both subtrees, but leaves not connected directly to the roots are counted twice, and are subtracted by the last term.\nSince (17) is only summing over the non-leaf children of a given internal node, calculating the shared leaf set sizes of all these pairs of subtrees can be done using dynamic programming in time:\n( n + | V | | V ′ | ) + ∑ v ∈ V i d v + ∑ v ′ ∈ V ′ i d v ′ + ∑ v ∈ V ∑ v ′ ∈ V ′ i d v i d v ′ = O ( n + | V | | V ′ | ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqadiqaaiabd6gaUjabgUcaRiabcYha8jabdAfawjabcYha8jabcYha8jqbdAfawzaafaGaeiiFaWhacaGLOaGaayzkaaGaey4kaSYaaabuaeaacqWGPbqAcqWGKbazdaWgaaWcbaGaemODayhabeaakiabgUcaRmaaqafabaGaemyAaKMaemizaq2aaSbaaSqaaiqbdAha2zaafaaabeaaaeaacuWG2bGDgaqbaiabgIGiolqbdAfawzaafaaabeqdcqGHris5aOGaey4kaSYaaabuaeaadaaeqbqaaiabdMgaPjabdsgaKnaaBaaaleaacqWG2bGDaeqaaOGaemyAaKMaemizaq2aaSbaaSqaaiqbdAha2zaafaaabeaakiabg2da9iabd+eapnaabmGabaGaemOBa4Maey4kaSIaeiiFaWNaemOvayLaeiiFaWNaeiiFaWNafmOvayLbauaacqGG8baFaiaawIcacaGLPaaaaSqaaiqbdAha2zaafaGaeyicI4SafmOvayLbauaaaeqaniabggHiLdaaleaacqWG2bGDcqGHiiIZcqWGwbGvaeqaniabggHiLdaaleaacqWG2bGDcqGHiiIZcqWGwbGvaeqaniabggHiLdaaaa@74EC@\nBy the same arguments as above, the rest of the shared leaf set sizes can be computed in time O(|V||V'|) and space O(|V||V'|). Therefore the total running time of the algorithm is O(n + |V||V'|) and space usage is O(|V||V'|).","divisions":[{"label":"title","span":{"begin":0,"end":37}},{"label":"p","span":{"begin":38,"end":691}},{"label":"p","span":{"begin":692,"end":1976}},{"label":"figure","span":{"begin":1977,"end":2070}},{"label":"label","span":{"begin":1977,"end":1986}},{"label":"caption","span":{"begin":1988,"end":2070}},{"label":"p","span":{"begin":1988,"end":2070}},{"label":"p","span":{"begin":2071,"end":2989}},{"label":"p","span":{"begin":2990,"end":3404}},{"label":"p","span":{"begin":3405,"end":4544}},{"label":"p","span":{"begin":4545,"end":5809}},{"label":"p","span":{"begin":5810,"end":6774}},{"label":"p","span":{"begin":6775,"end":7187}},{"label":"p","span":{"begin":7188,"end":8757}},{"label":"p","span":{"begin":8758,"end":9981}},{"label":"p","span":{"begin":9982,"end":10739}},{"label":"p","span":{"begin":10740,"end":10935}},{"label":"p","span":{"begin":10936,"end":11980}}],"tracks":[]}