Methods Consider two input trees, and assume that a quartet has butterfly topology in both trees, i.e. that one pair of the four leaves is separated from the other pair by an edge in the tree in both trees. We say that the butterfly quartet is shared, if it has the same butterfly topology in both trees. Otherwise, we say that the butterfly quartet is nonshared. We let shared(T, T') denote the number of butterflies shared between tree T and tree T', i.e. the number of quartets that are butterflies with the same topology in tree T and tree T', and let nonshared (T, T') denote the number of quartets that are butterflies in both T and T' but with different topology. By our definition of shared, the number of butterfly quartets in a single tree can be stated as the number of butterfly quartets shared between the tree and itself, i.e. shared(T, T) or shared(T', T') for the number of butterfly quartets in T and T' respectively. (This notation also emphasizes that computing the number of butterfly quartets in a single tree by our algorithm is performed as a comparison of the tree against itself.) In [11] we argue that the quartet distance between T and T', qdist(T, T'), can be found by focusing only on the computation of the number of shared and nonshared butterfly quartets between two trees, i.e. it is unnecessary to consider non-butterfly quartets explicitely. More specifically, we show that: qdist ( T , T ′ ) = shared ( T , T ) + shared ( T ′ , T ′ ) −   2 ⋅ shared ( T , T ′ ) − nonshared ( T , T ′ )       ( 1 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqadeGabaaabaGaeeyCaeNaeeizaqMaeeyAaKMaee4CamNaeeiDaqNaeiikaGIaemivaqLaeiilaWIafmivaqLbauaacqGGPaqkcqGH9aqpcqqGZbWCcqqGObaAcqqGHbqycqqGYbGCcqqGLbqzcqqGKbazcqGGOaakcqWGubavcqGGSaalcqWGubavcqGGPaqkcqGHRaWkcqqGZbWCcqqGObaAcqqGHbqycqqGYbGCcqqGLbqzcqqGKbazcqGGOaakcuWGubavgaqbaiabcYcaSiqbdsfauzaafaGaeiykaKcabaGaeyOeI0IaeeiiaaIaeGOmaiJaeyyXICTaee4CamNaeeiAaGMaeeyyaeMaeeOCaiNaeeyzauMaeeizaqMaeiikaGIaemivaqLaeiilaWIafmivaqLbauaacqGGPaqkcqGHsislcqqGUbGBcqqGVbWBcqqGUbGBcqqGZbWCcqqGObaAcqqGHbqycqqGYbGCcqqGLbqzcqqGKbazcqGGOaakcqWGubavcqGGSaalcuWGubavgaqbaiabcMcaPaaacaWLjaGaaCzcamaabmGabaGaeGymaedacaGLOaGaayzkaaaaaa@7CB7@ The proof of this formula is a follows. Let Q denote the number of quartets which have butterfly topology in T and non-butterfly topology in T'. Symmetrically, let Q' denote the number of quartets which have butterfly topology in T' and non-butterfly topology in T. A butterfly quartet in T is either a butterfly quartet in T' or a non-butterfly quartet in T'. The number of butterfly quartets in T, shared(T, T), can thus be expressed as the sum shared(T, T') + nonshared(T, T') + Q. Similarly, the sum shared (T', T') = Q' + shared (T, T') + nonshared (T, T'). It is now straightforward to verify that the righthand side of (1) adds up Q + Q' + nonshared(T, T') which is the number of quartets where the quartet topologies differ in T and T', i.e. qdist(T,T'). In the section below, we describe how to use (1) to compute the quartet distance in time O(n + |V||V'| min{id, id'}), more precisely O(n + |V||V'|) for a preprocessing step, after which we can use O(|V||V'|) for calculating shared(T, T'), O(|V||V'|{id, id'}) for calculating nonshared(T,T'), O(|V|) for calculating shared(T,T) and O(|V'|) for calculating shared(T',T'). Terminology Let T and T' be two unrooted trees. In this paper we will explicitly refer to the leaves of a tree as leaves and the non-leaf nodes as internal node. We will assume that T and T' each has n labelled leaves numbered 1,..., n such that the leaf numbered x in T has the same label as the leaf numbered x in T'. The leaf sets are denoted L and L' for T and T' respectively, note that L = L'. We will use V and V' to denote the internal nodes in T and T' respectively. The degree of an internal node v is the number of subtrees connected to it, and is denoted dv. The internal degree of an internal node v, idv, is the number of non-leaf subtrees connected to it. We will assume that no internal node in T and T' has degree two, and we will denote the maximal internal degree of all internal nodes in T and T' by id and id' respectively. Let v be an internal node in T, and let F1, ..., Fdv MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemizaq2aaSbaaWqaaiabdAha2bqabaaaleqaaaaa@30EB@ be the subtrees connected to it, as shown in Fig. 6 We call these the subtrees of v. We say that v claims all butterfly quartets ab|cd where a,b ∈ Fi, c ∈ Fk and d ∈ Fm for i ≠ k ≠ m (see Fig. 7). With this definition, each butterfly quartet is claimed by exactly two internal nodes. Figure 6 An internal node v ∈ T with subtrees F1,...,Fd, here dv = 6. Adding the subscript yz|w|x to an internal node claiming the butterfly quartet wx|yz, indicates that the leaves y and z are found in a single subtree of the internal node, while the leaves w and x are found in different subtrees. For example, considering the quartet ab|cd, v and v' in Fig. 7 are written as vab|c|d and v′ab|c|d MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG2bGDgaqbamaaBaaaleaacqWGHbqycqWGIbGycqGG8baFcqWGJbWycqGG8baFcqWGKbazaeqaaaaa@3691@. Figure 7 Internal nodes v ∈ T and v' ∈ T', each claiming the quartet ab|cd. Given a subtree F of T, and a subtree G of T', we call the intersection F ∩ G a shared leaf set, i.e. the set of leaves present in both F and G. The size of the shared leaf set, |F ∩ G|, then denotes the number of leaves present in both F and G. The size of a single subtree F is similarly denoted |F|. We will use F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@ to represent the subtree of T containing all leaves not in F and similarly for G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@ and G in T', see Fig. 8 for an example. Note that F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@ and G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@ are also subtrees of T and T' respectively, and thus |F ∩ G|, |F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@ ∩ G|, |F ∩ G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@| and |F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@ ∩ G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@| are all sizes of shared leaf sets between a single subtree from T and a single subtree from T'. In the presentation of the algorithms below, we will assume that we have access to |F|, |G| and |F ∩ G| for all subtrees F of T and G of T' in time O(1). At the end of the section we will describe how this can be achieved by an O(n) time preprocessing step, which does not affect the asymptotic worst case running time of the presented algorithms. Figure 8 A rooted subtree F, and its complement rooted subtree F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@. Counting shared butterfly quartets For each pair of internal nodes v, v' from T, T' we want to count the number of shared butterfly quartets claimed by both internal nodes, shared(v, v'). Assume that F1, ..., Fdv MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemizaq2aaSbaaWqaaiabdAha2bqabaaaleqaaaaa@30EB@ are subtrees of v and G1, ..., Gdv′ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGhbWrdaWgaaWcbaGaemizaq2aaSbaaWqaaiqbdAha2zaafaaabeaaaSqabaaaaa@30F9@ are subtrees of v'. We wish to count all quartets on the form ab|cd where a, b ∈ Fi ∩ Gj, c ∈ Fk ∩ Gl and d ∈ Fm ∩ Gn, i ≠ k ≠ m, j ≠ l ≠ n (see Fig. 7). Counting the possible combinations of a and b is expressed by the following double sum, which sums over all pairs of subtrees of v and v': ∑ i ∑ j ( | F i ∩ G j | 2 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaWaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdQgaQbqab0GaeyyeIuoaaSqaaiabdMgaPbqab0GaeyyeIuoaaaa@4022@ Given that a and b are in Fi ∩ Gj, we need to find c and d in F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@i ∩ G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@j. The number of possible choices of c and d is expressed by: ( | F ¯ i ∩ G ¯ j | 2 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaaa@3954@ However when finding c and d in F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@i ∩ G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@j, the condition that c and d must be in different subtrees is not satisfied. Therefore we subtract the number of times c and d are in the same subtree of v and v': ∑ k ≠ i ( | F k ∩ G ¯ j | 2 ) + ∑ l ≠ j ( | F ¯ i ∩ G l | 2 ) − ∑ k ≠ i ∑ l ≠ j ( | F k ∩ G l | 2 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaacqGHRaWkdaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdYgaSjabgcMi5kabdQgaQbqab0GaeyyeIuoaaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoakiabgkHiTmaaqafabaWaaabuaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGSbaBaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemiBaWMaeyiyIKRaemOAaOgabeqdcqGHris5aaWcbaGaem4AaSMaeyiyIKRaemyAaKgabeqdcqGHris5aaaa@6EC5@ Any pair in Fk ∩ Gl is counted twice, once in |Fk ∩ G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@j| and once in |F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@i ∩ Gl|, therefore these pairs are subtracted once using the double sum above. (2) expresses the number of ways c and d can be found in different subtrees, given that a and b are found in Fi ∩ Gj: ( | F ¯ i ∩ G ¯ j | 2 ) − ∑ k ≠ i ( | F k ∩ G ¯ j | 2 ) − ∑ l ≠ j ( | F ¯ i ∩ G l | 2 ) + ∑ k ≠ i ∑ l ≠ j ( | F k ∩ G l | 2 )       ( 2 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaacqGHsisldaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoakiabgkHiTmaaqafabaWaaeWaceaafaqabeGabaaabaGaeiiFaWNafmOrayKbaebadaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGSbaBaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemiBaWMaeyiyIKRaemOAaOgabeqdcqGHris5aOGaey4kaSYaaabuaeaadaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaaaleaacqWGSbaBcqGHGjsUcqWGQbGAaeqaniabggHiLdaaleaacqWGRbWAcqGHGjsUcqWGPbqAaeqaniabggHiLdGccaWLjaGaaCzcamaabmGabaGaeGOmaidacaGLOaGaayzkaaaaaa@802F@ We can now compute the number of shared butterfly quartets between two internal nodes, ie. the number of butterfly quartets claimed by both internal nodes with the same topology: shared ( v , v ′ ) = ∑ i ∑ j ( | F i ∩ G j | 2 ) ( ( | F ¯ i ∩ G ¯ j | 2 ) − ∑ k ≠ i ( | F k ∩ G ¯ j | 2 ) − ∑ l ≠ j ( | F ¯ i ∩ G l | 2 ) + ∑ k ≠ i ∑ l ≠ j ( | F k ∩ G l | 2 ) )       ( 3 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqaceGabaaabaGaee4CamNaeeiAaGMaeeyyaeMaeeOCaiNaeeyzauMaeeizaq2aaeWaceaacqWG2bGDcqGGSaalcuWG2bGDgaqbaaGaayjkaiaawMcaaiabg2da9maaqafabaWaaabuaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemOAaOgabeqdcqGHris5aaWcbaGaemyAaKgabeqdcqGHris5aOWaaeqaceaadaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaacqGHsisldaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoaaOGaayjkaaaabaWaaeGaceaacqGHsisldaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaacqGHRaWkdaaeqbqaamaaqafabaWaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdYgaSjabgcMi5kabdQgaQbqab0GaeyyeIuoaaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoaaSqaaiabdYgaSjabgcMi5kabdQgaQbqab0GaeyyeIuoaaOGaayzkaaaaaiaaxMaacaWLjaWaaeWaceaacqaIZaWmaiaawIcacaGLPaaaaaa@A3C6@ If the trees, T and T', have a shared quartet ab|cd, then there are two internal nodes in each tree that claims this quartet: vab|c|d and vcd|a|b in T and v′ab|c|d MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG2bGDgaqbamaaBaaaleaacqWGHbqycqWGIbGycqGG8baFcqWGJbWycqGG8baFcqWGKbazaeqaaaaa@3691@ and v′cd|a|b MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG2bGDgaqbamaaBaaaleaacqWGJbWycqWGKbazcqGG8baFcqWGHbqycqGG8baFcqWGIbGyaeqaaaaa@3691@ in T'. Since both shared(vab|c|d, v′ab|c|d MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG2bGDgaqbamaaBaaaleaacqWGHbqycqWGIbGycqGG8baFcqWGJbWycqGG8baFcqWGKbazaeqaaaaa@3691@) and shared(vcd|a|b, v′cd|a|b MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG2bGDgaqbamaaBaaaleaacqWGJbWycqWGKbazcqGG8baFcqWGHbqycqGG8baFcqWGIbGyaeqaaaaa@3691@) will count the quartet, the total number of shared quartets between the two trees is: shared ( T , T ′ ) = 1 2 ∑ v ∈ T ∑ v ′ ∈ T ′ shared ( v , v ′ ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGZbWCcqqGObaAcqqGHbqycqqGYbGCcqqGLbqzcqqGKbazcqGGOaakcqWGubavcqGGSaalcuWGubavgaqbaiabcMcaPiabg2da9maalaaabaGaeGymaedabaGaeGOmaidaamaaqafabaWaaabuaeaacqqGZbWCcqqGObaAcqqGHbqycqqGYbGCcqqGLbqzcqqGKbazcqGGOaakcqWG2bGDcqGGSaalcuWG2bGDgaqbaiabcMcaPaWcbaGafmODayNbauaacqGHiiIZcuWGubavgaqbaaqab0GaeyyeIuoaaSqaaiabdAha2jabgIGiolabdsfaubqab0GaeyyeIuoaaaa@570E@ It is straightforward to observe that calculating shared(v, v') using a direct computation of (3) takes time O(dv2dv′2 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGKbazdaqhaaWcbaGaemODayhabaGaeGOmaidaaOGaemizaq2aa0baaSqaaiqbdAha2zaafaaabaGaeGOmaidaaaaa@348C@). It is however not necessary for shared(v, v') to sum over all subtrees of v and v'. Since each term in the sums involves taking a 2-subset from a shared leaf set, we need only to consider subtrees that are not leaves. This reduces the running time to O(idv2idv′2 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGPbqAcqWGKbazdaqhaaWcbaGaemODayhabaGaeGOmaidaaOGaemyAaKMaemizaq2aa0baaSqaaiqbdAha2zaafaaabaGaeGOmaidaaaaa@3742@). This running time can be improved even more, we start by expressing (2) in a different way: ( | F ¯ i ∩ G ¯ j | 2 ) ︸ I − ∑ k ≠ i ( | F k ∩ G ¯ j | 2 ) ︸ I I − ∑ l ≠ j ( | F ¯ i ∩ G l | 2 ) ︸ I I I + ∑ k ≠ i ∑ l ≠ j ( | F k ∩ G l | 2 ) ︸ I V = ( | F ¯ i ∩ G ¯ j | 2 ) ︸ I − ∑ k ( | F k ∩ G ¯ j | 2 ) + ( | F i ∩ G ¯ j | 2 ) ︸ I I − ∑ l ( | F ¯ i ∩ G l | 2 ) + ( | F ¯ i ∩ G j | 2 ) ︸ I I I + ∑ k ∑ l ( | F k ∩ G l | 2 ) − ∑ k ( | F k ∩ G j | 2 ) − ∑ l ( | F i ∩ G l | 2 ) + ( | F i ∩ G j | 2 ) ︸ I V       ( 4 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeWabaaabaWaaGbaaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdMeajbGccaGL44pacqGHsisldaagaaqaamaaqafabaWaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaem4AaSMaeyiyIKRaemyAaKgabeqdcqGHris5aaWcbaGaemysaKKaemysaKeakiaawIJ=aiabgkHiTmaayaaabaWaaabuaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaaaleaacqWGSbaBcqGHGjsUcqWGQbGAaeqaniabggHiLdaaleaacqWGjbqscqWGjbqscqWGjbqsaOGaayjo+dGaey4kaSYaaGbaaeaadaaeqbqaamaaqafabaWaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdYgaSjabgcMi5kabdQgaQbqab0GaeyyeIuoaaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoaaSqaaiabdMeajjabdAfawbGccaGL44pacqGH9aqpaeaadaagaaqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemysaKeakiaawIJ=aiabgkHiTmaayaaabaWaaabuaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlqbdEeahzaaraWaaSbaaSqaaiabdQgaQbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaGaey4kaSYaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaem4AaSgabeqdcqGHris5aaWcbaGaemysaKKaemysaKeakiaawIJ=aiabgkHiTmaayaaabaWaaabuaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaGaey4kaSYaaeWaceaafaqabeGabaaabaGaeiiFaWNafmOrayKbaebadaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemiBaWgabeqdcqGHris5aaWcbaGaemysaKKaemysaKKaemysaKeakiaawIJ=aaqaaiabgUcaRmaayaaabaWaaabuaeaadaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaGaeyOeI0caleaacqWGSbaBaeqaniabggHiLdaaleaacqWGRbWAaeqaniabggHiLdGcdaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaGaeyOeI0caleaacqWGRbWAaeqaniabggHiLdGcdaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaGaey4kaSYaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdYgaSbqab0GaeyyeIuoaaSqaaiabdMeajjabdAfawbGccaGL44paaaGaaCzcaiaaxMaadaqadiqaaiabisda0aGaayjkaiaawMcaaaaa@3050@ Let S j = ∑ k ( | F k ∩ G j | 2 ) S ′ i = ∑ l ( | F i ∩ G l | 2 ) S ¯ j = ∑ k ( | F k ∩ G ¯ j | 2 ) S ¯ ′ i = ∑ l ( | F ¯ i ∩ G l | 2 ) S = ∑ k ∑ l ( | F k ∩ G l | 2 )       ( 5 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqadeqbbaaaaeaacqWGtbWudaWgaaWcbaGaemOAaOgabeaakiabg2da9maaqafabaWaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdUgaRbqab0GaeyyeIuoaaOqaaiqbdofatzaafaWaaSbaaSqaaiabdMgaPbqabaGccqGH9aqpdaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaaaleaacqWGSbaBaeqaniabggHiLdaakeaacuWGtbWugaqeamaaBaaaleaacqWGQbGAaeqaaOGaeyypa0ZaaabuaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlqbdEeahzaaraWaaSbaaSqaaiabdQgaQbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaaaleaacqWGRbWAaeqaniabggHiLdaakeaacuWGtbWugaqegaqbamaaBaaaleaacqWGPbqAaeqaaOGaeyypa0ZaaabuaeaadaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaaaleaacqWGSbaBaeqaniabggHiLdaakeaacqWGtbWucqGH9aqpdaaeqbqaamaaqafabaWaaeWaceaafaqabeGabaaabaGaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdYgaSbqab0GaeyyeIuoaaSqaaiabdUgaRbqab0GaeyyeIuoaaaGccaWLjaGaaCzcamaabmGabaGaeGynaudacaGLOaGaayzkaaaaaa@9604@ We can ignore leaf subtrees, so we need to compute idv' different S¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqeaaaa@2DF3@j's and Sj's which can each be computed in O(idv) time. Symmetrically each of the idv S¯′i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqegaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F85@'s and S′i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F6E@'s takes time O(id′v MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGPbqAcuWGKbazgaqbamaaBaaaleaacqWG2bGDaeqaaaaa@3105@) to compute, and the total time of computing S is O(idvidv'). The total time of computing all sums mentioned is thus O(idvidv') and this is the key to reducing the time usage of shared(v,v'). Using the sums we can express (4) as: ( | F ¯ i ∩ G ¯ j | 2 ) ︸ I − S ¯ j + ( | F i ∩ G ¯ j | 2 ) ︸ I I − S ¯ ′ i + ( | F ¯ i ∩ G j | 2 ) ︸ I I I + S − S j − S ′ i + ( | F i ∩ G j | 2 ) ︸ I V MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaagaaqaamaabmGabaqbaeqabiqaaaqaaiabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemysaKeakiaawIJ=aiabgkHiTmaayaaabaGafm4uamLbaebadaWgaaWcbaGaemOAaOgabeaakiabgUcaRmaabmGabaqbaeqabiqaaaqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8bqaaiabikdaYaaaaiaawIcacaGLPaaaaSqaaiabdMeajjabdMeajbGccaGL44pacqGHsisldaagaaqaaiqbdofatzaaryaafaWaaSbaaSqaaiabdMgaPbqabaGccqGHRaWkdaqadiqaauaabeqaceaaaeaacqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaeaacqaIYaGmaaaacaGLOaGaayzkaaaaleaacqWGjbqscqWGjbqscqWGjbqsaOGaayjo+dGaey4kaSYaaGbaaeaacqWGtbWucqGHsislcqWGtbWudaWgaaWcbaGaemOAaOgabeaakiabgkHiTiqbdofatzaafaWaaSbaaSqaaiabdMgaPbqabaGccqGHRaWkdaqadiqaauaabeqaceaaaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhabaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemysaKKaemOvayfakiaawIJ=aaaa@8394@ Provided that the sums S¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqeaaaa@2DF3@j, S¯′i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqegaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F85@, Sj, S′i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F6E@ and S have been calculated, (4) can be calculated in time O(l). Since calculation of the sums is independent on the calculation of shared(v, v'), these calculations can be done serially as shown in the algorithm below, thereby reducing the time usage of shared(T, T') to: ∑ v ∈ T ∑ v ′ ∈ T ′ i d v i d v ′ = ∑ v ∈ T i d v ∑ v ′ ∈ T ′ i d v ′ = 2 ( | V | − 1 ) ⋅ 2 ( | V ′ | − 1 ) = O ( | V | | V ′ | ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaGaemyAaKMaemizaq2aaSbaaSqaaiabdAha2bqabaGccqWGPbqAcqWGKbazdaWgaaWcbaGafmODayNbauaaaeqaaaqaaiqbdAha2zaafaGaeyicI4SafmivaqLbauaaaeqaniabggHiLdGccqGH9aqpdaaeqbqaaiabdMgaPjabdsgaKnaaBaaaleaacqWG2bGDaeqaaaqaaiabdAha2jabgIGiolabdsfaubqab0GaeyyeIuoakmaaqafabaGaemyAaKMaemizaq2aaSbaaSqaaiqbdAha2zaafaaabeaaaeaacuWG2bGDgaqbaiabgIGiolqbdsfauzaafaaabeqdcqGHris5aOGaeyypa0JaeGOmaiZaaeWaceaacqGG8baFcqWGwbGvcqGG8baFcqGHsislcqaIXaqmaiaawIcacaGLPaaacqGHflY1cqaIYaGmdaqadiqaaiabcYha8jqbdAfawzaafaGaeiiFaWNaeyOeI0IaeGymaedacaGLOaGaayzkaaGaeyypa0Jaem4ta80aaeWaceaacqGG8baFcqWGwbGvcqGG8baFcqGG8baFcuWGwbGvgaqbaiabcYha8bGaayjkaiaawMcaaaWcbaGaemODayNaeyicI4SaemivaqfabeqdcqGHris5aaaa@7911@ ALGORITHM – CALCULATING THE NUMBER OF SHARED BUTTERFLY QUARTETS BETWEEN T AND T' Requires: T, T' two input trees with the same leaf set. Ensures: Res = shared(T, T') Res ← 0 for v internal node in T do    for v' internal node in T' do       Calculate sums S¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqeaaaa@2DF3@j, S¯′i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqegaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F85@, Sj, S′i MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F6E@ and S       Res ← Res + shared(v, v')    end for end for Res ← Res2 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcaaqaaGqaciab=jfasjab=vgaLjab=nhaZbqaaiabikdaYaaaaaa@319C@ Counting nonshared butterfly quartets For each pair of internal nodes v, v' we want to count the number of nonshared butterfly quartets claimed by both internal nodes, nonshared(v, v'). Such quartets have the property that a pair of leaves found in the same subtree of v will be found in different subtrees of v' and vice versa, i.e. a nonshared quartet with leaves a, b, c and d, has a ∈ Fi ∩ Gj, b ∈ Fi ∩ Gl, c ∈ Fk ∩ Gn and d ∈ Fm ∩ Gj (see Fig. 9). The following expression counts all nonshared quartets related to a pair of nodes v and v', obeying that if two leaves of the quartet are in one subtree of v they are in different subtrees of v' and vice versa: Figure 9 Internal nodes v ∈ T claiming the quartet ab|cd and v' ∈ T' claiming the quartet ad|bc. ∑ i ∑ j | F i ∩ G j | | F i ∩ G ¯ j | | F ¯ i ∩ G ¯ j | | F ¯ i ∩ G j |       ( 6 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiiFaWNafmOrayKbaebadaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhaleaacqWGQbGAaeqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdGccaWLjaGaaCzcamaabmGabaGaeGOnaydacaGLOaGaayzkaaaaaa@5F95@ Even though (6) satisfies the property of nonshared quartets, it possibly counts more than the number of nonshared quartets claimed by an internal node in each tree. The problem is that given two internal nodes, they do not nescessarily claim the quartets counted by (6). If we denote the leaves of an nonshared quartet a, b, c and d, the first, second, third and fourth factors in (6) counts the number of choices of a, b, c and d respectively. The first and second factor choose a and b from Fi, while the third and fourth choose c and d from F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@i. In the cases where c and d are chosen from the same subtree Fk, k ≠ i of v, v does not claim the quartet. We must subtract these quartets, which can be counted as: ∑ i ∑ j ∑ k ≠ i | F i ∩ G j | | F i ∩ G ¯ j | | F k ∩ G ¯ j | | F k ∩ G j |       ( 7 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaWaaabuaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bWcbaGaem4AaSMaeyiyIKRaemyAaKgabeqdcqGHris5aaWcbaGaemOAaOgabeqdcqGHris5aOGaaCzcaiaaxMaadaqadiqaaiabiEda3aGaayjkaiaawMcaaaWcbaGaemyAaKgabeqdcqGHris5aaaa@6613@ Similarly there are cases where b and c are chosen from the same subtree Gl, l ≠ j of v', which we must also subtract. These can be counted as: ∑ i ∑ j ∑ l ≠ j | F i ∩ G j | | F i ∩ G l | | F ¯ i ∩ G l | | F ¯ i ∩ G j |       ( 8 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaWaaabuaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8jabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8jabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bWcbaGaemiBaWMaeyiyIKRaemOAaOgabeqdcqGHris5aaWcbaGaemOAaOgabeqdcqGHris5aaWcbaGaemyAaKgabeqdcqGHris5aOGaaCzcaiaaxMaadaqadiqaaiabiIda4aGaayjkaiaawMcaaaaa@6619@ The cases where both c and d are chosen from the same subtree Fk, k ≠ i of v and b and c are chosen from the same subtree Gl, l ≠ j of v' are included in both the expressions above and therefore they must be added again. The following expression counts the number of these cases: ∑ i ∑ j ∑ k ≠ i ∑ l ≠ j | F i ∩ G j | | F i ∩ G l | | F k ∩ G l | | F k ∩ G j |       ( 9 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaWaaabuaeaadaaeqbqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFcqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGSbaBaeqaaOGaeiiFaWNaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaSqaaiabdYgaSjabgcMi5kabdQgaQbqab0GaeyyeIuoaaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoaaSqaaiabdQgaQbqab0GaeyyeIuoaaSqaaiabdMgaPbqab0GaeyyeIuoakiaaxMaacaWLjaWaaeWaceaacqaI5aqoaiaawIcacaGLPaaaaaa@6C97@ Combining equations (6), (7), (8) and (9), gives a way of calculating the number of nonshared quartets between two internal nodes v and v': n o n s h a r e d ( v , v ′ ) = ∑ i ∑ j ( | F i ∩ G j | | F i ∩ G ¯ j | | F ¯ i ∩ G ¯ j | | F ¯ i ∩ G j | − ∑ k ≠ i | F i ∩ G j | | F i ∩ G ¯ j | | F k ∩ G ¯ j | | F k ∩ G j | − ∑ l ≠ j | F i ∩ G j | | F i ∩ G l | | F ¯ i ∩ G l | | F ¯ i ∩ G j | + ∑ k ≠ i ∑ l ≠ j | F i ∩ G j | | F i ∩ G l | | F k ∩ G l | | F k ∩ G j | )       ( 10 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqadeabbaaaaeaaieaacqWFUbGBcqWFVbWBcqWFUbGBcqWFZbWCcqWFObaAcqWFHbqycqWFYbGCcqWFLbqzcqWFKbazdaqadiqaaGqaciab+zha2jab+XcaSiqb+zha2zaafaaacaGLOaGaayzkaaGaeyypa0ZaaabuaeaadaaeqbqaamaabeGabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jqbdAeagzaaraWaaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcuWGhbWrgaqeamaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiiFaWNafmOrayKbaebadaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhacaGLOaaaaSqaaiabdQgaQbqab0GaeyyeIuoaaSqaaiabdMgaPbqab0GaeyyeIuoaaOqaceaawpGaaCzcaiabgkHiTmaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSafm4raCKbaebadaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoaaOqaciaakpaa6pGaaCzcaiabgkHiTmaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFcqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFcqGG8baFcuWGgbGrgaqeamaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaSqaaiabdYgaSjabgcMi5kabdQgaQbqab0GaeyyeIuoaaOqaamaabiGabiWaaO5aaibbaihbcaWLjaGaey4kaSYaaabuaeaadaaeqbqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFcqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGSbaBaeqaaOGaeiiFaWNaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaSqaaiabdYgaSjabgcMi5kabdQgaQbqab0GaeyyeIuoaaSqaaiabdUgaRjabgcMi5kabdMgaPbqab0GaeyyeIuoaaOGaayzkaaaaaiaaxMaacaWLjaWaaeWaceaacqaIXaqmcqaIWaamaiaawIcacaGLPaaaaaa@0F94@ Assuming that the trees have a nonshared quartet with topology form ab|cd in T, and ad|bc in T', there are two internal nodes in each tree claiming the quartet: vab|c|d and vcd|a|b in T and v′ad|b|c MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG2bGDgaqbamaaBaaaleaacqWGHbqycqWGKbazcqGG8baFcqWGIbGycqGG8baFcqWGJbWyaeqaaaaa@3691@ and v′bc|a|d MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG2bGDgaqbamaaBaaaleaacqWGIbGycqWGJbWycqGG8baFcqWGHbqycqGG8baFcqWGKbazaeqaaaaa@3691@ in T'. All of the four combinations of these will identify the quartet as nonshared. Therefore the total number of nonshared quartets between the two trees is: nonshared ( T , T ′ ) = 1 4 ∑ v ∈ T ∑ v ′ ∈ T ′ nonshared ( v , v ′ ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGUbGBcqqGVbWBcqqGUbGBcqqGZbWCcqqGObaAcqqGHbqycqqGYbGCcqqGLbqzcqqGKbazcqGGOaakcqWGubavcqGGSaalcuWGubavgaqbaiabcMcaPiabg2da9maalaaabaGaeGymaedabaGaeGinaqdaamaaqafabaWaaabuaeaacqqGUbGBcqqGVbWBcqqGUbGBcqqGZbWCcqqGObaAcqqGHbqycqqGYbGCcqqGLbqzcqqGKbazcqGGOaakcqWG2bGDcqGGSaalcuWG2bGDgaqbaiabcMcaPaWcbaGafmODayNbauaacqGHiiIZcuWGubavgaqbaaqab0GaeyyeIuoaaSqaaiabdAha2jabgIGiolabdsfaubqab0GaeyyeIuoaaaa@5F68@ Direct computation of nonshared(v, v') using (10) takes time O(dv2dv′2 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGKbazdaqhaaWcbaGaemODayhabaGaeGOmaidaaOGaemizaq2aa0baaSqaaiqbdAha2zaafaaabaGaeGOmaidaaaaa@348C@). Each of the subtrees has to be intersected with two disjoint sets in each of the sums. This means that if a subtree is only a leaf, at least one of these intersections will be zero, and the term will be zero. Therefore we can ignore subtrees that consist of a single leaf, just like when computing shared(T, T'), and reduce the time usage to O(idv2idv′2 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGPbqAcqWGKbazdaqhaaWcbaGaemODayhabaGaeGOmaidaaOGaemyAaKMaemizaq2aa0baaSqaaiqbdAha2zaafaaabaGaeGOmaidaaaaa@3742@). The time usage of the calculation of nonshared(v, v') can be further improved by rewriting (9): ∑ i ∑ j ∑ k ≠ i ∑ l ≠ j | F i ∩ G j | | F i ∩ G l | | F k ∩ G l | | F k ∩ G j | = ∑ i ∑ j | F i ∩ G j | ∑ k ≠ i | F k ∩ G j | ∑ l ≠ j | F i ∩ G l | | F k ∩ G l | = ∑ i ∑ j | F i ∩ G j | ∑ k ≠ i | F k ∩ G j | ( ∑ l | F i ∩ G l | | F k ∩ G l | − | F i ∩ G j | | F k ∩ G j | ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqaaeWabaaabaWaaabuaeaadaaeqbqaamaaqafabaWaaabuaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFcqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhaleaacqWGSbaBcqGHGjsUcqWGQbGAaeqaniabggHiLdaaleaacqWGRbWAcqGHGjsUcqWGPbqAaeqaniabggHiLdaaleaacqWGQbGAaeqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdGccqGH9aqpaeaadaaeqbqaamaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bWcbaGaemOAaOgabeqdcqGHris5aaWcbaGaemyAaKgabeqdcqGHris5aOWaaabuaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhaleaacqWGRbWAcqGHGjsUcqWGPbqAaeqaniabggHiLdGcdaaeqbqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFcqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGSbaBaeqaaOGaeiiFaWhaleaacqWGSbaBcqGHGjsUcqWGQbGAaeqaniabggHiLdGccqGH9aqpaeaadaaeqbqaamaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bWcbaGaemOAaOgabeqdcqGHris5aaWcbaGaemyAaKgabeqdcqGHris5aOWaaabuaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhaleaacqWGRbWAcqGHGjsUcqWGPbqAaeqaniabggHiLdGcdaqadiqaamaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8jabcYha8jabdAeagnaaBaaaleaacqWGRbWAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdYgaSbqabaGccqGG8baFaSqaaiabdYgaSbqab0GaeyyeIuoakiabgkHiTiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFcqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhacaGLOaGaayzkaaaaaaaa@F676@ Inspired by the precomputing of sums used in shared(v, v'), (5), we calculate for each i, k, k ≠ i the sum: S i , k = ∑ l | F i ∩ G l | | F k ∩ G l |       ( 11 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWudaWgaaWcbaGaemyAaKMaeiilaWIaem4AaSgabeaakiabg2da9maaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8bWcbaGaemiBaWgabeqdcqGHris5aOGaeiiFaWNaemOray0aaSbaaSqaaiabdUgaRbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemiBaWgabeaakiabcYha8jaaxMaacaWLjaWaaeWaceaacqaIXaqmcqaIXaqmaiaawIcacaGLPaaaaaa@4ED6@ There are O(idv2 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGPbqAcqWGKbazdaqhaaWcbaGaemODayhabaGaeGOmaidaaaaa@31EC@) of these sums and each takes time O(idv') to calculate, so the time complexity for calculating all sums is O(idv2idv′ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGPbqAcqWGKbazdaqhaaWcbaGaemODayhabaGaeGOmaidaaOGaemyAaKMaemizaq2aaSbaaSqaaiqbdAha2zaafaaabeaaaaa@364F@). In the case that idv' ≤ idv, we can switch v and v' and thus get time usage O(idvidv' min{idv, idv'}). Assuming that the sums have been calculated, (9) can now be calculated in time O(idvidv'min{idv, idv'}) by the expression: ∑ i ∑ j | F i ∩ G j | ∑ k ≠ i | F k ∩ G j | ( S i , k − | F i ∩ G j | | F k ∩ G j | )       ( 12 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8bWcbaGaemOAaOgabeqdcqGHris5aaWcbaGaemyAaKgabeqdcqGHris5aOWaaabuaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaW3aaeWaceaacqWGtbWudaWgaaWcbaGaemyAaKMaeiilaWIaem4AaSgabeaakiabgkHiTiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFcqGG8baFcqWGgbGrdaWgaaWcbaGaem4AaSgabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWhacaGLOaGaayzkaaaaleaacqWGRbWAcqGHGjsUcqWGPbqAaeqaniabggHiLdGccaWLjaGaaCzcamaabmGabaGaeGymaeJaeGOmaidacaGLOaGaayzkaaaaaa@6E4A@ By substituting (9) with (12) in (10), we can calculate nonshared(v, v', in time O(idvidv'min{idv, idv'}). Since calculation of the sums is independent of the calculation of nonshared(v, v'), these calculations can be done serially as shown in the algorithm below. ALGORITHM – CALCULATING THE NUMBER OF NONSHARED BUTTERFLY QUARTETS BETWEEN T AND T' Requires: T, T' two input trees with the same leaf set. Ensures: Res = nonshared(T, T') Res ← 0 for v internal node in T do    for v' internal node in T' do       Calculate sums Si,k       Res ← Res + nonshared(v, v')    end for end for Res ← Res4 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcaaqaaGqaciab=jfasjab=vgaLjab=nhaZbqaaiabisda0aaaaaa@31A0@ The time complexity of the algorithm is: ∑ v ∈ T ∑ v ′ ∈ T ′ i d v i d v ′ min ⁡ { i d v , i d v ′ } ≤ min ⁡ { i d , i d ′ } ∑ v ∈ T i d v ∑ v ′ ∈ T ′ i d v ′ = O ( | V | | V ′ | min ⁡ { i d , i d ′ } ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaGaemyAaKMaemizaq2aaSbaaSqaaiabdAha2bqabaGccqWGPbqAcqWGKbazdaWgaaWcbaGafmODayNbauaaaeqaaOGagiyBa0MaeiyAaKMaeiOBa42aaiWabeaacqWGPbqAcqWGKbazdaWgaaWcbaGaemODayhabeaakiabcYcaSiabdMgaPjabdsgaKnaaBaaaleaacuWG2bGDgaqbaaqabaaakiaawUhacaGL9baacqGHKjYOcyGGTbqBcqGGPbqAcqGGUbGBdaGadeqaaiabdMgaPjabdsgaKjabcYcaSiabdMgaPjqbdsgaKzaafaaacaGL7bGaayzFaaWaaabuaeaacqWGPbqAcqWGKbazdaWgaaWcbaGaemODayhabeaakmaaqafabaGaemyAaKMaemizaq2aaSbaaSqaaiqbdAha2zaafaaabeaakiabg2da9iabd+eapnaabmGabaGaeiiFaWNaemOvayLaeiiFaWNaeiiFaWNafmOvayLbauaacqGG8baFcyGGTbqBcqGGPbqAcqGGUbGBdaGadeqaaiabdMgaPjabdsgaKjabcYcaSiabdMgaPjqbdsgaKzaafaaacaGL7bGaayzFaaaacaGLOaGaayzkaaaaleaacuWG2bGDgaqbaiabgIGiolqbdsfauzaafaaabeqdcqGHris5aaWcbaGaemODayNaeyicI4SaemivaqfabeqdcqGHris5aaWcbaGafmODayNbauaacqGHiiIZcuWGubavgaqbaaqab0GaeyyeIuoaaSqaaiabdAha2jabgIGiolabdsfaubqab0GaeyyeIuoaaaa@8E85@ Counting butterfly quartets in a single tree Reusing the idea of precomputing certain sums enables us to calculate the number of butterfly quartets in a single tree T in time O(|V|). Since the number of butterfly quartets in a single tree is the number of butterfly quartets shared between the tree and itself, we will use shared(T, T) to denote the number of butterfly quartet in T. This notation also emphasizes that computing the number is essentially a comparison of the tree against itself. Given a node v in T we can express the number of quartets it claims in the following way: ∑ i ( F i 2 ) ( ( F ¯ i 2 ) − ∑ j ≠ i ( F j 2 ) ) ,       ( 13 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabdAeagnaaBaaaleaacqWGPbqAaeqaaaGcbaGaeGOmaidaaaGaayjkaiaawMcaamaabmGabaWaaeWaceaafaqabeGabaaabaGafmOrayKbaebadaWgaaWcbaGaemyAaKgabeaaaOqaaiabikdaYaaaaiaawIcacaGLPaaacqGHsisldaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabdAeagnaaBaaaleaacqWGQbGAaeqaaaGcbaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemOAaOMaeyiyIKRaemyAaKgabeqdcqGHris5aaGccaGLOaGaayzkaaaaleaacqWGPbqAaeqaniabggHiLdGccqGGSaalcaWLjaGaaCzcamaabmGabaGaeGymaeJaeG4mamdacaGLOaGaayzkaaaaaa@4E95@ where the Fi's are the subtrees of v. Now let S = ∑ j ( F j 2 ) , MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWucqGH9aqpdaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabdAeagnaaBaaaleaacqWGQbGAaeqaaaGcbaGaeGOmaidaaaGaayjkaiaawMcaaaWcbaGaemOAaOgabeqdcqGHris5aOGaeiilaWcaaa@387D@ we can now express (13) as ∑ i ( F i 2 ) ( ( F ¯ i 2 ) − S + ( F i 2 ) ) ⋅       ( 14 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaabmGabaqbaeqabiqaaaqaaiabdAeagnaaBaaaleaacqWGPbqAaeqaaaGcbaGaeGOmaidaaaGaayjkaiaawMcaamaabmGabaWaaeWaceaafaqabeGabaaabaGafmOrayKbaebadaWgaaWcbaGaemyAaKgabeaaaOqaaiabikdaYaaaaiaawIcacaGLPaaacqGHsislcqWGtbWucqGHRaWkdaqadiqaauaabeqaceaaaeaacqWGgbGrdaWgaaWcbaGaemyAaKgabeaaaOqaaiabikdaYaaaaiaawIcacaGLPaaaaiaawIcacaGLPaaaaSqaaiabdMgaPbqab0GaeyyeIuoakiabgwSixlaaxMaacaWLjaWaaeWaceaacqaIXaqmcqaI0aanaiaawIcacaGLPaaaaaa@4B64@ S can be calculated in time O(idv)and using the precomputed S, (14) can be also calculated in time O(idv). Summing the results of (14) for all nodes in T gives the number of quartets in the tree, shared(T, T). The total time usage is ∑ v ∈ T i d v = O ( | V | ) . MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaaiabdMgaPjabdsgaKnaaBaaaleaacqWG2bGDaeqaaaqaaiabdAha2jabgIGiolabdsfaubqab0GaeyyeIuoakiabg2da9iabd+eapnaabmGabaGaeiiFaWNaemOvayLaeiiFaWhacaGLOaGaayzkaaGaeiOla4caaa@4016@ Calculating the shared leaf set sizes The algorithms presented above all rely on O(1) time access to the size of the shared leaf set |F ∩ G| for any pair of subtrees F of T and G of T', where F and G each has size at least two, i.e. contains more than a single leaf. We will refer to |F ∩ G| as the intersection size of subtree F and G. In [9] and O(n2) time and space algorithm is presented for computing the intersection sizes of all pairs of subtrees F and G of two binary input trees. A straightforward generalization of this algorithm to two input trees T and T' of arbitrary degrees results in an O(n2dd') time and O(n2) space algorithm, which gives a worst case running time of O(n4). In this section we will present an improved algorithm for computing the intersection sizes of all pairs of subtrees of T and T' which runs in time O(n + |V||V'|) and space O(|V||V'|). We will assume that the size of each subtree F of T and G of T', i.e. |F| and |G|, is available in time O(1). This can be achieved as presented in the next section. Our algorithm for computing all intersection sizes is as follows. Choose an arbitrary node r in T and an arbitrary node r' in T'. Rooting the trees in r and r' respectively gives rise to two rooted trees Tr and T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@, Fig. 10 shows an example of rooting a tree. Calculating the shared leaf set sizes of Tr and T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@ and all subtrees in both trees can be done using: Figure 10 An arbitrary node, r, is chosen as the root in the tree, leading to a rooted tree. | T r ∩ T ′ r | = ∑ i ∑ j | F i ∩ G j | = ∑ i | F i ∩ G | = ∑ j | F ∩ G j | ,       ( 15 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGG8baFcqWGubavdaWgaaWcbaGaemOCaihabeaakiabgMIihlqbdsfauzaafaWaaSbaaSqaaiabdkhaYbqabaGccqGG8baFcqGH9aqpdaaeqbqaamaaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrdaWgaaWcbaGaemOAaOgabeaakiabcYha8jabg2da9maaqafabaGaeiiFaWNaemOray0aaSbaaSqaaiabdMgaPbqabaGccqGHPiYXcqWGhbWrcqGG8baFcqGH9aqpdaaeqbqaaiabcYha8jabdAeagjabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeiilaWcaleaacqWGQbGAaeqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdaaleaacqWGQbGAaeqaniabggHiLdaaleaacqWGPbqAaeqaniabggHiLdGccaWLjaGaaCzcamaabmGabaGaeGymaeJaeGynaudacaGLOaGaayzkaaaaaa@6853@ where Fi are all subtrees of Tr and Gj are all subtrees of T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@. This can be calculated using dynamic programming in time O(n2): ∑ v ∈ V ∑ v ′ ∈ V ′ d v d v ′ + ∑ v ∈ V ∑ l ′ ∈ L ′ d v + ∑ l ∈ L ∑ v ′ ∈ V ′ d v ′ + ∑ l ∈ L ∑ l ′ ∈ L ′ 1 = O ( n 2 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqbqaamaaqafabaGaemizaq2aaSbaaSqaaiabdAha2bqabaGccqWGKbazdaWgaaWcbaGafmODayNbauaaaeqaaOGaey4kaSYaaabuaeaadaaeqbqaaiabdsgaKnaaBaaaleaacqWG2bGDaeqaaOGaey4kaSYaaabuaeaadaaeqbqaaiabdsgaKnaaBaaaleaacuWG2bGDgaqbaaqabaaabaGafmODayNbauaacqGHiiIZcuWGwbGvgaqbaaqab0GaeyyeIuoaaSqaaiabdYgaSjabgIGiolabdYeambqab0GaeyyeIuoakiabgUcaRmaaqafabaWaaabuaeaacqaIXaqmcqGH9aqpcqWGpbWtdaqadiqaaiabd6gaUnaaCaaaleqabaGaeGOmaidaaaGccaGLOaGaayzkaaaaleaacuWGSbaBgaqbaiabgIGiolqbdYeamzaafaaabeqdcqGHris5aaWcbaGaemiBaWMaeyicI4SaemitaWeabeqdcqGHris5aaWcbaGafmiBaWMbauaacqGHiiIZcuWGmbatgaqbaaqab0GaeyyeIuoaaSqaaiabdAha2jabgIGiolabdAfawbqab0GaeyyeIuoaaSqaaiqbdAha2zaafaGaeyicI4SafmOvayLbauaaaeqaniabggHiLdaaleaacqWG2bGDcqGHiiIZcqWGwbGvaeqaniabggHiLdaaaa@74CD@ Except |Tr ∩ T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@|, the shared leafset sizes calculated by (15) are the shared leaf set sizes of all rooted subtrees of T and T' that do not contain the nodes r and r'. Assuming that the subtree F of T does not contain r, then the subtree F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@does contain r and similarly for r' and subtrees G and G¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGhbWrgaqeaaaa@2DDB@ of T'. The shared leaf set sizes of these trees that do contain r and r' can be calculated from the intersection sizes that we have available using (16): | F ¯ ∩ G | = | G | − | F ∩ G | | F ∩ G ¯ | = | F | − | F ∩ G | | F ¯ ∩ G ¯ | = n − ( | G | + | F | − | F ∩ G | )       ( 16 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqaaeWadaaabaGaeiiFaWNafmOrayKbaebacqGHPiYXcqWGhbWrcqGG8baFaeaacqGH9aqpaeaacqGG8baFcqWGhbWrcqGG8baFcqGHsislcqGG8baFcqWGgbGrcqGHPiYXcqWGhbWrcqGG8baFaeaacqGG8baFcqWGgbGrcqGHPiYXcuWGhbWrgaqeaiabcYha8bqaaiabg2da9aqaaiabcYha8jabdAeagjabcYha8jabgkHiTiabcYha8jabdAeagjabgMIihlabdEeahjabcYha8bqaaiabcYha8jqbdAeagzaaraGaeyykICSafm4raCKbaebacqGG8baFaeaacqGH9aqpaeaacqWGUbGBcqGHsisldaqadiqaaiabcYha8jabdEeahjabcYha8jabgUcaRiabcYha8jabdAeagjabcYha8jabgkHiTiabcYha8jabdAeagjabgMIihlabdEeahjabcYha8bGaayjkaiaawMcaaaaacaWLjaGaaCzcamaabmGabaGaeGymaeJaeGOnaydacaGLOaGaayzkaaaaaa@7587@ In other words all shared leaf set sizes can be calculated in time O(n2). First the shared leaf set sizes for subtrees that to not contain an arbitrary node r and r' from each tree are calculated in time O(n2). Then the shared leaf set sizes of subtrees that do contain r or r' (or both) are calculated constant time for each shared leaf set. Since there are O(n2) shared leaf sets the total time usage is O(n2). The reduction to time O(n + |V||V'|) and space O(|V||V'|) is done by handling the cases where F, Fi, G or Gj is a leaf in a special way. For each pair of nodes v, v' we let Leaf [v, v'] be the number of leaves directly connected to v that have the same label as a leaf directly connected to v'. Leaf [v, v'] is constructable in time O(n + |V|||V'|) in the following way: First, set Leaf [v, v'] = 0 for all pairs of nodes v, v'. Given a leaf number, x, there is a unique node, node(x), in T and a unique node, node'(x), in T'. For each leaf number, x, we increment Leaf [node(x), node'(x)]. There are n such numbers, and by assumption, the leaves can be found in constant time given a number. Thus Leaf [v, v'] is constructable in time O(n + |V||V'|). We choose r and r' in T and T' and create two rooted trees Tr and T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@. The non-leaf children of r in Tr are F1,..., Fx and the non-leaf children of r' in T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@ are G1,..., Gy. The intersection size of the two trees can be defined recursively as: | T r ∩ T ′ r | = L e a f [ r , r ′ ] + ∑ i = 1 x | F i ∩ T ′ r | + ∑ j = 1 y | T r ∩ G j | − ∑ i = 1 x ∑ j = 1 y | F i ∩ G j |       ( 17 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGG8baFcqWGubavdaWgaaWcbaGaemOCaihabeaakiabgMIihlqbdsfauzaafaWaaSbaaSqaaiabdkhaYbqabaGccqGG8baFcqGH9aqpcqWGmbatcqWGLbqzcqWGHbqycqWGMbGzdaWadiqaaiabdkhaYjabcYcaSiqbdkhaYzaafaaacaGLBbGaayzxaaGaey4kaSYaaabCaeaacqGG8baFcqWGgbGrdaWgaaWcbaGaemyAaKgabeaakiabgMIihlqbdsfauzaafaWaaSbaaSqaaiabdkhaYbqabaGccqGG8baFaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdIha4bqdcqGHris5aOGaey4kaSYaaabCaeaacqGG8baFcqWGubavdaWgaaWcbaGaemOCaihabeaakiabgMIihlabdEeahnaaBaaaleaacqWGQbGAaeqaaOGaeiiFaWNaeyOeI0YaaabCaeaadaaeWbqaaiabcYha8jabdAeagnaaBaaaleaacqWGPbqAaeqaaOGaeyykICSaem4raC0aaSbaaSqaaiabdQgaQbqabaGccqGG8baFaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabdMha5bqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemiEaGhaniabggHiLdaaleaacqWGQbGAcqGH9aqpcqaIXaqmaeaacqWG5bqEa0GaeyyeIuoakiaaxMaacaWLjaWaaeWaceaacqaIXaqmcqaI3aWnaiaawIcacaGLPaaaaaa@84B8@ The first term counts all leaves directly connected to both r and r'. The second term counts all leaves connected directly to r', that are also in Tr, but not directly connected to r. The third term counts all leaves connected directly to r, that are also in T′r MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGubavgaqbamaaBaaaleaacqWGYbGCaeqaaaaa@2F82@, but not directly connected to r'. Summing these three terms counts all leaves present in both subtrees, but leaves not connected directly to the roots are counted twice, and are subtracted by the last term. Since (17) is only summing over the non-leaf children of a given internal node, calculating the shared leaf set sizes of all these pairs of subtrees can be done using dynamic programming in time: ( n + | V | | V ′ | ) + ∑ v ∈ V i d v + ∑ v ′ ∈ V ′ i d v ′ + ∑ v ∈ V ∑ v ′ ∈ V ′ i d v i d v ′ = O ( n + | V | | V ′ | ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqadiqaaiabd6gaUjabgUcaRiabcYha8jabdAfawjabcYha8jabcYha8jqbdAfawzaafaGaeiiFaWhacaGLOaGaayzkaaGaey4kaSYaaabuaeaacqWGPbqAcqWGKbazdaWgaaWcbaGaemODayhabeaakiabgUcaRmaaqafabaGaemyAaKMaemizaq2aaSbaaSqaaiqbdAha2zaafaaabeaaaeaacuWG2bGDgaqbaiabgIGiolqbdAfawzaafaaabeqdcqGHris5aOGaey4kaSYaaabuaeaadaaeqbqaaiabdMgaPjabdsgaKnaaBaaaleaacqWG2bGDaeqaaOGaemyAaKMaemizaq2aaSbaaSqaaiqbdAha2zaafaaabeaakiabg2da9iabd+eapnaabmGabaGaemOBa4Maey4kaSIaeiiFaWNaemOvayLaeiiFaWNaeiiFaWNafmOvayLbauaacqGG8baFaiaawIcacaGLPaaaaSqaaiqbdAha2zaafaGaeyicI4SafmOvayLbauaaaeqaniabggHiLdaaleaacqWG2bGDcqGHiiIZcqWGwbGvaeqaniabggHiLdaaleaacqWG2bGDcqGHiiIZcqWGwbGvaeqaniabggHiLdaaaa@74EC@ By the same arguments as above, the rest of the shared leaf set sizes can be computed in time O(|V||V'|) and space O(|V||V'|). Therefore the total running time of the algorithm is O(n + |V||V'|) and space usage is O(|V||V'|). Calculating the sizes of all subtree leaf sets All of the above algorithms make use of the sizes of the leaf sets of the rooted subtrees of the input trees, either directly or indirectly. Rooting T in an arbitrary node r gives rise to the rooted tree Tr. Every subtree Fx of Tr is a rooted subtree of T, and F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@x is also a rooted subtree of T. Note that the set of subtrees Fx ∪ F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@x, since one tree is the complement of the other, contains all subtrees of T and that |F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@x| = n - |Fx|. By using dynamic programming the sizes of all subtrees, Fx, can be computed by a single traversal of Tr. For each Fx the size of F¯ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGgbGrgaqeaaaa@2DD9@x can be computed in constant time, since n is known. This means that all leaf set sizes of a tree of arbitrary degree can be calculated in time O(n).