PMC:1524773 / 20384-27559 JSON TXT

Annnotations TAB JSON ListView MergeView

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1524773","sourcedb":"PMC","sourceid":"1524773","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1524773","text":"Runtime analysis\nThere are exactly 2n x-events that are sorted in O(n log n). For every birth of a semi-square X we have to check all cliques in Q whether they contain at least one semi-square that is intersecting with X. In the worst case, the y-structure can hold up to O(n3) cliques (see Lemma 1) with O(n) elements each. For the further analysis we will denote the maximal number of cliques in the y-structure by |Q|. Checking whether X can be added to a clique or building a candidate clique can thus be done in O(n * |Q|) for all cliques in Q. Since every newborn semi-square can either be added to a clique, split it, or not intersect any of its elements, there can be at most O(|Q|) candidate cliques with size O(n). For each of them we have to check whether it is extendable by one of the active semi-squares and whether there is a second candidate clique that is equal to that one. To check for equality we have to match each candidate clique with each other candidate clique which takes O(|Q|2 * n2) in the naive approach. To check for extensibility we have to test each active semi-square whether it can extend the candidate clique. This can be done in O(|Q| * n), naively. This results in a theoretical runtime of O(|Q|2 * n2) per birth. For death events, the clique has to be printed if it is a maximal clique. Altogether the size of out is equal to the number of symbols to print all maximal cliques. Since there are at most n elements per clique, the size of out is O(n4). Furthermore, the deletion of a semi-square in a clique can result in a clique that is no longer maximal with respect to all active semi-squares or that is equal to another clique in the y-structure. Thus, we have to check for equality and extensibility again. The clique will be deleted from the y-structure if its size is 0, if it is equal to some other clique in the y-structure, or if it is extendable by one of the other active semi-squares. Also this step requires a runtime of O(|Q|2 * n2). The theoretical runtime for the whole algorithm is thus given by O(|Q|2 * n3 + out), which is in the worst case O(n9).\nWe will now present some small improvements in the data structure that help to reduce the work for most real data sets. First, as a fast look-up we will compute the adjacency matrix adj for the intersection of all pairs of semi-squares in O(n2).\nAs we noted before, each semi-square can be characterized by yi + xi, the diagonal on which the hypotenuse lies. A clique C is now represented by two ordered lists: H(C), a list of lists of semi-squares for each diagonal on which at least one semi-square has its hypotenuse, and B(C) a list of lists of semi-squares for each y-coordinate on which at least one basis of any semi-square in the clique is positioned. A clique at a given x-position is now an assembly of ordered hypotenuses and bases as depicted in Fig. 5. A new semi-square X can intersect the set of bases and hypotenuses in the following ways:\nFigure 5 Clique {2, 3,4} at x = 2 has a global ordering of B(2), B(3), B(4), H(2), H(3), H(4) and a global ordering of B(2), B(3), H(2), H(3), B(4), H(4) at x = 4. When an interval is added to the y-structure it can either intersect all of the bases of a clique, or only a part of the bases, and/or all of the hypotenuses, or just a part of them. If it intersects all bases and/or hypotenuses, it can simply be added to that clique, if it only intersects a part of the bases and/or hypotenuses, it will split the clique. 1. X intersects none of the bases and none of the hypotenuses. Since the bases are ordered, this only occurs if the upper point of X is lower than the lowest basis, or if the basis of X is higher than the highest hypotenuse. This can be calculated in O(1) because B(C) and H(C) are sorted. This results in a runtime of O(|Q|) for all cliques per birth event.\n2. X intersects either all bases and/or all hypotenuses. This is the case if X's basis is lower than the lowest basis in C and X's upper point is higher than the highest basis, and/or X's basis is lower than the lowest hypotenuse and its upper point is higher than C's highest hypotenuse. Both can be checked in O(1). If this is the case, X has just to be added to C. To insert X's basis and hypotenuse at the correct place, a binary search with X's height of the basis and the height of its hypotenuse is accomplished in O(log n) for every clique it is added to, i.e. with a runtime of at most O(|Q| log n) per birth event.\n3. X neither intersects all bases nor all hypotenuses, but at least one basis or hypotenuse in C. In this case, X will split the clique. It is easiest to just run through both lists, B(C) and H(C), and look in adj which semi-squares of C and X are intersecting. By running through the ordered lists, we can just add all intersecting semi-squares in the same order to the new candidate clique, maintaining the correct sorting. Each clique has at most n elements, look-up of adjacency is done in O(1) and adding to the list is also done in O(1) per intersecting semi-square. Last, X has to be inserted into the candidate clique, again in O(log n). Thus, generating a candidate clique can be done in O(n), resulting in |Q| * n for all cliques per birth event. To find out whether this candidate clique C' is maximal we use the following trick which is quite fast: Given the semi-square Xs representing the shortest interval, calculate the set of all active semi-squares that are not in C' but do intersect Xs. This set Z can be determined in O(n) by checking X's row in adj. Then we have to examine the extensibility of C' for only those semi-squares in Z. This takes O(n2), resulting in O(|Q| * n2) for every birth event. The check for equality is also quite cheap, because Q can again be ordered by for example the height of the lowest basis in a clique. This equality search for any other clique in the y-structure with the same height of the lowest basis can be done by a binary search in O(|Q| log |Q|). Since all cliques are ordered, a check on equality can be done in O(n), resulting in O(|Q| * n). Inserting the clique in the correct place in Q can again be done in O(|Q| log |Q|).\nAgain, the deletion of a semi-square will result in an addend of out to the runtime, and we have to check for equality and extensibility. Thus, our theoretical runtime is O(|Q| * n3+ out) which in the worst case is given by O(n6). Note that for real data sets this theoretical runtime will not be reached: First of all, intervals have very different lengths and therefore not all of them will be active at the same time. Second, most of the time the set Z of candidates for the extensibility test is very small. The same is true for the set of cliques with the same height of their lowest basis. Since the test for extensibility and uniqueness are the most expensive steps in the algorithm, this shows why the implementation can be so fast in real data sets, as will be demonstrated in section 5. But first we will give a more elaborated algorithm that shows that the runtime complexity of finding all maximal cliques in c-max-tolerance graphs is considerably lower than the corresponding runtime in max-tolerance graphs.","divisions":[{"label":"title","span":{"begin":0,"end":16}},{"label":"p","span":{"begin":17,"end":2104}},{"label":"p","span":{"begin":2105,"end":2350}},{"label":"p","span":{"begin":2351,"end":2960}},{"label":"figure","span":{"begin":2961,"end":3482}},{"label":"label","span":{"begin":2961,"end":2969}},{"label":"caption","span":{"begin":2971,"end":3482}},{"label":"p","span":{"begin":2971,"end":3482}},{"label":"p","span":{"begin":3483,"end":3841}},{"label":"p","span":{"begin":3842,"end":4466}},{"label":"p","span":{"begin":4467,"end":6153}}],"tracks":[]}

PMC:1524773 / 20384-27559 JSONTXT

Annnotations TAB JSON ListView MergeView

PMC:1524773 / 20384-27559 JSON TXT