3 Simple algorithm for computing all maximal cliques in c-max-tolerance graphs The sweep-line algorithm presented in this section is based on the semi-square intersection graph corresponding to a given c-max-tolerance graph. We will show that the theoretical runtime is a bit higher than the one given in the following section, but this algorithm is very easy to implement and it will lay the conceptual basis for the more involved algorithm presented in the next section. We will show in Section 5 that the implementation based on this simpler algorithm is nonetheless very fast, it will compute all maximal cliques in real data sets within a few seconds. The virtual sweep is done along the x-axis, handling two kinds of events: The birth of semi-squares and their death, i.e., start and end of a semi-square, respectively. As long as the sweep handles events with an x-position within the basis of a semi-square X, X is said to be active. The so-called y-structure Q holds all maximal cliques of active members at any time and every maximal clique is contained only once (y-structure invariant), where maximality refers only to the set of active semi-squares and not necessarily to the set of all semi-squares. In the first step all x-events are sorted non-decreasingly according to their x-position. If there are death and birth events at the same x-position, the birth events are first inserted into the sorted list. Then, the events are extracted from the sorted list which enables the virtual x-axis sweep. For every birth event, every maximal clique C in the y-structure is checked, whether some of the semi-squares are intersecting with the newborn semi-square X. We have to differentiate the following cases: 1. The handling is straightforward if X intersects with all semi-squares in the clique. Then, X can just be added to the clique. 2. For every clique where at least one but not all semi-squares are intersecting X, a candidate clique is built with all intersecting semi-squares together with the newborn semi-square. This operation is called the splitting of a clique. It is important to note that not all candidate cliques are really maximal, as is illustrated in Fig. 3. To maintain the y-structure invariant, it is important to insert only cliques that are maximal with respect to the set of active semi-squares. Thus, the following cases have to be checked: (a) Can any other active interval be added to the candidate clique? Then it is not a maximal clique within the set of active semi-squares and must not be added to Q. (b) After all cliques C in Q have been checked and all candidate cliques built: Is there already a clique in Q that is identical to the candidate clique? Then it must not be added to Q. Are there two or more candidate cliques that are identical? Only one of them can be added to the y-structure. 3. If after these steps none of the cliques in Q contain the newborn interval X, it is the first element of a new clique which is added to Q. If the event signals the death of a semi-square, it has to be deleted from every clique C it is an element of. If any of these cliques has been increased by the addition of a semi-square since the last deletion of a semi-square, it is a maximal clique and has to be printed out before the semi-square is deleted. Else, the semi-square is just deleted from C, without printing out the clique. A deletion can yield a reduced clique that is not unique anymore, or that is only a subset of some other clique in Q. Thus, for every clique in which some semi-square has been deleted, one has to verify its uniqueness and extensibility, and it will be deleted from Q if it is not unique or if it is extendable. Note that after checking these two properties, the y-structure invariance holds, because the y-structure just holds all maximal cliques of active semi-squares. Thus, not every maximal clique in the y-structure is a maximal clique with respect to the whole set of semi-squares. The y-structure suffices to guarantee the correctness of this algorithm, because it guarantees that all maximal cliques of active semi-squares are kept in Q. Since a semi-square X is only non-active if it has not yet begun or if the rest of its corresponding interval is not long enough for any new semi-square to overlap it enough, it cannot be an element of a clique when it is non-active. For its active time, the y-structure guarantees that all its maximal cliques are contained in Q. This algorithm is thus correct because the y-structure invariance is maintained in all steps. The pseudocode is sketched in Fig. 4. Figure 4 Algorithm for computing all maximal cliques. Runtime analysis There are exactly 2n x-events that are sorted in O(n log n). For every birth of a semi-square X we have to check all cliques in Q whether they contain at least one semi-square that is intersecting with X. In the worst case, the y-structure can hold up to O(n3) cliques (see Lemma 1) with O(n) elements each. For the further analysis we will denote the maximal number of cliques in the y-structure by |Q|. Checking whether X can be added to a clique or building a candidate clique can thus be done in O(n * |Q|) for all cliques in Q. Since every newborn semi-square can either be added to a clique, split it, or not intersect any of its elements, there can be at most O(|Q|) candidate cliques with size O(n). For each of them we have to check whether it is extendable by one of the active semi-squares and whether there is a second candidate clique that is equal to that one. To check for equality we have to match each candidate clique with each other candidate clique which takes O(|Q|2 * n2) in the naive approach. To check for extensibility we have to test each active semi-square whether it can extend the candidate clique. This can be done in O(|Q| * n), naively. This results in a theoretical runtime of O(|Q|2 * n2) per birth. For death events, the clique has to be printed if it is a maximal clique. Altogether the size of out is equal to the number of symbols to print all maximal cliques. Since there are at most n elements per clique, the size of out is O(n4). Furthermore, the deletion of a semi-square in a clique can result in a clique that is no longer maximal with respect to all active semi-squares or that is equal to another clique in the y-structure. Thus, we have to check for equality and extensibility again. The clique will be deleted from the y-structure if its size is 0, if it is equal to some other clique in the y-structure, or if it is extendable by one of the other active semi-squares. Also this step requires a runtime of O(|Q|2 * n2). The theoretical runtime for the whole algorithm is thus given by O(|Q|2 * n3 + out), which is in the worst case O(n9). We will now present some small improvements in the data structure that help to reduce the work for most real data sets. First, as a fast look-up we will compute the adjacency matrix adj for the intersection of all pairs of semi-squares in O(n2). As we noted before, each semi-square can be characterized by yi + xi, the diagonal on which the hypotenuse lies. A clique C is now represented by two ordered lists: H(C), a list of lists of semi-squares for each diagonal on which at least one semi-square has its hypotenuse, and B(C) a list of lists of semi-squares for each y-coordinate on which at least one basis of any semi-square in the clique is positioned. A clique at a given x-position is now an assembly of ordered hypotenuses and bases as depicted in Fig. 5. A new semi-square X can intersect the set of bases and hypotenuses in the following ways: Figure 5 Clique {2, 3,4} at x = 2 has a global ordering of B(2), B(3), B(4), H(2), H(3), H(4) and a global ordering of B(2), B(3), H(2), H(3), B(4), H(4) at x = 4. When an interval is added to the y-structure it can either intersect all of the bases of a clique, or only a part of the bases, and/or all of the hypotenuses, or just a part of them. If it intersects all bases and/or hypotenuses, it can simply be added to that clique, if it only intersects a part of the bases and/or hypotenuses, it will split the clique. 1. X intersects none of the bases and none of the hypotenuses. Since the bases are ordered, this only occurs if the upper point of X is lower than the lowest basis, or if the basis of X is higher than the highest hypotenuse. This can be calculated in O(1) because B(C) and H(C) are sorted. This results in a runtime of O(|Q|) for all cliques per birth event. 2. X intersects either all bases and/or all hypotenuses. This is the case if X's basis is lower than the lowest basis in C and X's upper point is higher than the highest basis, and/or X's basis is lower than the lowest hypotenuse and its upper point is higher than C's highest hypotenuse. Both can be checked in O(1). If this is the case, X has just to be added to C. To insert X's basis and hypotenuse at the correct place, a binary search with X's height of the basis and the height of its hypotenuse is accomplished in O(log n) for every clique it is added to, i.e. with a runtime of at most O(|Q| log n) per birth event. 3. X neither intersects all bases nor all hypotenuses, but at least one basis or hypotenuse in C. In this case, X will split the clique. It is easiest to just run through both lists, B(C) and H(C), and look in adj which semi-squares of C and X are intersecting. By running through the ordered lists, we can just add all intersecting semi-squares in the same order to the new candidate clique, maintaining the correct sorting. Each clique has at most n elements, look-up of adjacency is done in O(1) and adding to the list is also done in O(1) per intersecting semi-square. Last, X has to be inserted into the candidate clique, again in O(log n). Thus, generating a candidate clique can be done in O(n), resulting in |Q| * n for all cliques per birth event. To find out whether this candidate clique C' is maximal we use the following trick which is quite fast: Given the semi-square Xs representing the shortest interval, calculate the set of all active semi-squares that are not in C' but do intersect Xs. This set Z can be determined in O(n) by checking X's row in adj. Then we have to examine the extensibility of C' for only those semi-squares in Z. This takes O(n2), resulting in O(|Q| * n2) for every birth event. The check for equality is also quite cheap, because Q can again be ordered by for example the height of the lowest basis in a clique. This equality search for any other clique in the y-structure with the same height of the lowest basis can be done by a binary search in O(|Q| log |Q|). Since all cliques are ordered, a check on equality can be done in O(n), resulting in O(|Q| * n). Inserting the clique in the correct place in Q can again be done in O(|Q| log |Q|). Again, the deletion of a semi-square will result in an addend of out to the runtime, and we have to check for equality and extensibility. Thus, our theoretical runtime is O(|Q| * n3+ out) which in the worst case is given by O(n6). Note that for real data sets this theoretical runtime will not be reached: First of all, intervals have very different lengths and therefore not all of them will be active at the same time. Second, most of the time the set Z of candidates for the extensibility test is very small. The same is true for the set of cliques with the same height of their lowest basis. Since the test for extensibility and uniqueness are the most expensive steps in the algorithm, this shows why the implementation can be so fast in real data sets, as will be demonstrated in section 5. But first we will give a more elaborated algorithm that shows that the runtime complexity of finding all maximal cliques in c-max-tolerance graphs is considerably lower than the corresponding runtime in max-tolerance graphs.