4 An efficient output-sensitive algorithm to determine all maximal cliques in c-max-tolerance graphs

4.1 Introductory discussion

We briefly recall the O(n^3 + out) algorithm from [9]. There it was shown that each maximal clique is uniquely described by the three parameters t, h and v, denoting the hypotenuse t of the lowest semi-square, the highest base h of any semi-square, and the rightmost vertical side v. The drawback of the algorithm given in [9] is that it needs O(n^3) time even if there are only very few maximal cliques. For c-max-tolerance graphs we can now present a considerably improved output-sensitive algorithm.

Our description consists of two steps. First, we give an algorithm that computes all candidates for maximal cliques with one fixed parameter, say a given lowest hypotenuse t. We show that all maximal cliques with fixed parameter t can be determined in time O(n log n + out), where out is the size of the output. The problem, however, is that these maximal cliques could still be extendable by a semi-square with an even lower hypotenuse t'. Thus, such a maximal clique with fixed parameter t is only a candidate that has to be checked for extensibility before it is output. In the second step, we show how to avoid computing candidates that do not represent maximal cliques, so that our final algorithm is truly output-sensitive.

4.2 Maximal cliques regarding parameter t

Let t be the hypotenuse of the lowest semi-square in the cliques, and let r(t) denote the whole square defined by the diagonal t. Note that it makes sense to use t in these two different but related contexts. Using the same notation as in [9], P(t) denotes the set of semi-squares that include the left endpoint of t, Q(t) is the set of semi-squares that include t's right endpoint, and R(t) denotes the set of semi-squares that intersect t but include none of its endpoints.
Ps(t), Qs(t), and Rs(t) denote the sets of full squares corresponding to the semi-squares in P(t), Q(t) and R(t). More importantly, we will consider the sets Pr(t), Qr(t) and Rr(t), which denote the sets of rectangles given by the intersection of r(t) with each single element of Ps(t), Qs(t) and Rs(t). Note that the left upper corner of r(t) is also the left upper corner of the rectangles in Pr(t), while their right lower corners are the same as those of the corresponding semi-squares in P(t). Similar facts hold for the rectangles in Qr(t) and Rr(t).

The following observation provides the idea of how to determine the maximal cliques whose lowest semi-square has hypotenuse t: Each point x ∈ r(t) is covered by a set of rectangles from Pr(t) ∪ Qr(t) ∪ Rr(t). The crucial observation is that the corresponding semi-squares are also pairwise intersecting, so x ∈ r(t) denotes a clique of semi-squares. This is important because it means that, restricted to r(t), the intersection of the rectangles is equivalent to the intersection of the semi-squares. Thus, the question of finding maximal cliques in the c-max-tolerance graph is reduced to finding an area where a maximal set of rectangles intersects.

The points covered by the same set of rectangles form simple orthogonal connected polygons p. Each such polygon p is also characterized by the cardinality of this set, which we call cover(p). A polygonal region p' adjacent to p is covered either by one additional rectangle or by one rectangle fewer. So the cover-variables of adjacent polygons differ by exactly 1 (see Fig. 6). If all adjacent polygons have a lower cover-value than cover(p), p denotes a maximal clique. We call p locally maximal in this case.

Figure 6 Computing the maximal cliques regarding the hypotenuse t by intersecting r(t) with Pr(t), Qr(t) and Rr(t). In the example, we have two rectangles from Pr(t), two from Rr(t) and one from Qr(t).
The numbers denote the cover-values of the corresponding polygons.

Lemma 3 Locally maximal polygons in r(t) exactly determine the maximal cliques with lowest hypotenuse t.

The algorithm

We perform a left-to-right sweep. As the underlying data structure we keep the list L of all polygons currently intersected by the sweepline, as well as a separate list LM of all locally maximal polygons. We start the sweep at the left side of r(t), initializing L with the polygons defined by all rectangles of Pr(t) in increasing order of their lower boundaries. LM is initialized with the topmost polygon, which represents the intersection of all rectangles in Pr(t). Two basic events occur while sweeping from left to right:

1. A rectangle from Pr(t) ends.
2. A rectangle from Qr(t) or Rr(t) is added.

Case 1. Let s be the rectangle from Pr(t) that ends, and let c be its right lower corner. Since s intersects all polygons from c up to the upper boundary of r(t), the removal of s decreases the cover-variables by one for all polygons above c, the two polygons adjacent to c have to be joined, and the maximal polygons above c have to be output. The join operation can be done by updating list L after locating the two polygons by binary search for c. The output operation can be performed easily by scanning list LM from the top until the y-coordinate of c has been reached. Note that no new local maxima arise, and all previous maxima remain. One important subtlety is that the maxima we have just output should not be output again, even if they still represent maxima. We call them false maxima; we remove them from the list LM and insert them into a list LF ordered by their y-coordinates. Note that false maxima have to be reinserted into list LM as soon as they are covered by a new rectangle from either Qr(t) or Rr(t). This might happen as described in the next case.

Case 2. A rectangle s from Qr(t) ∪ Rr(t) starts to be intersected by the sweepline.
We discuss the case s ∈ Qr(t); the other case is symmetric. Let c be the left upper corner of rectangle s. The rectangle s adds a new intersection to all polygons below c: the polygon containing c is split into two, and the cover-variables of all polygons below c increase by one. Note that a new locally maximal polygon might arise below c. All false maxima below c become 'true' maxima again; they are deleted from LF and reinserted into LM. The operation can be performed by locating the polygon to be split by binary search in L, scanning the list LF until the y-coordinate of c is reached, and inserting all false maxima found back into LM.

Analysis: At each event we have to perform a binary search for corner c. A direct implementation performs each insertion and deletion of maxima and false maxima in the corresponding lists in time O(log n). This leads to a runtime of O(n log n + C(t) log n + out), where C(t) denotes the number of maximal cliques with lowest hypotenuse t.

Next we show how to improve the efficiency of the algorithm to O(n log n + out). Instead of two separate lists LM and LF, we keep only one doubly linked list LD of interleaved blocks containing false maxima and 'true' maxima, ordered by y-coordinates. A block denotes a maximal sequence of maxima of one kind or the other. We also keep the blocks internally connected. Each block is created by a certain event and will eventually be removed. We count only the number of block creations, which naturally bounds the number of block removals, and show that only O(n + C(t)) blocks are created.

One possible event is the creation or removal of a single maximum in the list LD, namely that of a polygon adjacent to corner c. Possibly, one new block is created by splitting an old block into two. After the position of the block corresponding to the y-coordinate of c has been located by binary search, this can be done in constant time. Clearly there are at most O(n) such events.
All other events consist of browsing through the list of blocks starting from the upper or lower boundary of rectangle r(t), outputting the contents and joining adjacent blocks until the polygon with the y-coordinate of corner c is reached. Hence, in these events, each operation creates one or two new blocks but might remove k blocks in time O(k). In total, apart from the time needed for the output, we use only O(n log n) time.

Lemma 4 Determining the maximal cliques with lowest parameter t takes time O(n log n + out).

4.3 Avoiding false maximal cliques when computing candidates for hypotenuse t

In the previous subsection, we described the efficient computation of all maximal cliques having a specific triangle t as their lowest element. To obtain a truly output-sensitive algorithm, we have to take into account that some of those cliques, say M, might not be maximal overall, since there might be a triangle s(t') with hypotenuse t' such that M' = M ∪ {s(t')} is also a clique. Clearly, t' is lower than t. Note that M' will be found when computing the maximal cliques with lowest element t'. So, when considering t, we have to avoid those cliques that are not truly maximal.

4.3.1 The intersection staircase

The first idea is to compute, for rectangle r(t), the intersection of r(t) with all r(t') where t' does not belong to P(t) ∪ Q(t) ∪ R(t). Clearly, polygons in r(t') ∩ r(t) might represent cliques which are maximal in r(t') but are false maxima in r(t), since t' has not been considered there. Neglecting the area where such rectangles r(t') intersect r(t) seems to be a good first step, although it will turn out that this is not sufficient. The union of those intersections is determined by a set of maximal rectangles, which form a kind of staircase pattern above the diagonal of r(t); we call it the intersection staircase. The intersection staircase is represented by a list of the right upper endpoints, with decreasing y-coordinates and increasing x-coordinates.
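
As a minimal sketch of this representation, assume that every candidate rectangle is reduced to its right upper corner (x, y) and that a corner belongs to the staircase only if it reaches further right than every corner lying at least as high. Under these assumptions the following Python function keeps exactly the corners with decreasing y and strictly increasing x. The function name `staircase` and the tuple layout are illustrative and not taken from [9].

```python
from typing import List, Tuple

# (x, y) of a rectangle's right upper corner; this layout is an assumption.
Corner = Tuple[float, float]


def staircase(corners: List[Corner]) -> List[Corner]:
    """Return the corners forming a staircase: decreasing y, increasing x.

    A corner is kept only if it reaches further right than every corner
    that lies at least as high.
    """
    # Sort from top to bottom; among equal heights, the rightmost corner first.
    ordered = sorted(corners, key=lambda c: (-c[1], -c[0]))
    steps: List[Corner] = []
    for x, y in ordered:
        # Keep the corner only if it extends the staircase to the right.
        if not steps or x > steps[-1][0]:
            steps.append((x, y))
    return steps
```

The cost is dominated by the sort, so the sketch runs in O(n log n) time; the scan itself is linear.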
The intersection staircase can be computed using a list of the rectangles r(t') ordered decreasingly by the y-coordinate of their upper boundaries. For each rectangle r(t') we check whether it intersects the diagonal of r(t) and does not belong to P(t) ∪ Q(t) ∪ R(t). If the right boundary of the last element of the current intersection staircase is also intersected by the upper boundary of r(t'), or if the last element of the intersection staircase ends above r(t'), r(t') is appended to the list of the intersection staircase. Clearly, the intersection staircase can be computed in time O(n log n).

Unfortunately, this is not sufficient, since there might be rectangles r(t'') ∈ Pr(t) ∪ Qr(t) ∪ Rr(t) which intersect r(t') but not the diagonal t' of r(t'); hence they have not been taken into account. Clearly, the intersection of r(t'') with r(t) ∩ r(t') has to be considered if it induces new cliques which have not been found when processing r(t'). Hence the intersection staircase, which defines the forbidden area when processing r(t), has to be refined, and its area will decrease (see Fig. 7).

Figure 7 The grey area denotes the basic intersection staircase, which describes the maximal cliques that potentially must not be considered since they have already been considered during the computation of the maximal cliques for hypotenuses t' lower than t. In the example, we have three diagonals t' lower than t whose rectangles r(t') intersect t.

4.3.2 Refinement of the staircase

We keep the intersection staircase as an ordered list of right upper corners of the corresponding rectangles r(t'). First we describe how rectangles from P(t) influence the intersection staircase. We consider all the rectangles of P(t) and how they intersect the staircase. To this end, we sort the lower left corners a_i of the rectangles p_i according to the difference between their y- and x-coordinates and process the rectangles in this order.
Clearly, all those corners lie to the left of the left boundary of r(t). Assume that we have already processed the corners a_1, ..., a_{i-1} as well as the topmost j steps of the staircase, with j ≥ 0. Let s(t_j) be the current step, defined by rectangle r(t_j), to be considered. We consider p_i. If p_i is below the diagonal of r(t_j), it can be neglected since it has been considered while processing r(t_j). Otherwise, if the lower boundary of p_i intersects step s(t_j), the upper part of s(t_j) is cut off, and we proceed to p_{i+1}. If the lower boundary is even lower than the whole step, s(t_j) is completely cut off, and we proceed to s(t_{j+1}). In this case, we reconsider p_i (see Fig. 8).

Figure 8 The basic intersection staircase shown in grey must be refined since there are rectangles like p_i ∈ Pr(t) and r ∈ Rr(t) which have not been considered before in the runs for t' and which reduce the basic intersection staircase such that a lower staircase, shown in dark grey, remains. Hence, for the computation of the maximal cliques regarding t, we have to reduce the area of the upper half only by the refined staircase.

Clearly, since there are at most n rectangles p_i and at most n steps s(t_j), the whole process takes O(n log n) time.

Analogously, we can refine the intersection staircase by considering the rectangles in Q(t) which have not been considered in the staircase-defining rectangles r(t'). For rectangles in R(t), the refinement is slightly different, so we consider it in more detail. The intersection staircase now separates the area representing cliques that have not been computed before from those which either have been computed before or have not been assigned yet. Hence we only move the intersection staircase downwards. The intersection staircase consists of a list of points c_1, ..., c_k which are not necessarily corners of rectangles, but each of them can be uniquely assigned to a rectangle r(t') ∉ Pr(t) ∪ Qr(t) ∪ Rr(t). Assume that these points are ordered by decreasing y-coordinates.
We denote the corresponding diagonals by t_1, ..., t_k analogously. First we state an important observation:

Lemma 5 Only rectangles r(t') which are part of the intersection staircase have maximal diagonals. Hence a non-maximal r(t') that is not in P(t) ∪ Q(t) ∪ R(t) does not need to be considered.

The elements of Rr(t) are similarly ordered according to the difference between the y- and x-coordinates of their left lower corners. The idea is that, when comparing a rectangle r(t') from the staircase with an element s from Rr(t), the staircase should not be changed if s intersects t'. Otherwise, the intersection with s must be cut off from the staircase.

More formally, we start with the topmost step of the staircase and let t' be the corresponding diagonal. Let s_max be the first element in the ordered list of Rr(t). If its left lower corner is below t', then s_max has already been considered in the computation of the maximal cliques regarding t'. Hence it can be disregarded, and it does not change the staircase. We delete s_max from the list and let the next element be s_max. If the left lower corner of s_max is above t', we compute the intersection of s_max with the current step of the staircase and remove the step either completely or only partially, depending on the size of the intersection. In the first case, we proceed to the next lower step of the staircase, while in the second case we remove s_max from the list and obtain a new s_max. In all cases, the operation can be done in O(1) time, and in total we have only O(n) operations.

This concludes the description of how we effectively restrict the area of r(t) that has to be considered to the computation of just those maximal cliques which have not been computed before. Hence we can summarize the whole section as follows:

Theorem 1 In time O(n^2 log n + out) we can determine all maximal cliques in c-max-tolerance graphs.
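
The bound of Theorem 1 can be read as a sum over the possible lowest hypotenuses. The following is only a sketch, under three assumptions: there are at most n hypotenuses t (one per semi-square); out_t, a symbol introduced here purely for illustration, denotes the size of the output produced for a fixed t; and the staircase restriction makes every maximal clique appear for exactly one t.

```latex
\sum_{t} \Bigl( \underbrace{O(n \log n)}_{\text{staircase for } t}
              + \underbrace{O(n \log n + \mathit{out}_t)}_{\text{sweep for } t} \Bigr)
  = O(n^2 \log n) + O\Bigl(\sum_{t} \mathit{out}_t\Bigr)
  = O(n^2 \log n + \mathit{out}).
```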
Note that the naive bound of O(n^4) for out can be improved to O(n^3) by writing down only the differences between subsequent maximal cliques. This is supported by the above methods, which determine the maximal cliques in a plane-sweep approach.
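
To illustrate the idea of writing down only differences, the following minimal sketch encodes a sequence of cliques as, for each clique, the vertices added to and removed from its predecessor. The function names and the (added, removed) format are assumptions made for this example; the section does not fix a concrete encoding.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# (added, removed) relative to the previous clique; an assumed format.
Delta = Tuple[FrozenSet[int], FrozenSet[int]]


def encode_differences(cliques: Iterable[Set[int]]) -> List[Delta]:
    """Encode each clique as its difference from the previous one.

    The first clique is encoded against the empty set.
    """
    deltas: List[Delta] = []
    previous: Set[int] = set()
    for clique in cliques:
        current = set(clique)
        deltas.append((frozenset(current - previous), frozenset(previous - current)))
        previous = current
    return deltas


def decode_differences(deltas: Iterable[Delta]) -> List[Set[int]]:
    """Rebuild the original sequence of cliques from the differences."""
    cliques: List[Set[int]] = []
    current: Set[int] = set()
    for added, removed in deltas:
        current = (current - removed) | added
        cliques.append(set(current))
    return cliques
```

When consecutive cliques differ in only a few semi-squares, as one would expect for neighbouring polygons met during a sweep, the encoded output is much shorter than listing every clique in full.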