Anchored alignments for phylogenetic footprinting

Evolutionarily conserved regions in non-coding sequences are a potentially rich source for the discovery of gene regulatory regions. While functional elements are subject to stabilizing selection, the adjacent non-functional DNA evolves much faster. Therefore, blocks of conservation, so-called phylogenetic footprints, can be detected by comparative genomics in orthologous non-coding sequences with low overall similarity [39]. Alignment algorithms, including DIALIGN, have been advocated for this task. As the example in the previous section shows, however, anchoring the alignments becomes a necessity in applications to large genomic regions and clusters of paralogous genes. While interspersed repeats are normally removed ("masked") using, e.g., RepeatMasker, they need to be taken into account in the context of phylogenetic footprinting: if a sequence motif has been conserved over hundreds of millions of years, it may well have become a regulatory region even if it is (similar to) a repetitive sequence in some of the organisms under consideration [40]. The phylogenetic footprinting program TRACKER [41] was designed specifically to search for conserved non-coding sequences in large gene clusters. It is based on a philosophy similar to that of segment-based alignment algorithms. TRACKER computes pairwise local alignments of all input sequences using BLASTZ [42] with non-stringent settings. BLASTZ permits the alignment of long genomic sequences with large proportions of neutrally evolving regions. A post-processing step aims to remove simple repeats, recognized by their low sequence complexity, and regions of low conservation. The resulting list of pairwise alignments is then assembled into clusters of partially overlapping regions.
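The clustering step can be illustrated with a small sketch. It assumes, hypothetically, that each BLASTZ hit has already been reduced to a pair of half-open intervals and that two hits belong to the same cluster whenever they cover overlapping intervals of the same sequence, directly or through a chain of such overlaps; TRACKER's actual criteria may differ. The names `Hit` and `cluster_hits` are illustrative only, and clusters are computed as connected components with union-find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A pairwise local alignment: one half-open interval [start, end) in each of two sequences."""
    seq_a: str
    start_a: int
    end_a: int
    seq_b: str
    start_b: int
    end_b: int

    def intervals(self):
        return [(self.seq_a, self.start_a, self.end_a),
                (self.seq_b, self.start_b, self.end_b)]


def overlaps(s1, e1, s2, e2):
    # Half-open intervals overlap iff each one starts before the other ends.
    return s1 < e2 and s2 < e1


def cluster_hits(hits):
    """Return the connected components of the relation 'shares an overlapping
    interval in the same sequence', computed with union-find."""
    parent = list(range(len(hits)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            linked = any(sa == sb and overlaps(s1, e1, s2, e2)
                         for sa, s1, e1 in hits[i].intervals()
                         for sb, s2, e2 in hits[j].intervals())
            if linked:
                parent[find(j)] = find(i)  # merge the two components

    clusters = {}
    for i, hit in enumerate(hits):
        clusters.setdefault(find(i), []).append(hit)
    return list(clusters.values())


# Illustrative input (made-up coordinates):
hits = [
    Hit("fugu", 100, 160, "human", 5000, 5060),
    Hit("human", 5040, 5100, "mouse", 7000, 7060),  # overlaps the first hit in "human"
    Hit("fugu", 900, 950, "zebrafish", 300, 350),   # shares no interval with the others
]
clusters = cluster_hits(hits)  # -> [[hits[0], hits[1]], [hits[2]]]
```

The all-pairs comparison is quadratic in the number of hits; sorting the intervals of each sequence and sweeping over them would be the natural speed-up for long hit lists.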
Here the approach faces the same problem as DIALIGN, but resolves it in a different way: instead of producing a single locally optimal alignment, TRACKER lists all maximal compatible sets of pairwise alignments. For the case of Figure 1(C), for instance, we obtain both M2M3 and M2M3. Because this step is based only on the overlap of sequence intervals, without considering the sequence information at all, TRACKER is very fast as long as the number of conflicting pairwise alignments remains small. In the final step, DIALIGN is used to explicitly calculate multiple sequence alignments of the subsequences that belong to the individual clusters. For the initial pairwise local alignment step, the search space is restricted to orthologous intergenic regions, to parallel strands, and to chained hits. Effectively, TRACKER thus computes alignments anchored at the genes from BLASTZ fragments. We have observed [43] that DIALIGN is in general more sensitive than TRACKER. This is because DIALIGN detects smaller and less significant fragments, whereas BLASTZ returns larger, contiguous fragments. The combination of BLASTZ and an anchored version of DIALIGN therefore appears to be a very promising approach for phylogenetic footprinting: it exploits both the alignment specificity of BLASTZ and the sensitivity of DIALIGN. Anchoring at appropriate genes (with maximal weight) and at BLASTZ hits (with smaller weights, e.g., proportional to -log E) reduces the CPU requirements of the DIALIGN alignment by more than an order of magnitude. While this is still much slower than TRACKER (20 min vs. 40 s), it increases the sensitivity of the approach by about 30 to 40% in the Fugu example (Table 1). Work in progress aims at improving the significance measures for local multiple alignments. A more thorough discussion of the application of anchored segment-based alignments to phylogenetic footprinting will be published elsewhere.
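The contrast between the two conflict-resolution strategies, and the weighting of anchors, can be made concrete with a toy two-sequence sketch. Everything here is a simplifying assumption rather than TRACKER's or DIALIGN's actual code: fragments are ungapped boxes in the (X, Y) plane, two fragments are compatible when one lies strictly before the other in both sequences, gene anchors receive a hypothetical weight `GENE_WEIGHT`, and BLASTZ hits receive -log10 E via the hypothetical helper `blastz_weight`. `maximal_compatible_sets` enumerates all maximal compatible sets as the maximal cliques of the compatibility graph (Bron–Kerbosch with pivoting), in the spirit of TRACKER; `greedy_anchor_set` keeps a single set by accepting fragments in order of decreasing weight, in the spirit of DIALIGN's greedy procedure.

```python
import math
from dataclasses import dataclass

GENE_WEIGHT = 1e6  # hypothetical stand-in for the "maximal" weight of a gene anchor


@dataclass(frozen=True)
class Fragment:
    """A local match between sequences X and Y, half-open coordinates."""
    name: str
    x0: int
    x1: int
    y0: int
    y1: int
    weight: float


def blastz_weight(evalue):
    """Hypothetical weighting: -log10(E), clamped so that E = 0 stays finite."""
    return -math.log10(max(evalue, 1e-300))


def compatible(f, g):
    """Simplifying assumption: two fragments are compatible if one lies
    strictly before the other in BOTH sequences (collinear, no overlap)."""
    def before(a, b):
        return a.x1 <= b.x0 and a.y1 <= b.y0
    return before(f, g) or before(g, f)


def maximal_compatible_sets(frags):
    """TRACKER-like: every maximal set of pairwise compatible fragments, i.e.
    the maximal cliques of the compatibility graph (Bron-Kerbosch with pivot)."""
    nbrs = {f: {g for g in frags if g != f and compatible(f, g)} for f in frags}
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(r)  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(nbrs[u] & p))
        for v in list(p - nbrs[pivot]):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(frags), set())
    return found


def greedy_anchor_set(frags):
    """DIALIGN-like: accept fragments by decreasing weight, rejecting any
    fragment that conflicts with one already accepted."""
    accepted = []
    for f in sorted(frags, key=lambda f: f.weight, reverse=True):
        if all(compatible(f, a) for a in accepted):
            accepted.append(f)
    return sorted(accepted, key=lambda f: f.x0)


# Illustrative input (made-up coordinates and E-values): two gene anchors and
# three BLASTZ-like hits; hit1 and hit2 cross each other and cannot coexist.
frags = [
    Fragment("geneA", 0, 1000, 0, 1200, GENE_WEIGHT),
    Fragment("hit1", 2000, 2100, 2300, 2400, blastz_weight(1e-20)),
    Fragment("hit2", 3000, 3080, 2200, 2280, blastz_weight(1e-5)),
    Fragment("hit3", 4000, 4050, 4100, 4150, blastz_weight(1e-8)),
    Fragment("geneB", 5000, 6000, 5500, 6500, GENE_WEIGHT),
]
for s in maximal_compatible_sets(frags):
    print(sorted(f.name for f in s))  # two sets: one containing hit1, one containing hit2
print([f.name for f in greedy_anchor_set(frags)])  # ['geneA', 'hit1', 'hit3', 'geneB']
```

For two sequences, pairwise compatibility in this sense already guarantees a consistent chain, because "lies before" is transitive. With three or more sequences that no longer holds, which is one reason real multiple-alignment implementations need a more involved consistency check.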